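1. Thread Lock Basics

1.1 threading.Lock

01.Basic concepts
    a.Definition
        threading.Lock is the most basic thread synchronization primitive in the Python standard library. It protects shared resources so that concurrent access by multiple threads cannot leave data in an inconsistent state. Lock is a mutual-exclusion lock: at any moment at most one thread holds it.
    b.How it works
        A Lock keeps an internal locked/unlocked state. When a thread calls acquire() and the lock is free, the call succeeds immediately and sets the state to locked; if the lock is already taken, the call blocks. After release() is called, one of the waiting threads is woken up and acquires the lock; which one is not specified.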

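02.Creating and using a lock
    a.Creating a lock object
        a.Basic creation
            Call threading.Lock() to create a lock. It takes no arguments and returns a new lock instance in the unlocked state.
        b.Code example
            ---
            import threading

            # Create the lock object
            lock = threading.Lock()

            # Shared resource
            counter = 0

            def increment():
                global counter
                # Acquire the lock
                lock.acquire()
                try:
                    # Critical section
                    temp = counter
                    temp += 1
                    counter = temp
                finally:
                    # Release the lock
                    lock.release()

            # Create several threads
            threads = []
            for i in range(10):
                t = threading.Thread(target=increment)
                threads.append(t)
                t.start()

            # Wait for all threads to finish
            for t in threads:
                t.join()

            print(f"Final count: {counter}")  # Output: Final count: 10
            ---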
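    b.Checking the lock state
        a.The locked method
            locked() reports whether the lock is currently held: True means locked, False means unlocked. The call never blocks. The result is only a snapshot and may already be stale when it is used, so it should not be used to decide whether a later acquire() will succeed.
        b.Code example
            ---
            import threading

            lock = threading.Lock()

            # Check the initial state
            print(f"Initial state: {lock.locked()}")  # Output: False

            # Acquire the lock
            lock.acquire()
            print(f"After acquire: {lock.locked()}")  # Output: True

            # Release the lock
            lock.release()
            print(f"After release: {lock.locked()}")  # Output: False
            ---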
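            Because locked() only reports a snapshot, a check followed by a separate acquire() is not atomic. The sketch below shows one possible alternative: rely on the return value of acquire(blocking=False). The helper name try_do_work is hypothetical and exists only for this illustration.

```python
import threading

lock = threading.Lock()

def try_do_work():
    """Run the critical section only if the lock is free right now."""
    # acquire(blocking=False) tests and takes the lock in one atomic step,
    # unlike "if not lock.locked(): lock.acquire()", which can race.
    if not lock.acquire(blocking=False):
        return False
    try:
        return True  # the critical section would go here
    finally:
        lock.release()

first = try_do_work()    # the lock is free, so this succeeds

lock.acquire()           # simulate the lock being held elsewhere
second = try_do_work()   # the lock is busy, so the work is skipped
lock.release()

print(first, second)  # True False
```

            The second call returns False even though it runs in the thread that holds the lock, because a plain Lock is not reentrant.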

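03.Thread-safety example
    a.The problem without a lock
        When several threads modify a shared variable at the same time, unpredictable thread scheduling leads to a data race and wrong results.
    b.Demonstrating the problem
        ---
        import threading
        import time

        # Shared resource
        balance = 1000

        def withdraw(amount):
            global balance
            # Simulate reading the balance
            temp = balance
            time.sleep(0.001)  # Simulate processing delay
            # Simulate the deduction
            temp -= amount
            balance = temp

        # Create several threads that withdraw at the same time
        threads = []
        for i in range(10):
            t = threading.Thread(target=withdraw, args=(100,))
            threads.append(t)
            t.start()

        for t in threads:
            t.join()

        print(f"Final balance: {balance}")  # Expected: 0; actual: typically 900 or another wrong value
        ---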
    c.Fixing it with a lock
        a.Description
            A Lock protects the critical section so that only one thread at a time performs the withdrawal, which removes the data race.
        b.Code example
            ---
            import threading
            import time

            balance = 1000
            lock = threading.Lock()

            def withdraw_safe(amount):
                global balance
                lock.acquire()
                try:
                    temp = balance
                    time.sleep(0.001)
                    temp -= amount
                    balance = temp
                finally:
                    lock.release()

            threads = []
            for i in range(10):
                t = threading.Thread(target=withdraw_safe, args=(100,))
                threads.append(t)
                t.start()

            for t in threads:
                t.join()

            print(f"Final balance: {balance}")  # Output: 0 (correct result)
            ---

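04.Performance considerations
    a.Cost of locking
        Acquiring and releasing a lock is not free. Even an uncontended acquire/release adds work to every pass through the critical section, and under contention threads block, which can involve system calls and context switches. Keep the critical section as small as possible and protect only the code that needs protection.
    b.Lock granularity
        a.Coarse-grained locking
            A single lock protects several resources. This is simple to implement but limits concurrency; it suits simple scenarios.
        b.Fine-grained locking
            Each resource gets its own lock. This raises concurrency at the cost of more complexity; it suits high-concurrency scenarios.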
        c.Code example
            ---
            import threading

            # Fine-grained locking example
            class BankAccount:
                def __init__(self, balance):
                    self.balance = balance
                    self.lock = threading.Lock()  # Each account has its own lock

                def withdraw(self, amount):
                    with self.lock:
                        if self.balance >= amount:
                            self.balance -= amount
                            return True
                        return False

                def deposit(self, amount):
                    with self.lock:
                        self.balance += amount

            # Create several accounts; each one can be operated on concurrently
            account1 = BankAccount(1000)
            account2 = BankAccount(2000)

            # Operations on different accounts can run concurrently
            t1 = threading.Thread(target=account1.withdraw, args=(100,))
            t2 = threading.Thread(target=account2.withdraw, args=(200,))
            t1.start()
            t2.start()
            t1.join()
            t2.join()

            print(f"Account 1 balance: {account1.balance}")  # Output: 900
            print(f"Account 2 balance: {account2.balance}")  # Output: 1800
            ---
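            For contrast with the fine-grained example, a coarse-grained design might look like the sketch below. The Bank class and its method names are hypothetical and are used only for illustration: one lock guards every account, so a transfer between two accounts needs a single acquire and has no lock-ordering concern, but unrelated transfers cannot overlap.

```python
import threading

class Bank:
    """Coarse-grained locking: one lock guards every account."""
    def __init__(self, balances):
        self._balances = dict(balances)
        self._lock = threading.Lock()  # single lock shared by all accounts

    def transfer(self, src, dst, amount):
        # Both accounts are updated under the same lock, so the transfer
        # is atomic with respect to every other Bank operation.
        with self._lock:
            if self._balances[src] < amount:
                return False
            self._balances[src] -= amount
            self._balances[dst] += amount
            return True

    def total(self):
        with self._lock:
            return sum(self._balances.values())

bank = Bank({"a": 1000, "b": 2000})
threads = [threading.Thread(target=bank.transfer, args=("a", "b", 10)) for _ in range(20)]
threads += [threading.Thread(target=bank.transfer, args=("b", "a", 5)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(bank.total())  # 3000
```

            The total stays constant because money only moves between accounts while the single lock is held.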

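1.2 Acquiring and Releasing Locks

01.The acquire method
    a.Basic usage
        acquire() takes the lock. If the lock is free, the call succeeds immediately and returns True; if another thread holds it, the call blocks until the lock is released.
    b.Parameters
        a.The blocking parameter
            With blocking=True (the default) the call waits for the lock. With blocking=False the call tries once and returns False immediately if the lock is not available.
        b.The timeout parameter
            Sets the maximum wait in seconds; acquire() returns False if the timeout expires. The default timeout=-1 means wait indefinitely. A timeout may only be given together with blocking=True; combining it with blocking=False raises ValueError.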
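        c.Code example
            ---
            import threading
            import time

            lock = threading.Lock()

            def worker1():
                print("Thread 1 trying to acquire the lock...")
                if lock.acquire(blocking=True):
                    try:
                        print("Thread 1 acquired the lock")
                        time.sleep(2)
                    finally:
                        lock.release()
                    print("Thread 1 released the lock")

            def worker2():
                time.sleep(0.5)
                print("Thread 2 trying to acquire the lock (non-blocking)...")
                if lock.acquire(blocking=False):
                    print("Thread 2 acquired the lock")
                    lock.release()
                else:
                    print("Thread 2 failed to acquire the lock")

            def worker3():
                time.sleep(0.5)
                print("Thread 3 trying to acquire the lock (1 second timeout)...")
                if lock.acquire(timeout=1):
                    print("Thread 3 acquired the lock")
                    lock.release()
                else:
                    print("Thread 3 timed out waiting for the lock")

            t1 = threading.Thread(target=worker1)
            t2 = threading.Thread(target=worker2)
            t3 = threading.Thread(target=worker3)

            t1.start()
            t2.start()
            t3.start()

            t1.join()
            t2.join()
            t3.join()
            # Typical output (the thread 2 and thread 3 lines may interleave differently):
            # Thread 1 trying to acquire the lock...
            # Thread 1 acquired the lock
            # Thread 2 trying to acquire the lock (non-blocking)...
            # Thread 2 failed to acquire the lock
            # Thread 3 trying to acquire the lock (1 second timeout)...
            # Thread 3 timed out waiting for the lock
            # Thread 1 released the lock
            ---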
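            The two parameters cannot be combined freely: a timeout only makes sense for a blocking call. The short sketch below shows what happens when both blocking=False and a timeout are passed; the variable name error is just a local used for the illustration.

```python
import threading

lock = threading.Lock()

try:
    # A timeout has no meaning for a non-blocking call, so this is rejected
    lock.acquire(blocking=False, timeout=1)
except ValueError as exc:
    error = str(exc)
else:
    error = None
    lock.release()

print(f"ValueError raised: {error is not None}")
print(f"Lock still free: {not lock.locked()}")
```

            The call is rejected before any attempt to take the lock, so the lock remains unlocked afterwards.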

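02.The release method
    a.Basic usage
        release() unlocks the lock; one of the threads waiting on it is then woken up and acquires it. A plain Lock has no owner, so release() may be called from any thread, not only from the thread that acquired the lock (RLock, by contrast, must be released by its owning thread). Calling release() on a lock that is not locked raises RuntimeError.
    b.Error handling
        a.Releasing an unlocked lock
            Calling release() on an unlocked lock raises RuntimeError, so every release() must be paired with an earlier acquire().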
        b.Code example
            ---
            import threading

            lock = threading.Lock()

            # Correct usage
            lock.acquire()
            print("Lock acquired")
            lock.release()
            print("Lock released")

            # Incorrect usage: releasing twice
            try:
                lock.release()  # The lock is already unlocked
            except RuntimeError as e:
                print(f"Error: {e}")  # Output: Error: release unlocked lock
            ---
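            A plain Lock does not record which thread acquired it. The sketch below illustrates this: the main thread acquires the lock and a worker thread releases it. The function name release_from_worker is hypothetical.

```python
import threading

lock = threading.Lock()
lock.acquire()  # acquired by the main thread

def release_from_worker():
    # A plain Lock has no owner, so a different thread may release it
    lock.release()

t = threading.Thread(target=release_from_worker)
t.start()
t.join()

print(lock.locked())  # False
```

            This hand-off is legal, but it is easy to misuse, so most code keeps acquire() and release() in the same thread. The same sequence with an RLock would raise RuntimeError in the worker, because an RLock can only be released by the thread that owns it.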
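    c.Exception-safe release
        a.Description
            Wrapping the critical section in try-finally guarantees that the lock is released even when the critical section raises, which prevents other threads from blocking forever on a lock that is never released.
        b.Code example
            ---
            import threading

            lock = threading.Lock()
            shared_list = []

            def safe_append(value):
                lock.acquire()
                try:
                    # Code that may raise
                    if value < 0:
                        raise ValueError("value must not be negative")
                    shared_list.append(value)
                    print(f"Added: {value}")
                except ValueError as e:
                    print(f"Error: {e}")
                finally:
                    # Make sure the lock is released
                    lock.release()
                    print("Lock released")

            t1 = threading.Thread(target=safe_append, args=(10,))
            t2 = threading.Thread(target=safe_append, args=(-5,))

            t1.start()
            t2.start()
            t1.join()
            t2.join()

            print(f"Final list: {shared_list}")
            # Typical output:
            # Added: 10
            # Lock released
            # Error: value must not be negative
            # Lock released
            # Final list: [10]
            ---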

03.Acquire/release patterns
    a.Standard pattern
        Calling acquire() and release() explicitly means managing the lock's lifetime by hand, which is error-prone.
    b.Timeout pattern
        a.Description
            A timeout avoids waiting forever. It suits code that needs to fail fast or retry, and keeps the system responsive.
        b.Code example
            ---
            import threading
            import time

            lock = threading.Lock()
            data = []

            def try_update(value, retry=3):
                for attempt in range(retry):
                    print(f"Attempt {attempt + 1}/{retry}...")
                    if lock.acquire(timeout=0.5):
                        try:
                            data.append(value)
                            print(f"Added: {value}")
                            return True
                        finally:
                            lock.release()
                    else:
                        print("Timed out waiting for the lock, retrying...")
                        time.sleep(0.1)
                print(f"Failed: could not add {value}")
                return False

            def long_operation():
                lock.acquire()
                try:
                    print("Starting long operation...")
                    time.sleep(2)
                finally:
                    lock.release()
                print("Long operation finished")

            t1 = threading.Thread(target=long_operation)
            t2 = threading.Thread(target=try_update, args=(100,))

            t1.start()
            time.sleep(0.1)
            t2.start()

            t1.join()
            t2.join()

            print(f"Final data: {data}")
            ---
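    c.Non-blocking pattern
        a.Description
            With blocking=False the call returns immediately instead of waiting for the lock. It suits optional work or code that must respond quickly.
        b.Code example
            ---
            import threading
            import time

            lock = threading.Lock()
            stats_lock = threading.Lock()  # Protects the "failed" counter
            stats = {"success": 0, "failed": 0}

            def try_increment():
                if lock.acquire(blocking=False):
                    try:
                        stats["success"] += 1
                        time.sleep(0.01)
                    finally:
                        lock.release()
                else:
                    with stats_lock:
                        stats["failed"] += 1
                    print("Lock busy, skipping this operation")

            threads = []
            for i in range(20):
                t = threading.Thread(target=try_increment)
                threads.append(t)
                t.start()

            for t in threads:
                t.join()

            print(f"Success: {stats['success']}, failed: {stats['failed']}")
            # The split varies from run to run; the two counts always add up to 20
            ---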

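04.Best practices
    a.Pair every acquire with a release
        Every acquire() must have a matching release(). Use try-finally or a context manager to make sure the lock is released.
    b.Minimal critical section
        a.Principle
            Hold the lock only while it is needed and release it as soon as possible to improve concurrency. Avoid slow operations such as I/O or network requests while holding a lock.
        b.Code example
            ---
            import threading
            import time

            lock = threading.Lock()
            cache = {}

            def bad_practice(key):
                lock.acquire()
                try:
                    # Wrong: slow I/O is performed while holding the lock
                    time.sleep(1)  # Simulate I/O
                    cache[key] = f"value_{key}"
                finally:
                    lock.release()

            def good_practice(key):
                # Right: finish the slow work first
                time.sleep(1)  # Simulate I/O
                value = f"value_{key}"

                # Hold the lock only while updating shared data
                lock.acquire()
                try:
                    cache[key] = value
                finally:
                    lock.release()

            # Time the optimized version (bad_practice would serialize the five sleeps)
            start = time.time()
            threads = [threading.Thread(target=good_practice, args=(i,)) for i in range(5)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            print(f"Optimized elapsed time: {time.time() - start:.2f}s")  # About 1 second (runs concurrently)
            ---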

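1.3 Context Manager Usage

01.The with statement
    a.Basic syntax
        Lock objects support the context manager protocol, so a with statement can acquire and release the lock automatically. There is no need to call acquire() and release() by hand, and the code is both shorter and exception-safe.
    b.How it works
        Entering the with block calls __enter__(), which acquires the lock; leaving the block calls __exit__(), which releases it. The lock is released even if an exception is raised inside the block.
    c.Code example
        ---
        import threading

        lock = threading.Lock()
        counter = 0

        def increment():
            global counter
            # The with statement manages the lock automatically
            with lock:
                temp = counter
                temp += 1
                counter = temp

        threads = []
        for i in range(10):
            t = threading.Thread(target=increment)
            threads.append(t)
            t.start()

        for t in threads:
            t.join()

        print(f"Final count: {counter}")  # Output: Final count: 10
        ---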

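02.Exception safety
    a.Automatic release
        The with statement guarantees that the lock is released whether or not the critical section raises, which avoids deadlocks and leaked locks.
    b.Comparison
        a.Risks of manual management
            With explicit acquire() and release(), it is easy to leave the lock held, either by forgetting the finally clause or because the control flow is complicated.
        b.Code example
            ---
            import threading

            lock = threading.Lock()
            data = []

            # Not recommended: manual management is error-prone
            def manual_way(value):
                lock.acquire()
                try:
                    if value < 0:
                        raise ValueError("negative value")
                    data.append(value)
                finally:
                    lock.release()

            # Recommended: the with statement manages the lock
            def context_way(value):
                with lock:
                    if value < 0:
                        raise ValueError("negative value")
                    data.append(value)

            # Exercise the exception path
            try:
                context_way(10)
                print(f"Added successfully: {data}")
                context_way(-5)
            except ValueError as e:
                print(f"Caught exception: {e}")

            # The lock was released correctly and can still be used
            context_way(20)
            print(f"Final data: {data}")  # Output: Final data: [10, 20]
            ---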

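03.Nested use
    a.Managing several locks
        Several with statements can be nested to manage several locks, but the acquisition order must be consistent across threads to avoid deadlock.
    b.Nesting syntax
        a.Single-statement form
            Python allows several context managers in one with statement, separated by commas. The locks are acquired from left to right.
        b.Code example
            ---
            import threading

            lock1 = threading.Lock()
            lock2 = threading.Lock()
            account1_balance = 1000
            account2_balance = 2000

            def transfer(from_lock, to_lock, amount):
                global account1_balance, account2_balance
                # Always take the two locks in the same global order (by id),
                # whatever the transfer direction, so that two opposite
                # transfers cannot deadlock
                first, second = sorted((from_lock, to_lock), key=id)
                # Single-statement form: acquires first, then second
                with first, second:
                    if from_lock is lock1:
                        account1_balance -= amount
                        account2_balance += amount
                        print(f"Transferred {amount} from account 1 to account 2")
                    else:
                        account2_balance -= amount
                        account1_balance += amount
                        print(f"Transferred {amount} from account 2 to account 1")

            t1 = threading.Thread(target=transfer, args=(lock1, lock2, 100))
            t2 = threading.Thread(target=transfer, args=(lock2, lock1, 50))

            t1.start()
            t2.start()
            t1.join()
            t2.join()

            print(f"Account 1 balance: {account1_balance}")  # Output: 950
            print(f"Account 2 balance: {account2_balance}")  # Output: 2050
            ---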
    c.Multi-statement nesting
        a.Description
            Each lock gets its own nested with statement. The structure is more explicit and suits more complex locking scenarios.
        b.Code example
            ---
            import threading
            import time

            lock_a = threading.Lock()
            lock_b = threading.Lock()
            resource_a = 0
            resource_b = 0

            def update_both():
                global resource_a, resource_b
                # Multi-statement nesting; every thread takes A before B
                with lock_a:
                    print("Acquired lock A")
                    resource_a += 1
                    time.sleep(0.01)
                    with lock_b:
                        print("Acquired lock B")
                        resource_b += 1
                        print(f"Resource A: {resource_a}, resource B: {resource_b}")

            threads = []
            for i in range(5):
                t = threading.Thread(target=update_both)
                threads.append(t)
                t.start()

            for t in threads:
                t.join()

            print(f"Final - resource A: {resource_a}, resource B: {resource_b}")
            ---

04.Practical applications
    a.Designing thread-safe classes
        Use with statements inside methods to protect instance state, so that the data stays consistent under multithreaded access.
    b.Thread-safe counter
        ---
        import threading

        class ThreadSafeCounter:
            def __init__(self):
                self._value = 0
                self._lock = threading.Lock()

            def increment(self):
                with self._lock:
                    self._value += 1

            def decrement(self):
                with self._lock:
                    self._value -= 1

            def get_value(self):
                with self._lock:
                    return self._value

        counter = ThreadSafeCounter()

        def worker(op, count):
            for _ in range(count):
                if op == "inc":
                    counter.increment()
                else:
                    counter.decrement()

        threads = []
        for i in range(5):
            t1 = threading.Thread(target=worker, args=("inc", 100))
            t2 = threading.Thread(target=worker, args=("dec", 100))
            threads.extend([t1, t2])
            t1.start()
            t2.start()

        for t in threads:
            t.join()

        print(f"Final value: {counter.get_value()}")  # Output: Final value: 0
        ---
    c.Thread-safe cache
        a.Description
            A thread-safe LRU cache: with statements protect every read and write of the cache, so several threads can use it concurrently.
        b.Code example
            ---
            import threading
            from collections import OrderedDict

            class ThreadSafeCache:
                def __init__(self, capacity=100):
                    self._cache = OrderedDict()
                    self._capacity = capacity
                    self._lock = threading.Lock()
                    self._hits = 0
                    self._misses = 0

                def get(self, key):
                    with self._lock:
                        if key in self._cache:
                            self._hits += 1
                            # Move to the end to mark as most recently used
                            self._cache.move_to_end(key)
                            return self._cache[key]
                        else:
                            self._misses += 1
                            return None

                def put(self, key, value):
                    with self._lock:
                        if key in self._cache:
                            self._cache.move_to_end(key)
                        self._cache[key] = value
                        # Evict the least recently used item when over capacity
                        if len(self._cache) > self._capacity:
                            self._cache.popitem(last=False)

                def stats(self):
                    with self._lock:
                        total = self._hits + self._misses
                        hit_rate = self._hits / total if total > 0 else 0
                        return {
                            "hits": self._hits,
                            "misses": self._misses,
                            "hit_rate": f"{hit_rate:.2%}"
                        }

            cache = ThreadSafeCache(capacity=10)

            def worker(thread_id):
                for i in range(20):
                    key = f"key_{i % 15}"
                    value = cache.get(key)
                    if value is None:
                        cache.put(key, f"value_{i}")

            threads = []
            for i in range(5):
                t = threading.Thread(target=worker, args=(i,))
                threads.append(t)
                t.start()

            for t in threads:
                t.join()

            print(f"Cache stats: {cache.stats()}")
            # Example output (the numbers vary from run to run; hits + misses is always 100):
            # Cache stats: {'hits': 45, 'misses': 55, 'hit_rate': '45.00%'}
            ---

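1.4 Deadlock

01.The concept of deadlock
    a.Definition
        A deadlock is a state in which two or more threads each wait for a lock held by another, so none of them can make progress. It is one of the most serious problems in multithreaded programming because the affected threads hang permanently.
    b.Necessary conditions
        a.Mutual exclusion
            A resource cannot be used by several threads at once; it must be held exclusively.
        b.Hold and wait
            A thread holds at least one lock while waiting for a lock held by another thread.
        c.No preemption
            A held lock cannot be taken away by force; only the holding thread can release it.
        d.Circular wait
            The waiting threads form a cycle, for example thread A waits for a lock held by thread B while thread B waits for a lock held by thread A.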

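02.Deadlock example
    a.The classic scenario
        Two threads acquire two locks in opposite orders and end up waiting for each other.
    b.Demonstration
        ---
        import threading
        import time

        lock1 = threading.Lock()
        lock2 = threading.Lock()

        def thread1_work():
            print("Thread 1: trying to acquire lock1")
            with lock1:
                print("Thread 1: acquired lock1")
                time.sleep(0.1)  # Simulate work
                print("Thread 1: trying to acquire lock2")
                with lock2:
                    print("Thread 1: acquired lock2")

        def thread2_work():
            print("Thread 2: trying to acquire lock2")
            with lock2:
                print("Thread 2: acquired lock2")
                time.sleep(0.1)  # Simulate work
                print("Thread 2: trying to acquire lock1")
                with lock1:
                    print("Thread 2: acquired lock1")

        # daemon=True lets the interpreter exit even though both threads stay blocked
        t1 = threading.Thread(target=thread1_work, daemon=True)
        t2 = threading.Thread(target=thread2_work, daemon=True)

        t1.start()
        t2.start()

        # Both threads block forever; join() with a timeout lets the main thread continue
        t1.join(timeout=2)
        t2.join(timeout=2)

        if t1.is_alive() or t2.is_alive():
            print("Deadlock detected! The threads did not finish")
        # Typical output:
        # Thread 1: trying to acquire lock1
        # Thread 1: acquired lock1
        # Thread 2: trying to acquire lock2
        # Thread 2: acquired lock2
        # Thread 1: trying to acquire lock2
        # Thread 2: trying to acquire lock1
        # Deadlock detected! The threads did not finish
        ---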

03.Deadlock prevention
    a.Lock ordering
        a.Principle
            Every thread acquires the locks in the same order, which breaks the circular-wait condition. This is the most common prevention technique.
        b.Code example
            ---
            import threading
            import time

            lock1 = threading.Lock()
            lock2 = threading.Lock()

            def safe_transfer(from_lock, to_lock, amount):
                # Make the acquisition order the same for every caller
                locks = sorted([from_lock, to_lock], key=id)
                with locks[0]:
                    print(f"Acquired first lock: {id(locks[0])}")
                    time.sleep(0.01)
                    with locks[1]:
                        print(f"Acquired second lock: {id(locks[1])}")
                        print(f"Transfer of {amount} completed")

            def thread1_work():
                safe_transfer(lock1, lock2, 100)

            def thread2_work():
                safe_transfer(lock2, lock1, 50)

            t1 = threading.Thread(target=thread1_work)
            t2 = threading.Thread(target=thread2_work)

            t1.start()
            t2.start()
            t1.join()
            t2.join()

            print("All transfers completed without deadlock")
            ---
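    b.Timeouts
        a.Description
            Pass a timeout to acquire(). When it expires, the thread gives up, releases the locks it already holds, and retries later, so it never waits forever. If both threads retry with identical delays they can keep colliding (a livelock), so the backoff between retries should be randomized.
        b.Code example
            ---
            import random
            import threading
            import time

            lock1 = threading.Lock()
            lock2 = threading.Lock()

            def safe_operation(name, first_lock, second_lock):
                retry_count = 0
                max_retries = 3

                while retry_count < max_retries:
                    if first_lock.acquire(timeout=0.5):
                        try:
                            print(f"{name}: acquired the first lock")
                            time.sleep(0.1)
                            if second_lock.acquire(timeout=0.5):
                                try:
                                    print(f"{name}: acquired the second lock, doing the work")
                                    return True
                                finally:
                                    second_lock.release()
                            else:
                                print(f"{name}: timed out on the second lock, releasing the first")
                        finally:
                            first_lock.release()
                    else:
                        print(f"{name}: timed out on the first lock")

                    retry_count += 1
                    # Randomized backoff so the two threads do not retry in lockstep
                    time.sleep(random.uniform(0.05, 0.2))

                print(f"{name}: reached the maximum number of retries")
                return False

            t1 = threading.Thread(target=safe_operation, args=("Thread 1", lock1, lock2))
            t2 = threading.Thread(target=safe_operation, args=("Thread 2", lock2, lock1))

            t1.start()
            t2.start()
            t1.join()
            t2.join()

            print("Done")
            ---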
    c.Try-lock
        a.Description
            Try to take each lock with the non-blocking acquire(blocking=False). On failure, release the locks already held immediately and retry, which avoids deadlock.
        b.Code example
            ---
            import threading
            import time
            import random

            lock1 = threading.Lock()
            lock2 = threading.Lock()

            def try_lock_operation(name, first_lock, second_lock):
                attempts = 0
                while attempts < 10:
                    if first_lock.acquire(blocking=False):
                        try:
                            print(f"{name}: acquired the first lock")
                            time.sleep(0.01)
                            if second_lock.acquire(blocking=False):
                                try:
                                    print(f"{name}: acquired both locks, doing the work")
                                    time.sleep(0.05)
                                    return True
                                finally:
                                    second_lock.release()
                            else:
                                print(f"{name}: could not get the second lock, releasing the first")
                        finally:
                            first_lock.release()

                    attempts += 1
                    # Random backoff to reduce contention
                    time.sleep(random.uniform(0.001, 0.01))

                print(f"{name}: failed after {attempts} attempts")
                return False

            t1 = threading.Thread(target=try_lock_operation, args=("Thread 1", lock1, lock2))
            t2 = threading.Thread(target=try_lock_operation, args=("Thread 2", lock2, lock1))

            t1.start()
            t2.start()
            t1.join()
            t2.join()

            print("All operations finished")
            ---

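04.Deadlock detection and recovery
    a.Detection tools
        Deadlocks can be diagnosed with thread dumps, logging, and monitoring tools. Python's threading module offers basic thread-state queries such as threading.enumerate() and Thread.is_alive().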
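        One way to obtain a thread dump from inside a running program is sketched below. It assumes CPython, because it relies on sys._current_frames(), an implementation-specific function. The helper name dump_thread_stacks and the thread name "blocked-worker" are hypothetical and exist only for this illustration.

```python
import sys
import threading
import time
import traceback

def dump_thread_stacks():
    """Return {thread name: formatted stack} for every live thread."""
    frames = sys._current_frames()  # maps thread ident -> current frame (CPython)
    stacks = {}
    for thread in threading.enumerate():
        frame = frames.get(thread.ident)
        if frame is not None:
            stacks[thread.name] = "".join(traceback.format_stack(frame))
    return stacks

lock = threading.Lock()
lock.acquire()  # held here, so the worker below blocks

def blocked_worker():
    with lock:
        pass

t = threading.Thread(target=blocked_worker, name="blocked-worker", daemon=True)
t.start()
time.sleep(0.2)  # give the worker time to reach the blocking acquire

stacks = dump_thread_stacks()
print(stacks["blocked-worker"])  # the stack ends inside blocked_worker

lock.release()
t.join()
```

        In a deadlocked program the dump shows each stuck thread parked on the line where it is waiting for a lock, which usually reveals the conflicting acquisition order.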
    b.A simple detector
        ---
        import threading
        import time
        from datetime import datetime

        class DeadlockDetector:
            def __init__(self):
                self._lock_holders = {}
                self._lock_waiters = {}
                self._monitor_lock = threading.Lock()

            def acquire_with_tracking(self, lock, thread_name, timeout=5):
                with self._monitor_lock:
                    self._lock_waiters[thread_name] = (lock, datetime.now())

                acquired = lock.acquire(timeout=timeout)

                with self._monitor_lock:
                    if acquired:
                        self._lock_holders[lock] = thread_name
                    else:
                        print(f"Warning: {thread_name} timed out waiting for a lock")
                        self.print_status()
                    # The thread is no longer waiting in either case
                    self._lock_waiters.pop(thread_name, None)

                return acquired

            def release_with_tracking(self, lock, thread_name):
                # Clear the holder record before releasing, so that the entry
                # of the next holder is never deleted by mistake
                with self._monitor_lock:
                    self._lock_holders.pop(lock, None)
                lock.release()

            def print_status(self):
                print("\n=== Lock status ===")
                print("Threads holding locks:")
                for lock, holder in self._lock_holders.items():
                    print(f"  lock {id(lock)}: {holder}")
                print("Threads waiting for locks:")
                for thread, (lock, wait_time) in self._lock_waiters.items():
                    print(f"  {thread} waiting for lock {id(lock)} (since {wait_time})")
                print("=" * 20)

        detector = DeadlockDetector()
        lock1 = threading.Lock()
        lock2 = threading.Lock()

        def monitored_work(name, first_lock, second_lock):
            if detector.acquire_with_tracking(first_lock, name, timeout=2):
                try:
                    time.sleep(0.1)
                    if detector.acquire_with_tracking(second_lock, name, timeout=2):
                        try:
                            print(f"{name}: operation completed")
                        finally:
                            detector.release_with_tracking(second_lock, name)
                finally:
                    detector.release_with_tracking(first_lock, name)

        t1 = threading.Thread(target=monitored_work, args=("Thread 1", lock1, lock2))
        t2 = threading.Thread(target=monitored_work, args=("Thread 2", lock2, lock1))

        t1.start()
        t2.start()
        t1.join()
        t2.join()
        ---
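    c.Best practices
        a.Design principles
            Avoid nested locks where possible, keep lock hold times short, and consider higher-level synchronization primitives such as RLock and Condition, so that deadlocks are prevented at the design level.
        b.Coding conventions
            Define a lock acquisition order, use context managers to guarantee release, and add timeouts and logging to make problems easier to diagnose.
        c.Practical example
            ---
            import threading
            from contextlib import contextmanager

            class LockManager:
                """Lock manager that provides a single interface for acquiring locks"""
                def __init__(self):
                    self._locks = {}
                    self._registry_lock = threading.Lock()  # Protects self._locks

                def get_lock(self, name):
                    # Create each named lock at most once, even under concurrent calls
                    with self._registry_lock:
                        if name not in self._locks:
                            self._locks[name] = threading.Lock()
                        return self._locks[name]

                @contextmanager
                def acquire_multiple(self, *lock_names, timeout=5):
                    """Acquire several locks in sorted name order to avoid deadlock"""
                    # set() drops duplicate names so no lock is acquired twice
                    sorted_names = sorted(set(lock_names))
                    locks = [self.get_lock(name) for name in sorted_names]
                    acquired = []

                    try:
                        for lock in locks:
                            if not lock.acquire(timeout=timeout):
                                raise TimeoutError(f"Timed out acquiring locks: {lock_names}")
                            acquired.append(lock)
                        yield
                    finally:
                        for lock in reversed(acquired):
                            lock.release()

            # Usage example
            manager = LockManager()

            def safe_transfer(from_account, to_account, amount):
                # Locks are taken in a fixed order automatically, avoiding deadlock
                with manager.acquire_multiple(from_account, to_account):
                    print(f"Transfer: {from_account} -> {to_account}: {amount}")

            t1 = threading.Thread(target=safe_transfer, args=("A", "B", 100))
            t2 = threading.Thread(target=safe_transfer, args=("B", "A", 50))

            t1.start()
            t2.start()
            t1.join()
            t2.join()

            print("Transfers completed without deadlock")
            ---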

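2. Reentrant Locks

2.1 threading.RLock

01.Basic concepts
    a.Definition
        RLock is a reentrant lock: the same thread may acquire it several times without deadlocking itself. Unlike a plain Lock, an RLock tracks a recursion count and the identity of the owning thread. Each acquire() by the owner increments the count and each release() decrements it; the lock is actually released only when the count reaches 0.
    b.Difference from Lock
        Lock is not reentrant: a thread that acquires it a second time without releasing it deadlocks itself. RLock is reentrant: the owning thread may acquire it repeatedly, which suits recursive calls and complex nested locking.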
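        The difference can be observed safely with non-blocking acquires, which return False instead of hanging. The following is a small sketch; the variable names lock_again and rlock_again are chosen only for this illustration.

```python
import threading

lock = threading.Lock()
rlock = threading.RLock()

lock.acquire()
# A second blocking acquire from the same thread would hang forever;
# the non-blocking form shows that it cannot succeed.
lock_again = lock.acquire(blocking=False)
lock.release()

rlock.acquire()
# The owning thread may acquire an RLock again without blocking.
rlock_again = rlock.acquire(blocking=False)
rlock.release()  # undo the second acquire
rlock.release()  # undo the first acquire

print(lock_again, rlock_again)  # False True
```

        Each successful acquire() on the RLock needs its own release(), which is why the RLock is released twice.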

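02.Creating and using an RLock
    a.Creating an RLock object
        a.Basic creation
            Call threading.RLock() to create a reentrant lock. It takes no arguments and returns a new RLock instance.
        b.Code example
            ---
            import threading

            rlock = threading.RLock()

            def recursive_function(n):
                if n <= 0:
                    return
                rlock.acquire()
                try:
                    print(f"Recursion level: {n}, thread: {threading.current_thread().name}")
                    recursive_function(n - 1)
                finally:
                    rlock.release()

            # The same thread acquires the lock several times
            recursive_function(3)
            print("Recursion finished")
            # Output:
            # Recursion level: 3, thread: MainThread
            # Recursion level: 2, thread: MainThread
            # Recursion level: 1, thread: MainThread
            # Recursion finished
            ---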
    b.The reentrancy problem with Lock
        a.Demonstrating the problem
            A plain Lock is not reentrant: when the same thread acquires it again, the thread deadlocks and blocks forever.
        b.Code example
            ---
            import threading

            lock = threading.Lock()

            def bad_recursive(n):
                if n <= 0:
                    return
                print(f"Trying to acquire the lock: {n}")
                lock.acquire()
                try:
                    print(f"Acquired the lock: {n}")
                    bad_recursive(n - 1)  # The deadlock happens here
                finally:
                    lock.release()

            # With Lock this deadlocks
            # bad_recursive(2)  # Uncommenting this line causes a deadlock

            # With RLock it works
            rlock = threading.RLock()

            def good_recursive(n):
                if n <= 0:
                    return
                rlock.acquire()
                try:
                    print(f"RLock level: {n}")
                    good_recursive(n - 1)
                finally:
                    rlock.release()

            good_recursive(3)
            print("RLock recursion finished")
            ---

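03.Re-entry Mechanism
    a.How the counter works
        RLock keeps a recursion count and the ID of the owning thread. Each acquire() checks whether the caller is the owner: if so, the count is incremented; if another thread owns the lock, the caller blocks. Each release() decrements the count, and the lock is freed when it reaches 0.
    b.Counter demonstration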
        ---
        import threading

        class RLockDemo:
            def __init__(self):
                self.rlock = threading.RLock()
                self.counter = 0

            def level1(self):
                with self.rlock:
                    print(f"Level1: 获取锁, 计数器应为1")
                    self.counter += 1
                    self.level2()

            def level2(self):
                with self.rlock:
                    print(f"Level2: 再次获取锁, 计数器应为2")
                    self.counter += 1
                    self.level3()

            def level3(self):
                with self.rlock:
                    print(f"Level3: 第三次获取锁, 计数器应为3")
                    self.counter += 1
                    print(f"最终计数: {self.counter}")

        demo = RLockDemo()
        demo.level1()
        print(f"所有锁已释放, 总计数: {demo.counter}")
        # 输出:
        # Level1: 获取锁, 计数器应为1
        # Level2: 再次获取锁, 计数器应为2
        # Level3: 第三次获取锁, 计数器应为3
        # 最终计数: 3
        # 所有锁已释放, 总计数: 3
        ---
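    c.Thread isolation
        a.Description
            Re-entry applies only to the thread that owns the lock. Every other thread still has to wait until the owner has released it completely, so mutual exclusion is preserved.
        b.Code example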
            ---
            import threading
            import time

            rlock = threading.RLock()

            def thread1_work():
                print("线程1: 第一次获取锁")
                rlock.acquire()
                try:
                    print("线程1: 第一次获取成功")
                    time.sleep(0.1)
                    print("线程1: 第二次获取锁(重入)")
                    rlock.acquire()
                    try:
                        print("线程1: 第二次获取成功")
                        time.sleep(0.5)
                    finally:
                        rlock.release()
                        print("线程1: 释放第二次锁")
                finally:
                    rlock.release()
                    print("线程1: 释放第一次锁")

            def thread2_work():
                time.sleep(0.2)
                print("线程2: 尝试获取锁")
                with rlock:
                    print("线程2: 获取锁成功(等待线程1完全释放)")

            t1 = threading.Thread(target=thread1_work)
            t2 = threading.Thread(target=thread2_work)

            t1.start()
            t2.start()
            t1.join()
            t2.join()
            ---

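04.Practical Use Cases
    a.Protecting recursive algorithms
        When a recursive function guards shared state, RLock lets each level of the recursion acquire the same lock again.
    b.Recursive tree traversal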
        ---
        import threading

        class TreeNode:
            def __init__(self, value):
                self.value = value
                self.left = None
                self.right = None

        class ThreadSafeTree:
            def __init__(self):
                self.root = None
                self.rlock = threading.RLock()
                self.sum = 0

            def insert(self, value):
                with self.rlock:
                    if self.root is None:
                        self.root = TreeNode(value)
                    else:
                        self._insert_recursive(self.root, value)

            def _insert_recursive(self, node, value):
                with self.rlock:  # 递归中重入
                    if value < node.value:
                        if node.left is None:
                            node.left = TreeNode(value)
                        else:
                            self._insert_recursive(node.left, value)
                    else:
                        if node.right is None:
                            node.right = TreeNode(value)
                        else:
                            self._insert_recursive(node.right, value)

            def calculate_sum(self):
                with self.rlock:
                    self.sum = 0
                    if self.root:
                        self._sum_recursive(self.root)
                    return self.sum

            def _sum_recursive(self, node):
                with self.rlock:  # 递归中重入
                    if node:
                        self.sum += node.value
                        self._sum_recursive(node.left)
                        self._sum_recursive(node.right)

        tree = ThreadSafeTree()

        def worker(values):
            for v in values:
                tree.insert(v)

        t1 = threading.Thread(target=worker, args=([5, 3, 7],))
        t2 = threading.Thread(target=worker, args=([2, 4, 6, 8],))

        t1.start()
        t2.start()
        t1.join()
        t2.join()

        print(f"树的节点总和: {tree.calculate_sum()}")  # 输出: 35
        ---
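    c.Nested method calls
        a.Description
            When several methods of a class call one another and each takes the lock, RLock prevents a thread from deadlocking on itself.
        b.Code example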
            ---
            import threading

            class BankAccount:
                def __init__(self, balance):
                    self.balance = balance
                    self.rlock = threading.RLock()

                def deposit(self, amount):
                    with self.rlock:
                        self.balance += amount
                        print(f"存款{amount}, 余额: {self.balance}")
                        self._log_transaction("deposit", amount)

                def withdraw(self, amount):
                    with self.rlock:
                        if self.balance >= amount:
                            self.balance -= amount
                            print(f"取款{amount}, 余额: {self.balance}")
                            self._log_transaction("withdraw", amount)
                            return True
                        return False

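                def _log_transaction(self, kind, amount):
                    with self.rlock:  # re-entrant: deposit/withdraw already hold the lock
                        print(f"日志: {kind} {amount}, 当前余额: {self.balance}")

                def transfer_to(self, other, amount):
                    # Always lock the two accounts in the same order (by id), so that
                    # opposite transfers A->B and B->A cannot deadlock each other
                    first, second = sorted((self, other), key=id)
                    with first.rlock, second.rlock:
                        if self.withdraw(amount):  # re-entrant on self.rlock
                            other.deposit(amount)  # re-entrant on other.rlock
                            return True
                        return False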

            account1 = BankAccount(1000)
            account2 = BankAccount(500)

            def do_transfer():
                account1.transfer_to(account2, 200)

            t = threading.Thread(target=do_transfer)
            t.start()
            t.join()

            print(f"账户1余额: {account1.balance}")  # 输出: 800
            print(f"账户2余额: {account2.balance}")  # 输出: 700
            ---

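2.2 Use Cases for Reentrant Locks

01.Protecting Recursive Functions
    a.Recursive data structures
        Code that walks recursive data structures such as trees and graphs often has to protect shared state across the recursive calls. RLock lets a recursive method acquire the same lock safely at every level.
    b.Recursive directory traversal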
        ---
        import threading
        import os

        class ThreadSafeFileScanner:
            def __init__(self):
                self.rlock = threading.RLock()
                self.file_count = 0
                self.dir_count = 0
                self.total_size = 0

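            def scan_directory(self, path):
                with self.rlock:
                    try:
                        # The context manager closes the scandir iterator promptly
                        with os.scandir(path) as entries:
                            for entry in entries:
                                if entry.is_file():
                                    self.file_count += 1
                                    self.total_size += entry.stat().st_size
                                # Do not descend into symlinked directories,
                                # so a link cycle cannot recurse forever
                                elif entry.is_dir(follow_symlinks=False):
                                    self.dir_count += 1
                                    self.scan_directory(entry.path)  # recursive call
                    except PermissionError:
                        pass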

            def get_stats(self):
                with self.rlock:
                    return {
                        "files": self.file_count,
                        "dirs": self.dir_count,
                        "size": self.total_size
                    }

        scanner = ThreadSafeFileScanner()

        def worker(path):
            scanner.scan_directory(path)

        # 多线程扫描不同目录
        threads = []
        paths = ["/tmp", "/var/log"]  # 示例路径
        for path in paths:
            if os.path.exists(path):
                t = threading.Thread(target=worker, args=(path,))
                threads.append(t)
                t.start()

        for t in threads:
            t.join()

        print(f"扫描统计: {scanner.get_stats()}")
        ---

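02.Nested Method Calls
    a.Methods calling one another
        When several public methods of a class call one another and each must be thread-safe, RLock prevents a thread from deadlocking on itself.
    b.Shopping cart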
        ---
        import threading

        class ShoppingCart:
            def __init__(self):
                self.rlock = threading.RLock()
                self.items = {}
                self.total = 0

            def add_item(self, item_id, price, quantity=1):
                with self.rlock:
                    if item_id in self.items:
                        self.items[item_id]["quantity"] += quantity
                    else:
                        self.items[item_id] = {"price": price, "quantity": quantity}
                    self._recalculate_total()  # 调用其他方法
                    print(f"添加商品{item_id}, 数量{quantity}")

            def remove_item(self, item_id, quantity=1):
                with self.rlock:
                    if item_id in self.items:
                        self.items[item_id]["quantity"] -= quantity
                        if self.items[item_id]["quantity"] <= 0:
                            del self.items[item_id]
                        self._recalculate_total()  # 调用其他方法
                        return True
                    return False

            def _recalculate_total(self):
                with self.rlock:  # 重入:已在add_item/remove_item中持有锁
                    self.total = sum(
                        item["price"] * item["quantity"]
                        for item in self.items.values()
                    )

            def apply_discount(self, discount_rate):
                with self.rlock:
                    for item_id in list(self.items.keys()):
                        item = self.items[item_id]
                        item["price"] *= (1 - discount_rate)
                    self._recalculate_total()  # 重入调用

            def get_total(self):
                with self.rlock:
                    return self.total

        cart = ShoppingCart()

        def worker1():
            cart.add_item("A001", 100, 2)
            cart.add_item("A002", 50, 3)

        def worker2():
            cart.add_item("A003", 200, 1)
            cart.apply_discount(0.1)

        t1 = threading.Thread(target=worker1)
        t2 = threading.Thread(target=worker2)

        t1.start()
        t2.start()
        t1.join()
        t2.join()

        print(f"购物车总价: {cart.get_total():.2f}")
        ---

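03.Callbacks
    a.Event handling
        In an event-handling system a callback may trigger further callbacks. RLock keeps the whole callback chain thread-safe even when it re-enters locked methods.
    b.Observer pattern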
        ---
        import threading

        class Observable:
            def __init__(self):
                self.rlock = threading.RLock()
                self.observers = []
                self.state = None

            def attach(self, observer):
                with self.rlock:
                    if observer not in self.observers:
                        self.observers.append(observer)

            def detach(self, observer):
                with self.rlock:
                    if observer in self.observers:
                        self.observers.remove(observer)

            def notify(self):
                with self.rlock:
                    for observer in self.observers[:]:
                        observer.update(self)  # 可能触发其他操作

            def set_state(self, state):
                with self.rlock:
                    self.state = state
                    self.notify()  # 重入:调用notify

        class Observer:
            def __init__(self, name, observable):
                self.name = name
                self.observable = observable
                self.rlock = threading.RLock()

            def update(self, observable):
                with self.rlock:
                    print(f"{self.name} 收到更新: {observable.state}")
                    if observable.state == "critical":
                        self.handle_critical()

            def handle_critical(self):
                with self.rlock:  # 重入
                    print(f"{self.name} 处理紧急状态")
                    # 可能触发其他操作

        subject = Observable()
        obs1 = Observer("观察者1", subject)
        obs2 = Observer("观察者2", subject)

        subject.attach(obs1)
        subject.attach(obs2)

        def worker():
            subject.set_state("normal")
            subject.set_state("critical")

        t = threading.Thread(target=worker)
        t.start()
        t.join()
        ---

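04.State Machines
    a.Complex state transitions
        The transition methods of a state machine may call one another. RLock keeps each transition atomic and thread-safe.
    b.Order state machine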
        ---
        import threading
        from enum import Enum

        class OrderState(Enum):
            PENDING = "pending"
            PAID = "paid"
            SHIPPED = "shipped"
            DELIVERED = "delivered"
            CANCELLED = "cancelled"

        class OrderStateMachine:
            def __init__(self, order_id):
                self.order_id = order_id
                self.state = OrderState.PENDING
                self.rlock = threading.RLock()
                self.history = []

            def pay(self):
                with self.rlock:
                    if self.state == OrderState.PENDING:
                        self._transition_to(OrderState.PAID)
                        self._log_event("payment_received")
                        self._check_auto_ship()  # 可能触发发货
                        return True
                    return False

            def ship(self):
                with self.rlock:
                    if self.state == OrderState.PAID:
                        self._transition_to(OrderState.SHIPPED)
                        self._log_event("order_shipped")
                        return True
                    return False

            def deliver(self):
                with self.rlock:
                    if self.state == OrderState.SHIPPED:
                        self._transition_to(OrderState.DELIVERED)
                        self._log_event("order_delivered")
                        self._send_notification()  # 重入调用
                        return True
                    return False

            def cancel(self):
                with self.rlock:
                    if self.state in [OrderState.PENDING, OrderState.PAID]:
                        self._transition_to(OrderState.CANCELLED)
                        self._log_event("order_cancelled")
                        self._process_refund()  # 重入调用
                        return True
                    return False

            def _transition_to(self, new_state):
                with self.rlock:  # 重入
                    old_state = self.state
                    self.state = new_state
                    print(f"订单{self.order_id}: {old_state.value} -> {new_state.value}")

            def _log_event(self, event):
                with self.rlock:  # 重入
                    self.history.append(event)

            def _check_auto_ship(self):
                with self.rlock:  # 重入
                    print(f"订单{self.order_id}: 检查是否自动发货")

            def _send_notification(self):
                with self.rlock:  # 重入
                    print(f"订单{self.order_id}: 发送通知")

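            def _process_refund(self):
                with self.rlock:  # re-entrant
                    # Refund only if the order was actually paid; by the time this runs
                    # the state is already CANCELLED, so consult the event history
                    if "payment_received" in self.history:
                        print(f"订单{self.order_id}: 处理退款")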

        order = OrderStateMachine("ORD001")

        def process_order():
            order.pay()
            order.ship()
            order.deliver()

        t = threading.Thread(target=process_order)
        t.start()
        t.join()

        print(f"订单历史: {order.history}")
        ---

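2.3 The Lock Counter

01.How the Counter Works
    a.Internal implementation
        RLock keeps two pieces of state: the owning thread's ID and a recursion count. The first acquire() records the thread ID and sets the count to 1. Further acquire() calls by the same thread increment the count, release() decrements it, and when it reaches 0 the owner is cleared and the lock is released.
    b.Tracking the counter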
        ---
        import threading

        class RLockMonitor:
            def __init__(self):
                self.rlock = threading.RLock()
                self.acquire_count = 0

            def acquire_with_log(self, level):
                self.rlock.acquire()
                self.acquire_count += 1
                print(f"Level {level}: 获取锁 (总获取次数: {self.acquire_count})")

            def release_with_log(self, level):
                self.acquire_count -= 1
                print(f"Level {level}: 释放锁 (剩余持有: {self.acquire_count})")
                self.rlock.release()

            def nested_operation(self, depth):
                if depth <= 0:
                    return
                self.acquire_with_log(depth)
                try:
                    print(f"  执行操作 depth={depth}")
                    self.nested_operation(depth - 1)
                finally:
                    self.release_with_log(depth)

        monitor = RLockMonitor()
        monitor.nested_operation(4)
        print(f"最终状态: 所有锁已释放")
        ---
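
        The owner-and-count bookkeeping described in this subsection can be sketched in pure Python on top of a plain Lock. SketchRLock is a hypothetical teaching class with assumed attribute names; it is not the actual threading.RLock implementation and omits timeouts and non-blocking acquisition.

```python
import threading

class SketchRLock:
    """Illustrative re-entrant lock built on a plain Lock (hypothetical class)."""

    def __init__(self):
        self._block = threading.Lock()   # the real mutual-exclusion lock
        self._owner = None               # ident of the owning thread, or None
        self._count = 0                  # how many times the owner has acquired

    def acquire(self):
        me = threading.get_ident()
        if self._owner == me:
            # Re-entry: the owner already holds _block, so only bump the count
            self._count += 1
            return True
        self._block.acquire()            # other threads block here
        self._owner = me
        self._count = 1
        return True

    def release(self):
        if self._owner != threading.get_ident():
            raise RuntimeError("cannot release un-acquired lock")
        self._count -= 1
        if self._count == 0:
            # Last matching release: clear the owner and free the real lock
            self._owner = None
            self._block.release()

    __enter__ = acquire

    def __exit__(self, *exc_info):
        self.release()

sketch = SketchRLock()
depths = []
with sketch:
    with sketch:
        depths.append(sketch._count)   # two nested acquisitions
    depths.append(sketch._count)       # inner one released
depths.append(sketch._count)           # fully released
print(depths)  # [2, 1, 0]
```

        Reading _count directly is for illustration only; the real RLock does not expose its count as a public attribute.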

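02.Pairing acquire and release
    a.Pairing rule
        Every acquire() must be matched by exactly one release() so that the count returns to 0. If a release is missing, the lock is never freed and other threads block forever; an extra release() raises RuntimeError.
    b.Incorrect example and fix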
        ---
        import threading

        rlock = threading.RLock()

        def bad_practice():
            rlock.acquire()
            rlock.acquire()
            print("获取锁两次")
            rlock.release()
            # 错误:只释放一次,锁未完全释放
            print("锁状态异常")

        def good_practice():
            rlock.acquire()
            try:
                rlock.acquire()
                try:
                    print("获取锁两次")
                finally:
                    rlock.release()
            finally:
                rlock.release()
            print("锁正确释放")

        # bad_practice()  # 会导致锁泄漏
        good_practice()

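        # Verify that the lock is free. An RLock is re-entrant, so a non-blocking
        # acquire from the owning thread always succeeds; the check is only
        # meaningful when made from a different thread.
        def check_from_other_thread():
            if rlock.acquire(blocking=False):
                print("锁可用")
                rlock.release()
            else:
                print("锁仍被占用")

        checker = threading.Thread(target=check_from_other_thread)
        checker.start()
        checker.join()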
        ---
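
        The opposite mistake, one release() too many, fails loudly rather than silently. A minimal sketch, assuming nothing beyond the standard threading module (the exact error message may differ between Python versions):

```python
import threading

rlock = threading.RLock()

rlock.acquire()
rlock.release()          # count is back to 0, the lock is free

try:
    rlock.release()      # one release too many
except RuntimeError as exc:
    extra_release_error = exc
    print(f"extra release rejected: {exc}")
else:
    extra_release_error = None
```

        Using the with statement avoids both kinds of mismatch, because acquire and release are paired automatically.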

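03.Counter Overflow Protection
    a.Maximum recursion depth
        The RLock count itself is not the practical limit on nesting. In recursive code the interpreter's recursion limit is reached first (sys.getrecursionlimit(), 1000 by default in CPython), so keep lock nesting well below that depth.
    b.Depth limit example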
        ---
        import threading
        import sys

        rlock = threading.RLock()
        max_depth = 0

        def test_max_depth(depth):
            global max_depth
            if depth > max_depth:
                max_depth = depth
            try:
                rlock.acquire()
                try:
                    if depth < 500:  # 限制测试深度
                        test_max_depth(depth + 1)
                finally:
                    rlock.release()
            except RecursionError:
                print(f"达到递归限制: {depth}")

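        # Temporarily lower the recursion limit for the test
        old_limit = sys.getrecursionlimit()
        sys.setrecursionlimit(600)
        try:
            test_max_depth(0)
            print(f"成功测试深度: {max_depth}")
        finally:
            # Restore the original limit even if the test raises
            sys.setrecursionlimit(old_limit)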
        ---

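04.Performance Impact
    a.Overhead
        RLock costs more than Lock: it maintains the owner thread ID and the count, and every operation checks the caller's identity. Where re-entry is not needed, Lock is the faster choice.
    b.Performance test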
        ---
        import threading
        import time

        def benchmark_lock(lock_type, iterations):
            if lock_type == "Lock":
                lock = threading.Lock()
            else:
                lock = threading.RLock()

            # perf_counter() is a monotonic, high-resolution clock meant for timing
            start = time.perf_counter()
            for _ in range(iterations):
                lock.acquire()
                lock.release()
            elapsed = time.perf_counter() - start

            return elapsed

        iterations = 1000000

        lock_time = benchmark_lock("Lock", iterations)
        rlock_time = benchmark_lock("RLock", iterations)

        print(f"Lock耗时: {lock_time:.4f}秒")
        print(f"RLock耗时: {rlock_time:.4f}秒")
        print(f"性能差异: {(rlock_time/lock_time - 1) * 100:.2f}%")
        ---

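2.4 RLock vs. Lock

01.Functional Differences
    a.Re-entry support
        Lock is not reentrant: acquiring it twice from the same thread deadlocks. RLock is reentrant: the owning thread can acquire it repeatedly, which suits recursion and nested calls.
    b.Comparison demo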
        ---
        import threading

        lock = threading.Lock()
        rlock = threading.RLock()

        def test_lock_reentry():
            print("=== Lock重入测试 ===")
            lock.acquire()
            print("Lock: 第一次获取成功")
            # lock.acquire()  # 取消注释会死锁
            # print("Lock: 第二次获取")  # 永远不会执行
            lock.release()

        def test_rlock_reentry():
            print("\n=== RLock重入测试 ===")
            rlock.acquire()
            print("RLock: 第一次获取成功")
            rlock.acquire()
            print("RLock: 第二次获取成功(重入)")
            rlock.release()
            rlock.release()
            print("RLock: 全部释放")

        test_lock_reentry()
        test_rlock_reentry()
        ---
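
        Ownership is a second functional difference: a Lock may be released by any thread, while an RLock may be released only by the thread that owns it. A minimal sketch, assuming a hypothetical helper named release_from_other_thread:

```python
import threading

def release_from_other_thread(lock):
    """Acquire in the calling thread, then try to release from a helper thread."""
    outcome = {}

    def helper():
        try:
            lock.release()
            outcome["result"] = "released"
        except RuntimeError:
            outcome["result"] = "RuntimeError"

    lock.acquire()
    t = threading.Thread(target=helper)
    t.start()
    t.join()
    if outcome["result"] != "released":
        lock.release()  # the owner still has to release it
    return outcome["result"]

lock_result = release_from_other_thread(threading.Lock())
rlock_result = release_from_other_thread(threading.RLock())
print(f"Lock:  {lock_result}")   # released
print(f"RLock: {rlock_result}")  # RuntimeError
```

        This matters for hand-off patterns in which one thread acquires and another releases: those work with Lock but not with RLock.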

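02.Performance Comparison
    a.Execution efficiency
        Lock is simpler and therefore faster. RLock has to track the owning thread and the count; it is often cited as roughly 10-30% slower, but the actual gap depends on the Python version and platform, so measure it in the target environment.
    b.Benchmark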
        ---
        import threading
        import time

        def benchmark_simple_operations():
            iterations = 500000

            # Lock测试
            lock = threading.Lock()
            start = time.perf_counter()  # monotonic, high-resolution timer
            for _ in range(iterations):
                with lock:
                    pass
            lock_time = time.perf_counter() - start

            # RLock测试
            rlock = threading.RLock()
            start = time.perf_counter()
            for _ in range(iterations):
                with rlock:
                    pass
            rlock_time = time.perf_counter() - start

            print(f"Lock: {lock_time:.4f}秒")
            print(f"RLock: {rlock_time:.4f}秒")
            print(f"RLock慢: {(rlock_time/lock_time - 1) * 100:.1f}%")

        def benchmark_nested_operations():
            iterations = 100000

            # RLock嵌套测试
            rlock = threading.RLock()
            start = time.perf_counter()  # monotonic, high-resolution timer
            for _ in range(iterations):
                with rlock:
                    with rlock:
                        with rlock:
                            pass
            nested_time = time.perf_counter() - start

            print(f"\nRLock嵌套3层: {nested_time:.4f}秒")

        benchmark_simple_operations()
        benchmark_nested_operations()
        ---
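
        A single timed loop is sensitive to scheduling noise. One possible refinement, sketched here with the standard timeit module and a hypothetical helper named best_time, repeats the measurement and keeps the fastest run. The iteration counts are arbitrary assumptions.

```python
import threading
import timeit

def best_time(lock, number=100_000, repeat=5):
    """Best-of-`repeat` time for `number` uncontended acquire/release pairs."""
    def one_pair():
        with lock:
            pass
    # min() of the repeats is the run least disturbed by other activity
    return min(timeit.repeat(one_pair, number=number, repeat=repeat))

lock_best = best_time(threading.Lock())
rlock_best = best_time(threading.RLock())
print(f"Lock  best: {lock_best:.4f}s")
print(f"RLock best: {rlock_best:.4f}s")
print(f"RLock/Lock ratio: {rlock_best / lock_best:.2f}")
```

        The numbers cover only the uncontended case on one machine; contended workloads can behave differently.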

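03.Choosing Between Them
    a.When to choose Lock
        Simple critical sections with no recursion and no nested calls between locked methods, where performance matters most.
    b.When to choose RLock
        Recursive functions that lock, class methods that call one another while each takes the lock, callbacks that may trigger further locked operations, and complex logic such as state machines.
    c.Scenario comparison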
        ---
        import threading

        # 场景1:简单计数器 - 使用Lock
        class SimpleCounter:
            def __init__(self):
                self.lock = threading.Lock()  # 使用Lock
                self.value = 0

            def increment(self):
                with self.lock:
                    self.value += 1

            def get_value(self):
                with self.lock:
                    return self.value

        # 场景2:复杂计数器 - 使用RLock
        class ComplexCounter:
            def __init__(self):
                self.rlock = threading.RLock()  # 使用RLock
                self.value = 0
                self.history = []

            def increment(self):
                with self.rlock:
                    self.value += 1
                    self._log_change("increment")  # 调用其他方法

            def decrement(self):
                with self.rlock:
                    self.value -= 1
                    self._log_change("decrement")  # 调用其他方法

            def _log_change(self, operation):
                with self.rlock:  # 重入
                    self.history.append((operation, self.value))

            def get_stats(self):
                with self.rlock:
                    return {
                        "value": self.value,
                        "changes": len(self.history)
                    }

        # 测试
        simple = SimpleCounter()
        complex_counter = ComplexCounter()

        def test_simple():
            for _ in range(100):
                simple.increment()

        def test_complex():
            for _ in range(50):
                complex_counter.increment()
                complex_counter.decrement()

        t1 = threading.Thread(target=test_simple)
        t2 = threading.Thread(target=test_complex)

        t1.start()
        t2.start()
        t1.join()
        t2.join()

        print(f"简单计数器: {simple.get_value()}")
        print(f"复杂计数器: {complex_counter.get_stats()}")
        ---

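04.Best Practices
    a.Default choice
        Prefer Lock, and use RLock only when re-entry is genuinely needed, to avoid over-engineering.
    b.Refactoring advice
        a.Avoiding the need for re-entry
            Restructure the code so that the logic needing the lock is extracted into separate methods and locked methods no longer call one another; a plain Lock can then replace the RLock.
        b.Refactoring example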
            ---
            import threading

            # 不推荐:使用RLock处理嵌套
            class BadDesign:
                def __init__(self):
                    self.rlock = threading.RLock()
                    self.data = []

                def add(self, item):
                    with self.rlock:
                        self.data.append(item)
                        self._update_stats()  # 嵌套调用

                def _update_stats(self):
                    with self.rlock:  # 重入
                        print(f"当前数量: {len(self.data)}")

            # 推荐:重构避免重入
            class GoodDesign:
                def __init__(self):
                    self.lock = threading.Lock()  # 使用Lock
                    self.data = []

                def add(self, item):
                    with self.lock:
                        self.data.append(item)
                        count = len(self.data)  # 在锁内获取数据
                    # 在锁外执行其他操作
                    print(f"当前数量: {count}")

            # 测试
            good = GoodDesign()
            threads = [threading.Thread(target=good.add, args=(i,)) for i in range(5)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            ---
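    c.Performance optimization
        In performance-sensitive code, prefer Lock whenever the design can avoid re-entry. If RLock is unavoidable, keep the nesting shallow and hold the lock for as short a time as possible.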
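
        One common way to design re-entry away is to let only public methods take the lock and have private helpers assume it is already held. The sketch below uses a hypothetical Inventory class; the _locked suffix is merely a naming convention assumed here to mark such helpers.

```python
import threading

class Inventory:
    """Hypothetical example: public methods lock, private helpers do not."""

    def __init__(self):
        self._lock = threading.Lock()    # a plain Lock is enough here
        self._stock = {}
        self._total = 0

    def add(self, name, qty):
        with self._lock:
            self._stock[name] = self._stock.get(name, 0) + qty
            self._recount_locked()

    def remove(self, name, qty):
        with self._lock:
            if self._stock.get(name, 0) < qty:
                return False
            self._stock[name] -= qty
            self._recount_locked()
            return True

    def total(self):
        with self._lock:
            return self._total

    def _recount_locked(self):
        # Caller must already hold self._lock; never call this from outside
        self._total = sum(self._stock.values())

inv = Inventory()
workers = [threading.Thread(target=inv.add, args=("bolt", 10)) for _ in range(5)]
for w in workers:
    w.start()
for w in workers:
    w.join()
removed = inv.remove("bolt", 20)
print(inv.total())  # 30
```

        The trade-off is that the "lock already held" precondition lives in a convention and a comment rather than being enforced by the lock itself.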

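3. Condition Variables

3.1 threading.Condition

01.Basic Concepts
    a.Definition
        Condition is a higher-level synchronization primitive that lets threads wait until some condition holds and be woken by other threads once it does. Every Condition is bound to a lock and provides wait(), notify(), and notify_all() for coordination (notifyAll() is an older camelCase alias of notify_all() and is deprecated in recent Python versions).
    b.How it works
        A thread must hold the lock to call wait(). wait() releases the lock and blocks. Another thread then changes the shared state and calls notify() or notify_all(). The woken thread reacquires the lock before wait() returns and should re-check whether the condition actually holds.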

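02.Creating and Using
    a.Creating a Condition
        a.Basic creation
            Call threading.Condition() to create a condition variable. A Lock or RLock can be passed in; if none is given, a new RLock is created automatically.
        b.Code example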
            ---
            import threading

            # 方式1:自动创建锁
            condition1 = threading.Condition()

            # 方式2:使用自定义锁
            lock = threading.Lock()
            condition2 = threading.Condition(lock)

            # 方式3:使用RLock
            rlock = threading.RLock()
            condition3 = threading.Condition(rlock)

            # 基本使用示例
            shared_data = []
            condition = threading.Condition()

            def producer():
                with condition:
                    shared_data.append(1)
                    print("生产者:添加数据")
                    condition.notify()  # 通知消费者

            def consumer():
                with condition:
                    while not shared_data:
                        print("消费者:等待数据")
                        condition.wait()  # 等待通知
                    data = shared_data.pop()
                    print(f"消费者:获取数据 {data}")

            t1 = threading.Thread(target=consumer)
            t2 = threading.Thread(target=producer)

            t1.start()
            t2.start()
            t1.join()
            t2.join()
            ---

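03.Core Methods
    a.wait()
        a.Description
            wait() releases the lock and blocks the calling thread until it is woken by notify() or notify_all(), or until the timeout expires. The lock is reacquired before wait() returns.
        b.Code example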
            ---
            import threading
            import time

            condition = threading.Condition()
            ready = False

            def waiter():
                with condition:
                    print("等待线程:开始等待")
                    while not ready:
                        condition.wait()  # 释放锁并等待
                    print("等待线程:条件满足,继续执行")

            def notifier():
                global ready
                time.sleep(1)
                with condition:
                    ready = True
                    print("通知线程:设置条件并通知")
                    condition.notify()

            t1 = threading.Thread(target=waiter)
            t2 = threading.Thread(target=notifier)

            t1.start()
            t2.start()
            t1.join()
            t2.join()
            ---
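    b.wait() with a timeout
        a.Description
            wait(timeout) limits how long the call waits. It returns False if the timeout expired and True otherwise, so a thread never has to wait indefinitely.
        b.Code example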
            ---
            import threading
            import time

            condition = threading.Condition()
            data_ready = False

            def wait_with_timeout():
                with condition:
                    print("开始等待(超时2秒)")
                    result = condition.wait(timeout=2)
                    if result:
                        print("被通知唤醒")
                    else:
                        print("等待超时")

            def late_notifier():
                time.sleep(3)  # 延迟3秒,超过超时时间
                with condition:
                    global data_ready
                    data_ready = True
                    condition.notify()

            t1 = threading.Thread(target=wait_with_timeout)
            t2 = threading.Thread(target=late_notifier)

            t1.start()
            t2.start()
            t1.join()
            t2.join()
            # 输出: 等待超时
            ---
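
        The example above waits once and never consults data_ready. When the wait is tied to a flag, Condition.wait_for(predicate, timeout) combines the re-check loop and the timeout in one call. A minimal sketch, assuming a hypothetical state dictionary and helper names:

```python
import threading

condition = threading.Condition()
state = {"ready": False}

def wait_until_ready(timeout):
    with condition:
        # wait_for re-checks the predicate after every wakeup and returns
        # its final value (False if the timeout expired first)
        return condition.wait_for(lambda: state["ready"], timeout=timeout)

timed_out_result = wait_until_ready(timeout=0.1)   # nobody sets the flag

def set_ready():
    with condition:
        state["ready"] = True
        condition.notify_all()

notifier = threading.Thread(target=set_ready)
notifier.start()
notified_result = wait_until_ready(timeout=5)
notifier.join()

print(timed_out_result, notified_result)  # False True
```

        Because the predicate is checked before waiting, the second call succeeds even if the notifier has already run.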

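04.Notification
    a.notify()
        notify() wakes one waiting thread (or up to n with notify(n)). If several threads are waiting, which one is woken is unspecified and should not be relied on.
    b.notify_all()
        notify_all() wakes every waiting thread. The woken threads then compete for the lock and resume one at a time as each acquires it.
    c.Comparison example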
        ---
        import threading
        import time

        condition = threading.Condition()
        counter = 0

        def worker(worker_id):
            with condition:
                print(f"工作线程{worker_id}:等待通知")
                condition.wait()
                print(f"工作线程{worker_id}:被唤醒,counter={counter}")

        def notify_one():
            global counter
            time.sleep(1)
            with condition:
                counter = 1
                print("主线程:notify() 唤醒一个线程")
                condition.notify()  # 只唤醒一个

        def notify_all():
            global counter
            time.sleep(1)
            with condition:
                counter = 2
                print("主线程:notify_all() 唤醒所有线程")
                condition.notify_all()  # 唤醒所有

        # 测试notify()
        print("=== 测试notify() ===")
        threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
        for t in threads:
            t.start()

        notify_one()
        time.sleep(0.5)

        # 剩余线程仍在等待,用notify_all唤醒
        notify_all()

        for t in threads:
            t.join()
        ---
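
        Between waking one thread and waking all of them, notify(n) wakes at most n waiters. The sketch below assumes a hypothetical permits counter as the shared state; each worker waits with a predicate, so a worker proceeds only when a permit actually exists.

```python
import threading
import time

condition = threading.Condition()
permits = 0          # shared state guarded by `condition`
finished = []        # ids of workers that obtained a permit

def worker(worker_id):
    global permits
    with condition:
        condition.wait_for(lambda: permits > 0)
        permits -= 1
        finished.append(worker_id)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()

with condition:
    permits = 2
    condition.notify(2)      # wake at most two waiters

# Poll until the two permits have been consumed (bounded by a deadline)
deadline = time.monotonic() + 5
while len(finished) < 2 and time.monotonic() < deadline:
    time.sleep(0.01)
after_notify_two = len(finished)

with condition:
    permits += 1
    condition.notify()       # default n=1 releases the last worker

for t in threads:
    t.join()

print(after_notify_two, len(finished))  # 2 3
```

        The predicate keeps the count exact: even if a wakeup reached the third worker early, it would see permits == 0 and go back to waiting.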

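3.2 wait and notify

01.wait() in Detail
    a.The wait loop
        Check the condition in a while loop rather than an if. When wait() returns, the condition may no longer hold (another thread may have consumed the state first, or the wakeup may have been spurious), so it must be re-checked.
    b.Correct usage pattern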
        ---
        import threading
        import time

        condition = threading.Condition()
        queue = []
        MAX_SIZE = 5

        def producer(item_count):
            for i in range(item_count):
                with condition:
                    while len(queue) >= MAX_SIZE:
                        print(f"生产者:队列满,等待")
                        condition.wait()
                    queue.append(i)
                    print(f"生产者:生产 {i}, 队列长度 {len(queue)}")
                    condition.notify()
                time.sleep(0.1)

        def consumer(item_count):
            for _ in range(item_count):
                with condition:
                    while not queue:
                        print(f"消费者:队列空,等待")
                        condition.wait()
                    item = queue.pop(0)
                    print(f"消费者:消费 {item}, 队列长度 {len(queue)}")
                    condition.notify()
                time.sleep(0.2)

        t1 = threading.Thread(target=producer, args=(10,))
        t2 = threading.Thread(target=consumer, args=(10,))

        t1.start()
        t2.start()
        t1.join()
        t2.join()
        ---
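
        With a single Condition, producers and consumers wake each other indiscriminately. A possible variation, sketched below under the assumption of a hypothetical BoundedBuffer class, uses two conditions that share one lock so that each notify() reaches only the kind of thread that can make progress.

```python
import threading
from collections import deque

class BoundedBuffer:
    """Hypothetical bounded FIFO using two conditions on one shared lock."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        lock = threading.Lock()
        self._not_full = threading.Condition(lock)    # producers wait here
        self._not_empty = threading.Condition(lock)   # consumers wait here

    def put(self, item):
        with self._not_full:
            while len(self._items) >= self._capacity:
                self._not_full.wait()
            self._items.append(item)
            self._not_empty.notify()      # wake one consumer only

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()  # O(1), unlike list.pop(0)
            self._not_full.notify()       # wake one producer only
            return item

buffer = BoundedBuffer(capacity=5)
consumed = []

def producer():
    for i in range(10):
        buffer.put(i)

def consumer():
    for _ in range(10):
        consumed.append(buffer.get())

threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(consumed)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

        For real workloads the standard queue.Queue already provides a thread-safe bounded queue; the sketch only illustrates the wait-loop pattern.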

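02.Using notify
    a.When to notify
        Call notify() right after changing the shared state, while still holding the lock; calling it without the lock raises RuntimeError. Waiting threads resume only after the notifier releases the lock.
    b.Choosing between notify and notify_all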
        ---
        import threading
        import time

        condition = threading.Condition()
        tasks = []

        def worker(worker_id, task_type):
            with condition:
                while True:
                    matching_tasks = [t for t in tasks if t["type"] == task_type]
                    if not matching_tasks:
                        print(f"工作者{worker_id}({task_type}):无任务,等待")
                        condition.wait(timeout=2)
                        if not tasks:
                            break
                        continue
                    task = matching_tasks[0]
                    tasks.remove(task)
                    print(f"工作者{worker_id}:处理任务 {task}")
                    break

        def add_task(task_type, use_notify_all=False):
            with condition:
                tasks.append({"type": task_type, "id": len(tasks)})
                print(f"添加任务:{task_type}")
                if use_notify_all:
                    condition.notify_all()  # 唤醒所有
                else:
                    condition.notify()  # 只唤醒一个

        # 场景1:使用notify()
        print("=== 使用notify() ===")
        t1 = threading.Thread(target=worker, args=(1, "A"))
        t2 = threading.Thread(target=worker, args=(2, "B"))
        t1.start()
        t2.start()
        time.sleep(0.5)

        add_task("A", use_notify_all=False)
        add_task("B", use_notify_all=False)

        t1.join()
        t2.join()
        ---

03.虚假唤醒处理
    a.虚假唤醒原因
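        A waiting thread normally wakes only after notify() or a timeout, but the Python documentation allows an implementation to wake more threads than requested. More commonly, several threads compete for the same resource, and the awoken thread finds that another thread has already changed the condition by the time it reacquires the lock.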
    b.防御性编程
        ---
        import threading
        import time

        condition = threading.Condition()
        resource_available = False
        spurious_wakeup_count = 0

        def waiter(waiter_id):
            global spurious_wakeup_count
            with condition:
                wakeup_count = 0
                while not resource_available:
                    print(f"等待者{waiter_id}:开始等待")
                    condition.wait()
                    wakeup_count += 1
                    if not resource_available:
                        spurious_wakeup_count += 1
                        print(f"等待者{waiter_id}:虚假唤醒 #{wakeup_count}")
                print(f"等待者{waiter_id}:获取资源(唤醒{wakeup_count}次)")

        def provider():
            global resource_available
            time.sleep(2)
            with condition:
                resource_available = True
                print("提供者:资源就绪,通知所有等待者")
                condition.notify_all()

        threads = [threading.Thread(target=waiter, args=(i,)) for i in range(3)]
        for t in threads:
            t.start()

        provider_thread = threading.Thread(target=provider)
        provider_thread.start()

        for t in threads:
            t.join()
        provider_thread.join()

        print(f"总虚假唤醒次数: {spurious_wakeup_count}")
        ---

04.超时等待模式
    a.带超时的等待
        使用wait(timeout)避免无限等待,适用于需要定期检查或有时间限制的场景。
    b.超时重试模式
        ---
        import threading
        import time

        condition = threading.Condition()
        data = None

        def wait_for_data(max_wait=5):
            with condition:
                start_time = time.monotonic()  # monotonic clock: immune to wall-clock changes
                while data is None:
                    remaining = max_wait - (time.monotonic() - start_time)
                    if remaining <= 0:
                        print("等待超时,数据未就绪")
                        return None
                    print(f"等待数据(剩余{remaining:.1f}秒)")
                    condition.wait(timeout=min(remaining, 1))
                print(f"获取数据: {data}")
                return data

        def provide_data_late():
            global data
            time.sleep(3)
            with condition:
                data = "重要数据"
                print("提供数据")
                condition.notify()

        t1 = threading.Thread(target=wait_for_data)
        t2 = threading.Thread(target=provide_data_late)

        t1.start()
        t2.start()
        t1.join()
        t2.join()
        ---
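
The manual deadline arithmetic above can be delegated to Condition.wait_for(predicate, timeout), which applies the timeout to the whole wait and returns the final predicate value. A minimal sketch, assuming hypothetical helper names `wait_for_data` and `provide_data` and a short timer in place of the late provider thread:

```python
import threading

condition = threading.Condition()
data = None

def wait_for_data(max_wait):
    with condition:
        # wait_for tracks the overall deadline across repeated wakeups
        ready = condition.wait_for(lambda: data is not None, timeout=max_wait)
        return data if ready else None

def provide_data(value):
    global data
    with condition:
        data = value
        condition.notify_all()

# Nobody has provided data yet, so this call times out and returns None
first_attempt = wait_for_data(max_wait=0.1)

# Provide the data shortly afterwards from a timer thread
timer = threading.Timer(0.1, provide_data, args=("payload",))
timer.start()
second_attempt = wait_for_data(max_wait=5)
timer.join()
print(first_attempt, second_attempt)
```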

3.3 生产者消费者模式

01.经典生产者消费者
    a.模式定义
        生产者线程生产数据放入缓冲区,消费者线程从缓冲区取出数据消费。使用Condition协调生产者和消费者,当缓冲区满时生产者等待,缓冲区空时消费者等待。
    b.基础实现
        ---
        import threading
        import time
        import random

        class ProducerConsumer:
            def __init__(self, max_size=10):
                self.condition = threading.Condition()
                self.queue = []
                self.max_size = max_size

            def produce(self, producer_id, count):
                for i in range(count):
                    with self.condition:
                        while len(self.queue) >= self.max_size:
                            print(f"生产者{producer_id}:队列满,等待")
                            self.condition.wait()
                        item = f"P{producer_id}-{i}"
                        self.queue.append(item)
                        print(f"生产者{producer_id}:生产 {item}, 队列 {len(self.queue)}/{self.max_size}")
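                        # notify_all(): producers and consumers share one Condition,
                        # so notify() could wake a thread of the wrong kind
                        self.condition.notify_all()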
                    time.sleep(random.uniform(0.1, 0.3))

            def consume(self, consumer_id, count):
                for _ in range(count):
                    with self.condition:
                        while not self.queue:
                            print(f"消费者{consumer_id}:队列空,等待")
                            self.condition.wait()
                        item = self.queue.pop(0)
                        print(f"消费者{consumer_id}:消费 {item}, 队列 {len(self.queue)}/{self.max_size}")
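                        # notify_all(): producers and consumers share one Condition,
                        # so notify() could wake a thread of the wrong kind
                        self.condition.notify_all()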
                    time.sleep(random.uniform(0.2, 0.4))

        pc = ProducerConsumer(max_size=5)

        producers = [threading.Thread(target=pc.produce, args=(i, 10)) for i in range(2)]
        consumers = [threading.Thread(target=pc.consume, args=(i, 10)) for i in range(2)]

        for t in producers + consumers:
            t.start()
        for t in producers + consumers:
            t.join()

        print("生产消费完成")
        ---
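
When producers and consumers wait on a single Condition, a notification can reach a thread of the wrong kind. One common alternative, sketched here with a hypothetical `BoundedBuffer` class, is to use two Condition objects that share one lock, so that each notify() targets only the threads that can make progress.

```python
import threading
from collections import deque

class BoundedBuffer:
    """Hypothetical bounded buffer with separate not_full / not_empty conditions."""

    def __init__(self, max_size):
        lock = threading.Lock()
        # Both conditions share the same lock, so they guard the same state
        self.not_full = threading.Condition(lock)
        self.not_empty = threading.Condition(lock)
        self.items = deque()
        self.max_size = max_size

    def put(self, item):
        with self.not_full:
            while len(self.items) >= self.max_size:
                self.not_full.wait()
            self.items.append(item)
            self.not_empty.notify()  # wakes a consumer only

    def get(self):
        with self.not_empty:
            while not self.items:
                self.not_empty.wait()
            item = self.items.popleft()
            self.not_full.notify()  # wakes a producer only
            return item

buffer = BoundedBuffer(max_size=5)
results = []
results_lock = threading.Lock()

def produce(producer_id):
    for i in range(10):
        buffer.put(f"P{producer_id}-{i}")

def consume():
    for _ in range(10):
        item = buffer.get()
        with results_lock:
            results.append(item)

threads = [threading.Thread(target=produce, args=(i,)) for i in range(2)]
threads += [threading.Thread(target=consume) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))
```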

02.多生产者多消费者
    a.复杂场景处理
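        When several producers and consumers share one Condition, use notify_all() so that every waiting thread re-checks its own condition. A single notify() may wake a thread of the wrong kind and leave the threads that could make progress waiting indefinitely.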
    b.优化实现
        ---
        import threading
        import time

        class MultiProducerConsumer:
            def __init__(self, max_size=10):
                self.condition = threading.Condition()
                self.queue = []
                self.max_size = max_size
                self.producer_count = 0
                self.consumer_count = 0
                self.done = False

            def produce(self, producer_id, items):
                with self.condition:  # shared counter: update it under the lock
                    self.producer_count += 1
                try:
                    for item in items:
                        with self.condition:
                            while len(self.queue) >= self.max_size and not self.done:
                                self.condition.wait()
                            if self.done:
                                break
                            self.queue.append(item)
                            print(f"P{producer_id}: +{item} [{len(self.queue)}]")
                            self.condition.notify_all()
                        time.sleep(0.05)
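                finally:
                    # Decrement and test under the lock so that exactly one producer
                    # (the last to finish) marks the queue as done
                    with self.condition:
                        self.producer_count -= 1
                        if self.producer_count == 0:
                            self.done = True
                            self.condition.notify_all()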

            def consume(self, consumer_id):
                with self.condition:  # shared counter: update it under the lock
                    self.consumer_count += 1
                try:
                    while True:
                        with self.condition:
                            while not self.queue and not self.done:
                                self.condition.wait()
                            if not self.queue and self.done:
                                break
                            if self.queue:
                                item = self.queue.pop(0)
                                print(f"C{consumer_id}: -{item} [{len(self.queue)}]")
                                self.condition.notify_all()
                        time.sleep(0.1)
                finally:
                    with self.condition:
                        self.consumer_count -= 1

        mpc = MultiProducerConsumer(max_size=5)

        producers = [
            threading.Thread(target=mpc.produce, args=(i, range(i*10, (i+1)*10)))
            for i in range(3)
        ]
        consumers = [
            threading.Thread(target=mpc.consume, args=(i,))
            for i in range(4)
        ]

        for t in producers + consumers:
            t.start()
        for t in producers + consumers:
            t.join()

        print("多生产者消费者完成")
        ---
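
For comparison, the standard library's queue.Queue already provides a thread-safe bounded buffer with blocking put() and get(). The sketch below reproduces the three-producer, four-consumer scenario with it; the `STOP` sentinel used for shutdown is an assumption of this example, not part of the queue API.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel telling a consumer to exit
work = queue.Queue(maxsize=5)
results = []
results_lock = threading.Lock()

def produce(items):
    for item in items:
        work.put(item)  # blocks while the queue is full

def consume():
    while True:
        item = work.get()  # blocks while the queue is empty
        if item is STOP:
            break
        with results_lock:
            results.append(item)

producers = [
    threading.Thread(target=produce, args=(range(i * 10, (i + 1) * 10),))
    for i in range(3)
]
consumers = [threading.Thread(target=consume) for _ in range(4)]

for t in producers + consumers:
    t.start()
for t in producers:
    t.join()
# All real items are queued; send one sentinel per consumer
for _ in consumers:
    work.put(STOP)
for t in consumers:
    t.join()
print(len(results))
```

Sending the sentinels only after every producer has joined replaces the shared `done` flag and the producer counter.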

03.优先级队列模式
    a.带优先级的生产消费
        消费者优先处理高优先级任务,使用Condition配合优先级队列实现任务调度。
    b.实现示例
        ---
        import threading
        import time
        import heapq

        class PriorityProducerConsumer:
            def __init__(self):
                self.condition = threading.Condition()
                self.heap = []
                self.counter = 0

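            def produce(self, priority, task):
                with self.condition:
                    # heapq is a min-heap, so store the negated priority to pop the
                    # highest priority first; the counter keeps FIFO order on ties
                    heapq.heappush(self.heap, (-priority, self.counter, task))
                    self.counter += 1
                    print(f"Produced task: priority {priority}, {task}")
                    self.condition.notify()

            def consume(self, consumer_id):
                while True:
                    with self.condition:
                        while not self.heap:
                            # Stop once no task has arrived for 2 seconds
                            if not self.condition.wait(timeout=2) and not self.heap:
                                return
                        neg_priority, _, task = heapq.heappop(self.heap)
                        print(f"Consumer {consumer_id}: handling priority {-neg_priority}, {task}")
                    time.sleep(0.5)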

        ppc = PriorityProducerConsumer()

        def producer_work():
            tasks = [(1, "低优先级任务1"), (5, "高优先级任务1"),
                     (3, "中优先级任务1"), (5, "高优先级任务2"),
                     (1, "低优先级任务2")]
            for priority, task in tasks:
                ppc.produce(priority, task)
                time.sleep(0.2)

        producer = threading.Thread(target=producer_work)
        consumers = [threading.Thread(target=ppc.consume, args=(i,)) for i in range(2)]

        producer.start()
        for c in consumers:
            c.start()

        producer.join()
        for c in consumers:
            c.join()
        ---
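
The ordering rule can be checked in isolation. Both heapq and queue.PriorityQueue return the smallest entry first, so a "larger number means higher priority" scheme needs the priority negated. A minimal single-threaded sketch, with a hypothetical `submit` helper and English task names:

```python
import itertools
import queue

tasks = queue.PriorityQueue()
sequence = itertools.count()  # tie-breaker: preserves submission order

def submit(priority, name):
    # Negate the priority because the smallest tuple is retrieved first
    tasks.put((-priority, next(sequence), name))

for priority, name in [(1, "low-1"), (5, "high-1"), (3, "mid-1"),
                       (5, "high-2"), (1, "low-2")]:
    submit(priority, name)

order = []
while not tasks.empty():
    _, _, name = tasks.get()
    order.append(name)
print(order)  # ['high-1', 'high-2', 'mid-1', 'low-1', 'low-2']
```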

04.批量处理模式
    a.批量生产消费
        生产者批量生产数据,消费者批量消费,减少锁竞争,提高吞吐量。
    b.批处理实现
        ---
        import threading
        import time

        class BatchProducerConsumer:
            def __init__(self, batch_size=5):
                self.condition = threading.Condition()
                self.queue = []
                self.batch_size = batch_size

            def produce_batch(self, producer_id, total_items):
                batch = []
                for i in range(total_items):
                    batch.append(f"P{producer_id}-{i}")
                    if len(batch) >= self.batch_size:
                        with self.condition:
                            self.queue.extend(batch)
                            print(f"生产者{producer_id}: 批量生产{len(batch)}项")
                            batch = []
                            self.condition.notify_all()
                        time.sleep(0.1)

                if batch:
                    with self.condition:
                        self.queue.extend(batch)
                        print(f"生产者{producer_id}: 最后批次{len(batch)}项")
                        self.condition.notify_all()

            def consume_batch(self, consumer_id, total_items):
                consumed = 0
                while consumed < total_items:
                    with self.condition:
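                        # wait_for re-checks the queue after every wakeup, so a wakeup
                        # caused by another consumer draining it does not end this one;
                        # give up only if the queue is still empty after 1 second
                        if not self.condition.wait_for(lambda: self.queue, timeout=1):
                            return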
                        batch_size = min(self.batch_size, len(self.queue))
                        batch = self.queue[:batch_size]
                        self.queue = self.queue[batch_size:]
                        print(f"消费者{consumer_id}: 批量消费{len(batch)}项")
                        consumed += len(batch)
                        self.condition.notify_all()
                    time.sleep(0.2)

        bpc = BatchProducerConsumer(batch_size=5)

        producers = [threading.Thread(target=bpc.produce_batch, args=(i, 20)) for i in range(2)]
        consumers = [threading.Thread(target=bpc.consume_batch, args=(i, 20)) for i in range(2)]

        for t in producers + consumers:
            t.start()
        for t in producers + consumers:
            t.join()

        print("批量处理完成")
        ---

3.4 条件变量应用场景

01.线程池任务调度
    a.任务队列管理
        使用Condition实现线程池的任务队列,工作线程等待任务,主线程提交任务后通知工作线程。
    b.线程池实现
        ---
        import threading
        import time

        class ThreadPool:
            def __init__(self, num_workers=4):
                self.condition = threading.Condition()
                self.tasks = []
                self.shutdown = False
                self.workers = []
                for i in range(num_workers):
                    t = threading.Thread(target=self._worker, args=(i,))
                    t.start()
                    self.workers.append(t)

            def _worker(self, worker_id):
                while True:
                    with self.condition:
                        while not self.tasks and not self.shutdown:
                            self.condition.wait()
                        if self.shutdown and not self.tasks:
                            break
                        task = self.tasks.pop(0)
                    print(f"工作线程{worker_id}: 执行任务 {task}")
                    time.sleep(0.5)

            def submit(self, task):
                with self.condition:
                    if self.shutdown:
                        raise RuntimeError("线程池已关闭")
                    self.tasks.append(task)
                    self.condition.notify()

            def shutdown_pool(self):
                with self.condition:
                    self.shutdown = True
                    self.condition.notify_all()
                for worker in self.workers:
                    worker.join()

        pool = ThreadPool(num_workers=3)

        for i in range(10):
            pool.submit(f"任务{i}")
            time.sleep(0.1)

        pool.shutdown_pool()
        print("线程池关闭")
        ---
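
The hand-written pool above shows how a Condition drives the workers. In production code the same behavior is usually obtained from concurrent.futures.ThreadPoolExecutor; the sketch below assumes a hypothetical `run_task` function standing in for the real work.

```python
from concurrent.futures import ThreadPoolExecutor

def run_task(task_id):
    # Hypothetical task body
    return f"task-{task_id} done"

# Leaving the with-block waits for all submitted tasks, like shutdown_pool()
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(run_task, i) for i in range(10)]
    results = [future.result() for future in futures]

print(results)
```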

02.资源池管理
    a.连接池实现
        使用Condition管理数据库连接池,当连接不足时等待,连接归还后通知等待线程。
    b.连接池示例
        ---
        import threading
        import time

        class ConnectionPool:
            def __init__(self, max_connections=5):
                self.condition = threading.Condition()
                self.max_connections = max_connections
                self.available = list(range(max_connections))
                self.in_use = set()

            def acquire(self, timeout=None):
                with self.condition:
                    # Monotonic clock: unaffected by system clock adjustments
                    start_time = time.monotonic()
                    while not self.available:
                        if timeout is not None:  # timeout=0 means "do not wait"
                            remaining = timeout - (time.monotonic() - start_time)
                            if remaining <= 0:
                                raise TimeoutError("Timed out acquiring a connection")
                            self.condition.wait(timeout=remaining)
                        else:
                            self.condition.wait()
                    conn = self.available.pop()
                    self.in_use.add(conn)
                    print(f"获取连接{conn}, 可用:{len(self.available)}, 使用中:{len(self.in_use)}")
                    return conn

            def release(self, conn):
                with self.condition:
                    if conn not in self.in_use:
                        raise ValueError("连接未被使用")
                    self.in_use.remove(conn)
                    self.available.append(conn)
                    print(f"释放连接{conn}, 可用:{len(self.available)}, 使用中:{len(self.in_use)}")
                    self.condition.notify()

        pool = ConnectionPool(max_connections=3)

        def worker(worker_id):
            try:
                conn = pool.acquire(timeout=2)
                time.sleep(1)
                pool.release(conn)
            except TimeoutError as e:
                print(f"工作线程{worker_id}: {e}")

        threads = [threading.Thread(target=worker, args=(i,)) for i in range(6)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        ---

03.事件通知系统
    a.发布订阅模式
        使用Condition实现事件通知,订阅者等待特定事件,发布者触发事件后通知所有订阅者。
    b.事件系统实现
        ---
        import threading
        import time

        class EventBus:
            def __init__(self):
                self.condition = threading.Condition()
                self.events = {}
                self.subscribers = {}

            def subscribe(self, event_type, subscriber_id):
                with self.condition:
                    if event_type not in self.subscribers:
                        self.subscribers[event_type] = set()
                    self.subscribers[event_type].add(subscriber_id)

            def wait_for_event(self, event_type, subscriber_id, timeout=None):
                with self.condition:
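                    # wait_for applies the timeout to the whole wait, even when
                    # notifications for other event types wake this thread early
                    if not self.condition.wait_for(lambda: event_type in self.events, timeout=timeout):
                        return None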
                    event_data = self.events[event_type]
                    print(f"订阅者{subscriber_id}: 收到事件 {event_type} = {event_data}")
                    return event_data

            def publish(self, event_type, data):
                with self.condition:
                    self.events[event_type] = data
                    print(f"发布事件: {event_type} = {data}")
                    self.condition.notify_all()

        bus = EventBus()

        def subscriber(sub_id, event_type):
            bus.subscribe(event_type, sub_id)
            bus.wait_for_event(event_type, sub_id, timeout=3)

        def publisher():
            time.sleep(1)
            bus.publish("user_login", {"user": "alice", "time": time.time()})
            time.sleep(0.5)
            bus.publish("data_ready", {"records": 100})

        subs = [
            threading.Thread(target=subscriber, args=(1, "user_login")),
            threading.Thread(target=subscriber, args=(2, "user_login")),
            threading.Thread(target=subscriber, args=(3, "data_ready"))
        ]
        pub = threading.Thread(target=publisher)

        for s in subs:
            s.start()
        pub.start()

        for s in subs:
            s.join()
        pub.join()
        ---
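
When subscribers only need to know that one particular event has happened, threading.Event is a simpler tool than a Condition. A minimal sketch, assuming a single "login" event and a hypothetical shared `payload` dictionary:

```python
import threading

login_event = threading.Event()
payload = {}   # hypothetical event data, written before the flag is set
received = []
received_lock = threading.Lock()

def subscriber(sub_id):
    # wait() returns True once the flag is set, False on timeout
    if login_event.wait(timeout=3):
        with received_lock:
            received.append((sub_id, payload["user"]))

subs = [threading.Thread(target=subscriber, args=(i,)) for i in range(2)]
for s in subs:
    s.start()

payload["user"] = "alice"  # publish the data first
login_event.set()          # then wake every waiting subscriber

for s in subs:
    s.join()
print(sorted(received))
```

An Event stays set until clear() is called, so late subscribers return immediately, much like the stored `events` dictionary in the example above.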

04.状态同步场景
    a.多阶段任务协调
        多个线程执行多阶段任务,使用Condition在每个阶段同步,确保所有线程完成当前阶段后再进入下一阶段。
    b.阶段同步实现
        ---
        import threading
        import time

        class PhaseBarrier:
            def __init__(self, num_threads):
                self.condition = threading.Condition()
                self.num_threads = num_threads
                self.current_phase = 0
                self.waiting_count = 0

            def wait_phase(self, thread_id, phase):
                with self.condition:
                    while self.current_phase < phase:
                        self.condition.wait()
                    print(f"线程{thread_id}: 进入阶段{phase}")

            def complete_phase(self, thread_id, phase):
                with self.condition:
                    self.waiting_count += 1
                    print(f"线程{thread_id}: 完成阶段{phase} ({self.waiting_count}/{self.num_threads})")
                    if self.waiting_count == self.num_threads:
                        self.current_phase += 1
                        self.waiting_count = 0
                        print(f"所有线程完成阶段{phase}, 进入阶段{self.current_phase}")
                        self.condition.notify_all()

        barrier = PhaseBarrier(num_threads=3)

        def worker(worker_id):
            for phase in range(3):
                barrier.wait_phase(worker_id, phase)
                print(f"线程{worker_id}: 执行阶段{phase}任务")
                time.sleep(1)
                barrier.complete_phase(worker_id, phase)

        threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        print("所有阶段完成")
        ---
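
The standard library provides threading.Barrier for this kind of phase synchronization. The sketch below is one possible equivalent of the example above; the `log` list is a hypothetical addition used to observe the ordering.

```python
import threading

NUM_THREADS = 3
NUM_PHASES = 3
barrier = threading.Barrier(NUM_THREADS)
log = []  # hypothetical: records the phase number each time a thread works
log_lock = threading.Lock()

def worker(worker_id):
    for phase in range(NUM_PHASES):
        with log_lock:
            log.append(phase)  # stands in for the phase's real work
        # Blocks until all NUM_THREADS threads have reached this point;
        # the barrier then resets itself for the next phase
        barrier.wait()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log)
```

Because no thread can start phase n+1 before every thread has finished phase n, the recorded phase numbers never decrease.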

4. 信号量

4.1 threading.Semaphore

01.Semaphore基础概念
    a.定义与作用
        Semaphore是Python threading模块中提供的信号量同步原语,用于控制同时访问特定资源的线程数量。
        它通过内部计数器机制实现资源访问控制,允许多个线程同时访问资源,但限制并发数量。
    b.核心特性
        a.计数器机制
            信号量内部维护一个整数计数器,表示可用的资源许可数量。
        b.获取与释放
            线程通过acquire()方法获取许可,通过release()方法释放许可。
        c.阻塞机制
            当计数器为0时,后续的acquire()操作会阻塞,直到有线程释放许可。

02.创建Semaphore对象
    a.初始化参数
        threading.Semaphore(value=1)
        value参数指定初始许可数量,默认为1,必须为非负整数。
    b.参数选择原则
        a.资源数量
            根据实际可用的物理资源数量设置,如数据库连接数、文件句柄数等。
        b.系统负载
            考虑系统承载能力和预期并发量,避免设置过高导致资源竞争。
        c.性能平衡
            在并发性能和资源利用率之间找到最佳平衡点。

03.核心方法详解
    a.acquire()方法
        a.功能说明
            获取信号量的许可,如果可用许可数大于0,则立即返回并减少计数器。
            如果可用许可数为0,则阻塞直到有许可可用或超时。
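        b.Blocking behavior
            The signature is acquire(blocking=True, timeout=None). By default the call blocks indefinitely; timeout sets a maximum wait in seconds, and blocking=False returns immediately without waiting.
        c.Return value
            Returns True when a permit was acquired, and False when the timeout expired or when blocking=False and no permit was available.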
    b.release()方法
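        a.Function
            Releases a permit by incrementing the internal counter and waking a waiting thread. Since Python 3.9, release(n=1) accepts a count and releases n permits at once.
        b.No upper bound
            A plain Semaphore has no upper bound on its counter, which can grow past the initial value.
        c.Over-release
            Calling release() more often than acquire() raises no exception; the counter simply keeps growing, which silently raises the effective concurrency limit.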
    c.__enter__和__exit__方法
        a.上下文管理器支持
            支持with语句,自动管理许可的获取和释放。
        b.异常安全性
            即使发生异常,也能确保许可被正确释放。
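
The methods described above can be exercised with a short sketch. The variables `active` and `peak` are hypothetical bookkeeping added to observe how many threads hold a permit at once.

```python
import threading
import time

MAX_CONCURRENT = 2
semaphore = threading.Semaphore(MAX_CONCURRENT)
active = 0
peak = 0
state_lock = threading.Lock()

def worker():
    global active, peak
    with semaphore:  # acquire() on entry, release() on exit, even on error
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # simulated work while holding a permit
        with state_lock:
            active -= 1

threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With no permits available, these calls return False instead of blocking forever
busy = threading.Semaphore(0)
got_nonblocking = busy.acquire(blocking=False)
got_timed = busy.acquire(timeout=0.05)
print(peak, got_nonblocking, got_timed)
```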

04.工作原理机制
    a.内部计数器
        a.初始值设置
            初始化时设置可用许可数量,控制最大并发数。
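        b.Atomic updates
            The counter is modified only while an internal lock is held, so increments and decrements stay consistent across threads.
    b.Waiting threads
        a.No ordering guarantee
            Threads blocked in acquire() wait on an internal condition variable. The Python documentation states that the order in which they are awoken should not be relied on, so FIFO fairness must not be assumed.
        b.Wakeup
            Each release() wakes one waiting thread (release(n) wakes up to n of them), which then takes a permit.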
    c.线程调度
        a.阻塞与唤醒
            获取许可失败的线程会被操作系统挂起,释放许可时重新调度。
        b.上下文切换
            频繁的信号量操作可能导致线程上下文切换开销。

05.应用场景分析
    a.资源池管理
        a.数据库连接池
            限制同时访问数据库的连接数,防止连接过多导致数据库崩溃。
        b.线程池控制
            控制并发执行的线程数量,避免系统资源耗尽。
        c.文件句柄限制
            限制同时打开的文件数量,避免文件描述符耗尽。
    b.流量控制
        a.API访问限制
            控制对第三方API的并发请求频率,避免触发限流机制。
        b.带宽管理
            限制同时进行的网络传输数量,控制网络带宽使用。
    c.并发计算
        a.批处理控制
            控制同时处理的数据批次数量,平衡内存使用和处理速度。
        b.任务调度
            限制同时执行的任务数量,保证系统稳定性。

06.性能特点与优化
    a.内存开销
        a.轻量级对象
            Semaphore对象本身占用内存很小,主要开销在等待队列管理。
        b.线程存储
            每个等待的线程需要在队列中保存状态信息。
    b.CPU使用率
        a.非竞争情况
            无竞争时获取释放许可的开销很小,主要是原子操作成本。
        b.竞争情况
            高竞争时线程阻塞和唤醒会产生显著的CPU开销。
    c.优化策略
        a.合理设置许可数
            根据实际测试调整许可数量,找到性能最优值。
        b.减少竞争粒度
            将大锁拆分为多个小信号量,减少线程竞争。
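        c.Batch operations
            release(n) (Python 3.9+) returns several permits in one call. acquire() takes exactly one permit per call, so acquiring in batches means calling it in a loop.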

07.使用注意事项
    a.死锁预防
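        a.Avoid nested acquisition
            A semaphore is not reentrant and has no owner. A thread that acquires the same semaphore again while holding a permit consumes a second permit, and blocks forever if none is left.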
        b.获取顺序一致
            多个信号量获取时,保持一致的获取顺序避免循环等待。
    b.资源泄漏防护
        a.确保释放
            使用try-finally或with语句确保许可被正确释放。
        b.异常处理
            在异常情况下也要保证许可的释放,避免资源泄漏。
    c.计数器管理
        a.避免过度释放
            标准Semaphore允许过度释放,但可能导致逻辑错误。
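        b.State monitoring
            Semaphore exposes no public method for reading its counter, so track the permits in use with an application-level counter if the value needs to be monitored.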

4.2 BoundedSemaphore

01.BoundedSemaphore基础概念
    a.定义与特点
        BoundedSemaphore是Semaphore的一个特殊变体,增加了计数器上限控制机制。
        与标准Semaphore不同,BoundedSemaphore的内部计数器不能超过初始设定值。
    b.核心区别
        a.计数器上限
            标准Semaphore允许计数器超过初始值,BoundedSemaphore严格限制在初始值内。
        b.错误检测
            当试图释放超过初始值的许可时,BoundedSemaphore会抛出ValueError异常。
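        c.Safety guarantee
            An unbalanced release() becomes an immediate error, so acquire/release mismatches are caught early instead of silently raising the concurrency limit.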

02.创建BoundedSemaphore对象
    a.构造方法签名
        threading.BoundedSemaphore(value=1)
        value参数指定初始和最大许可数量,默认为1,必须为非负整数。
    b.初始化特点
        a.双重约束
            初始值既是起始许可数,也是计数器的最大上限值。
        b.参数验证
            构造时会对value参数进行验证,确保为有效非负整数。
    c.使用场景选择
        a.严格资源控制
            需要严格控制资源数量,防止过度分配的场景。
        b.错误敏感应用
            对程序逻辑错误敏感,需要及时发现异常情况的场景。

03.核心方法与异常处理
    a.acquire()方法
        a.功能特性
            与标准Semaphore的acquire()方法行为完全一致。
        b.获取机制
            当计数器大于0时立即获取许可,否则阻塞等待。
        c.超时支持
            支持timeout参数,可设置最大等待时间。
    b.release()方法
        a.核心约束
            只有当前计数器小于初始值时才允许释放许可。
        b.异常机制
            当计数器已达到初始值时调用release()会抛出ValueError异常。
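        c.Error message
            The ValueError message reports that the semaphore was released too many times.
    c.Exception handling strategy
        a.Catch the exception
            Catch ValueError with a try/except block and handle it appropriately.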
        b.日志记录
            记录异常情况,便于程序调试和问题排查。
        c.优雅降级
            在异常情况下采取降级策略,保证程序继续运行。
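
The difference in release() behavior can be seen directly. This is a minimal sketch; the variable names are illustrative only.

```python
import threading

# Plain Semaphore: an extra release() silently creates a second permit
plain = threading.Semaphore(1)
plain.release()
first = plain.acquire(blocking=False)
second = plain.acquire(blocking=False)  # succeeds: the limit is now 2
third = plain.acquire(blocking=False)   # no permits left

# BoundedSemaphore: the same mistake raises ValueError immediately
bounded = threading.BoundedSemaphore(1)
try:
    bounded.release()
    over_release_detected = False
except ValueError:
    over_release_detected = True

print(first, second, third, over_release_detected)  # True True False True
```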

04.内部工作机制
    a.计数器管理
        a.双向约束
            计数器只能在0到初始值之间变化,不能超出这个范围。
        b.状态检查
            每次release()操作都会检查当前计数器状态。
        c.原子更新
            计数器的状态检查和更新是原子操作,保证线程安全。
    b.状态验证逻辑
        a.释放前检查
            在增加计数器前检查是否已达到上限值。
        b.异常触发
            当检查发现计数器已满时,立即抛出ValueError异常。
        c.线程安全
            状态检查在临界区内执行,避免竞态条件。
    c.与Semaphore的区别
        a.额外检查开销
            每次release()操作都需要额外的状态检查,略增性能开销。
        b.内存使用
            需要额外存储初始值作为上限参考。
        c.代码复杂度
            实现逻辑相对复杂,但提供更强的安全保证。

05.典型应用场景
    a.严格资源管理
        a.数据库连接池
            确保数据库连接数不会超过配置的最大值,防止连接泄漏积累。
        b.内存资源控制
            限制同时占用大量内存的操作数量,防止内存溢出。
        c.文件描述符管理
            严格控制文件描述符使用,避免系统资源耗尽。
    b.并发任务限制
        a.工作线程控制
            限制同时执行的工作线程数量,确保系统稳定性。
        b.API请求限流
            严格控制对外部API的并发请求,避免触发限流惩罚。
        c.批处理作业管理
            限制同时运行的批处理作业数量,平衡系统负载。
    c.错误敏感系统
        a.金融服务系统
            对资源使用异常敏感,需要及时发现和处理异常情况。
        b.实时控制系统
            要求严格的资源控制,确保系统响应的实时性。
        c.嵌入式系统
            资源有限的环境下,需要精确控制资源使用。

06.性能特征分析
    a.执行效率
        a.正常操作开销
            在正常范围内,性能与标准Semaphore基本相同。
        b.异常检测开销
            release()操作的额外检查会增加少量性能开销。
        c.内存占用
            需要额外存储上限值,内存占用略增。
    b.竞争特性
        a.低竞争情况
            在资源充足时,性能影响很小。
        b.高竞争情况
            在资源紧张时,异常检查的开销相对影响较小。
        c.异常频率
            如果频繁出现异常,性能影响会显著增加。
    c.监控指标
        a.计数器状态
            监控计数器的使用情况,评估资源利用率。
        b.异常频率
            统计ValueError异常的发生频率,评估程序健康度。
        c.等待时间
            监控线程等待许可的时间,评估系统性能。

07.最佳实践与建议
    a.参数选择指导
        a.合理设置上限
            根据实际资源容量和系统负载设置合理的许可上限。
        b.预留安全空间
            不要将上限设置得过高,保留一定的安全缓冲空间。
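        c.Dynamic adjustment
            Neither Semaphore nor BoundedSemaphore can be resized after construction. Changing the limit at runtime requires creating a new semaphore or an application-level wrapper.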
    b.错误处理模式
        a.防御性编程
            在调用release()前检查程序逻辑,避免异常发生。
        b.异常恢复策略
            设计异常发生时的恢复策略,保证系统稳定性。
        c.监控告警
            设置异常监控和告警机制,及时发现问题。
    c.性能优化建议
        a.减少异常频率
            通过程序优化减少ValueError异常的发生频率。
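        b.Batch operations
            release(n) (Python 3.9+) can return several permits at once. There is no batch acquire(), so multiple permits must be acquired one call at a time.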
        c.监控调优
            根据监控数据持续优化参数设置和使用策略。

08.常见问题与解决方案
    a.ValueError异常处理
        a.问题原因
            程序逻辑错误导致release()调用次数超过acquire()调用次数。
        b.诊断方法
            通过日志和调试工具分析异常发生的具体位置和原因。
        c.解决方案
            检查程序逻辑,确保acquire()和release()调用匹配。
    b.性能优化问题
        a.识别瓶颈
            通过性能分析工具识别信号量相关的性能瓶颈。
        b.参数调优
            根据实际测试结果调整许可数量和并发策略。
        c.架构优化
            考虑使用更细粒度的信号量或其他同步机制。
    c.资源泄漏检测
        a.监控指标
            设置资源使用监控,及时发现异常的资源使用模式。
        b.定期检查
            定期检查信号量状态,确保计数器在合理范围内。
        c.自动恢复
            在检测到异常时,触发自动恢复机制。

4.3 资源池管理

01.资源池基础概念
    a.定义与作用
        资源池是一种软件设计模式,用于管理有限资源的分配、使用和回收。
        通过预先创建和维护一组资源对象,避免频繁的资源创建和销毁开销。
    b.核心目标
        a.性能优化
            减少资源创建和销毁的时间开销,提高系统响应速度。
        b.资源控制
            限制并发使用的资源数量,防止资源耗尽和系统崩溃。
        c.资源复用
            通过资源复用减少内存分配和垃圾回收压力。
    c.信号量在资源池中的作用
        a.并发控制
            使用信号量控制同时从池中获取资源的线程数量。
        b.容量管理
            信号量计数器对应池中可用资源的数量。
        c.阻塞机制
            当池中无可用资源时,线程会被阻塞等待。

02.资源池基本架构
    a.核心组件
        a.资源容器
            存储可用资源的队列或栈结构,线程安全的数据结构。
        b.信号量控制器
            使用Semaphore或BoundedSemaphore控制资源访问。
        c.资源工厂
            负责创建新资源对象的工厂方法或类。
        d.状态监控器
            监控池的状态信息,如资源使用率、等待队列长度等。
    b.初始化流程
        a.设置池容量
            根据系统资源和应用需求确定池的最大容量。
        b.创建信号量
            初始化信号量计数器为池的容量值。
        c.预分配资源
            可选择预创建部分资源填入池中,减少首次获取延迟。
        d.启动监控线程
            启动后台监控线程,负责资源检查和池维护。
    c.资源获取流程
        a.信号量获取
            首先通过信号量获取访问许可。
        b.资源检查
            从池中获取可用资源,检查资源有效性。
        c.资源分配
            将可用资源分配给请求线程,从池中移除。
        d.失败处理
            当资源无效时的重试或异常处理机制。

03.信号量与资源池的协同机制
    a.计数器同步
        a.资源数量映射
            信号量计数器值等于池中可用资源的实际数量。
        b.原子更新
            资源获取和释放时,同步更新信号量计数器。
        c.状态一致性
            确保信号量状态与资源池状态始终保持一致。
    b.阻塞与唤醒
        a.资源不足阻塞
            当池中无资源时,信号量导致线程阻塞等待。
        b.资源释放唤醒
            当资源归还池中时,信号量唤醒等待的线程。
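        c.Fairness
            threading.Semaphore does not guarantee wakeup order. If requests must be served strictly in arrival order, add an explicit FIFO queue of waiters instead of relying on the semaphore.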
    c.异常安全处理
        a.获取失败回滚
            资源获取失败时,正确释放已获取的信号量许可。
        b.异常情况恢复
            处理资源创建、验证过程中出现的异常情况。
        c.状态一致性维护
            在异常情况下维护信号量和资源池状态的一致性。
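
The cooperation between a semaphore and a pool container, including the permit rollback on failure, might be sketched as follows. `ResourcePool`, `lease`, and the `factory` argument are hypothetical names chosen for this example.

```python
import queue
import threading
from contextlib import contextmanager

class ResourcePool:
    """Hypothetical pool: a semaphore counts permits, a queue holds idle resources."""

    def __init__(self, factory, size):
        self._permits = threading.BoundedSemaphore(size)
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())  # pre-allocate every resource

    @contextmanager
    def lease(self, timeout=None):
        if not self._permits.acquire(timeout=timeout):
            raise TimeoutError("no resource available")
        try:
            resource = self._idle.get_nowait()
        except queue.Empty:
            self._permits.release()  # roll back the permit on failure
            raise
        try:
            yield resource
        finally:
            # Return the resource first, then the permit, so that a free
            # permit always implies an idle resource
            self._idle.put(resource)
            self._permits.release()

counter = iter(range(100))
pool = ResourcePool(factory=lambda: f"conn-{next(counter)}", size=2)

with pool.lease() as first, pool.lease() as second:
    try:
        with pool.lease(timeout=0.05):
            exhausted = False
    except TimeoutError:
        exhausted = True  # both resources are in use

with pool.lease() as again:
    reused = again in {"conn-0", "conn-1"}

print(exhausted, reused)  # True True
```

Using BoundedSemaphore here means an accidental double release surfaces as ValueError rather than as a pool that hands out more leases than it has resources.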

04.常见资源池类型
    a.数据库连接池
        a.连接管理
            管理数据库连接的创建、分配、回收和销毁。
        b.连接验证
            定期验证连接的有效性,清理无效连接。
        c.超时控制
            设置连接获取超时时间和连接最大生命周期。
        d.负载均衡
            在多个数据库服务器之间分配连接请求。
    b.线程池
        a.工作线程管理
            维护一组工作线程,避免线程创建销毁开销。
        b.任务队列
            使用队列管理待执行的任务,控制并发度。
        c.动态调整
            根据负载情况动态调整线程池大小。
        d.资源隔离
            为不同类型的任务使用独立的线程池。
    c.内存对象池
        a.对象复用
            复用内存对象,减少垃圾回收压力。
        b.内存预分配
            预先分配大块内存,避免运行时内存分配。
        c.对象状态管理
            管理对象的初始化、清理和状态重置。
        d.内存泄漏防护
            监控对象生命周期,防止内存泄漏。
    d.网络连接池
        a.HTTP连接池
            复用HTTP连接,减少连接建立和关闭开销。
        b.TCP连接池
            管理TCP连接的复用和负载均衡。
        c.连接健康检查
            定期检查连接的健康状态,清理失效连接。
        d.超时和重试
            实现连接超时机制和自动重试策略。

05.资源池性能优化策略
    a.容量优化
        a.最佳容量确定
            通过性能测试确定最优的池容量大小。
        b.动态调整机制
            根据系统负载动态调整池的容量。
        c.预热策略
            系统启动时预创建资源,减少首次访问延迟。
    b.资源分配优化
        a.快速路径设计
            为资源充足情况设计快速获取路径。
        b.批量操作支持
            支持批量获取和释放资源,减少信号量操作频率。
        c.本地缓存机制
            为线程提供本地资源缓存,减少竞争。
    c.监控和调优
        a.性能指标收集
            收集资源利用率、等待时间、吞吐量等指标。
        b.瓶颈识别
            通过分析监控数据识别性能瓶颈。
        c.自适应优化
            基于监控数据自动调整池参数。

06.高级特性与扩展
    a.分层资源池
        a.多级缓存设计
            设计多层次的资源缓存结构,提高命中率。
        b.资源分级
            根据资源特性和使用频率进行分级管理。
        c.智能调度
            基于资源使用模式智能调度资源分配。
    b.分布式资源池
        a.跨节点协调
            在分布式环境中协调资源使用和分配。
        b.一致性保证
            确保分布式环境下的资源状态一致性。
        c.故障转移
            实现节点故障时的资源自动转移机制。
    c.智能预测与预分配
        a.使用模式学习
            学习资源使用模式,预测未来需求。
        b.预测性分配
            基于预测结果预分配资源,减少等待时间。
        c.自动扩缩容
            根据预测和实际负载自动调整池容量。

07.实现考虑与最佳实践
    a.线程安全设计
        a.无锁数据结构
            在可能的情况下使用无锁数据结构提高性能。
        b.细粒度锁
            使用细粒度锁减少线程竞争。
        c.原子操作
            对于简单状态更新使用原子操作。
    b.资源有效性管理
        a.健康检查机制
            定期检查资源的有效性,清理损坏资源。
        b.资源重建策略
            当资源失效时的重建或替换策略。
        c.优雅降级
            在资源不足时的降级处理机制。
    c.监控和诊断
        a.详细日志记录
            记录资源获取、释放、异常等详细信息。
        b.性能监控接口
            提供丰富的性能监控接口和指标。
        c.故障诊断工具
            提供故障诊断和问题排查的工具。
    d.配置管理
        a.灵活配置系统
            提供灵活的配置系统,支持运行时参数调整。
        b.配置验证
            验证配置参数的合理性,防止错误配置。
        c.配置热更新
            支持配置的热更新,无需重启服务。

08.故障处理与恢复
    a.资源泄漏检测
        a.使用跟踪机制
            跟踪资源的分配和使用情况。
        b.泄漏检测算法
            实现资源泄漏的自动检测算法。
        c.自动清理机制
            检测到泄漏时的自动清理和恢复机制。
    b.异常恢复策略
        a.故障隔离
            将故障资源隔离,防止影响整个池的运行。
        b.渐进式恢复
            采用渐进式策略恢复故障,避免系统过载。
        c.备用资源机制
            维护备用资源,在主资源故障时切换使用。
    c.数据一致性保证
        a.事务性操作
            确保资源操作的原子性和一致性。
        b.状态检查点
            定期保存状态检查点,便于故障恢复。
        c.回滚机制
            在操作失败时的状态回滚机制。

4.4 限流控制

01.限流基础概念
    a.定义与目的
        限流是一种通过控制请求速率来保护系统资源的技术手段。
        通过限制并发请求数量或请求频率,防止系统过载和服务降级。
    b.核心目标
        a.系统保护
            防止过多请求导致系统资源耗尽或性能急剧下降。
        b.服务质量保证
            确保为合法用户提供稳定可靠的服务质量。
        c.资源公平分配
            在多用户环境下公平分配有限的系统资源。
    c.信号量在限流中的作用
        a.并发控制
            使用信号量限制同时处理的请求数量。
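        b.Rate regulation
            A semaphore bounds how many requests are in flight, not how many start per second. Throughput is limited only indirectly, to roughly the permit count divided by the average handling time; use a window or token-bucket algorithm for a true rate limit.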
        c.队列管理
            超出限制的请求进入等待队列或被拒绝。

02.限流算法类型
    a.固定窗口限流
        a.算法原理
            在固定时间窗口内限制最大请求数量,窗口到期后重置计数。
        b.实现特点
            实现简单,计数器重置清晰,但存在边界突刺问题。
        c.适用场景
            适用于对精度要求不高、实现简单的限流场景。
    b.滑动窗口限流
        a.算法原理
            使用滑动时间窗口统计请求数量,避免固定窗口的边界问题。
        b.实现复杂度
            需要维护请求时间戳列表,内存和计算开销较大。
        c.精度优势
            提供更精确的流量控制,避免突发流量冲击。
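    c.Token bucket
        a.How it works
            Tokens are added to a bucket at a fixed rate, and a request must take a token to pass.
        b.Burst handling
            Bursts are allowed up to a point; the bucket capacity sets the maximum burst size.
        c.Smooth limiting
            Over the long run the admitted rate converges to the refill rate, which avoids sharp swings.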
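        A minimal token bucket sketch. It refills lazily on each call instead of running a background refill thread, which is one common way to implement the idea; `TokenBucket`, `rate`, `capacity`, and `cost` are names assumed for this example.

```python
import threading
import time


class TokenBucket:
    """Refill `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so tests can fake time
        self._lock = threading.Lock()
        self._tokens = float(capacity)  # start full, so an initial burst passes
        self._last = clock()

    def allow(self, cost=1):
        with self._lock:
            now = self.clock()
            # Credit the tokens earned since the last call, capped at capacity
            self._tokens = min(self.capacity,
                               self._tokens + (now - self._last) * self.rate)
            self._last = now
            if self._tokens >= cost:
                self._tokens -= cost
                return True
            return False
```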
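    d.Leaky bucket
        a.How it works
            Requests enter a bucket and drain out at a fixed rate; requests that would overflow the bucket are dropped.
        b.Smooth output
            Output flow stays constant, which smooths bursty input.
        c.Simple and reliable
            Simple to implement and suited to cases with a strict output rate requirement.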
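        A sketch of the leaky bucket in its "meter" form, which only decides whether to admit or drop a request. The queue form, which actually delays requests so they leave at the fixed rate, would additionally need a worker draining a queue. `LeakyBucket` and its parameters are hypothetical names.

```python
import threading
import time


class LeakyBucket:
    """Each request adds one unit of water; water drains at `leak_rate` per second."""

    def __init__(self, leak_rate, capacity, clock=time.monotonic):
        self.leak_rate = leak_rate
        self.capacity = capacity
        self.clock = clock              # injectable so tests can fake time
        self._lock = threading.Lock()
        self._level = 0.0
        self._last = clock()

    def allow(self):
        with self._lock:
            now = self.clock()
            # Drain what has leaked out since the last call
            leaked = (now - self._last) * self.leak_rate
            self._level = max(0.0, self._level - leaked)
            self._last = now
            if self._level + 1 <= self.capacity:
                self._level += 1
                return True
            return False                # bucket would overflow: drop
```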

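03.Semaphore-based rate limiting
    a.Limiting concurrency
        a.Semaphore configuration
            Set the semaphore's initial value to the maximum number of concurrent requests.
        b.Acquisition
            Each request acquires a permit before it is processed and releases it afterwards.
        c.Blocking behavior
            Requests beyond the limit either block until a permit is free or are rejected immediately.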
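        The three steps above can be sketched with `threading.BoundedSemaphore`. The wrapper class `ConcurrencyLimiter` and its `try_run` method are assumptions made for this example; passing `timeout=None` selects immediate rejection, and a number selects bounded waiting.

```python
import threading


class ConcurrencyLimiter:
    """Run callables with at most `max_concurrent` in flight at once."""

    def __init__(self, max_concurrent):
        self._sem = threading.BoundedSemaphore(max_concurrent)

    def try_run(self, func, *args, timeout=None):
        """Return (True, result), or (False, None) if no permit was available."""
        if timeout is None:
            acquired = self._sem.acquire(blocking=False)   # reject immediately
        else:
            acquired = self._sem.acquire(timeout=timeout)  # wait up to timeout
        if not acquired:
            return False, None
        try:
            return True, func(*args)
        finally:
            self._sem.release()         # always return the permit
```

        `BoundedSemaphore` is used instead of `Semaphore` so that an accidental extra `release()` raises `ValueError` rather than silently raising the limit.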
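    b.Request queue management
        a.Bounded queue
            Hold waiting requests in a bounded queue to cap memory use.
        b.Timeouts
            Give waiting requests a timeout so none waits forever.
        c.Rejection policy
            Decide what happens to new requests when the queue is full: reject, degrade, or retry.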
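        A small sketch of a bounded waiting queue with a reject-when-full policy, built on the standard `queue.Queue`. `BoundedRequestQueue`, `submit`, and `take` are hypothetical names; degrading or retrying would replace the `return False` branch.

```python
import queue


class BoundedRequestQueue:
    """Holds at most `maxsize` waiting requests; extra requests are rejected."""

    def __init__(self, maxsize):
        self._q = queue.Queue(maxsize=maxsize)

    def submit(self, request, timeout=None):
        """Enqueue a request; return False if the queue stays full."""
        try:
            if timeout is None:
                self._q.put_nowait(request)            # reject at once when full
            else:
                self._q.put(request, timeout=timeout)  # wait up to timeout
            return True
        except queue.Full:
            return False

    def take(self, timeout):
        """Dequeue the oldest request, or return None after `timeout` seconds."""
        try:
            return self._q.get(timeout=timeout)
        except queue.Empty:
            return None
```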
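    c.Dynamic adjustment
        a.Real-time monitoring
            Monitor system load and response time, and adjust limits accordingly.
        b.Adaptive algorithm
            Adjust the permit count automatically based on system state. threading.Semaphore has no resize method, so this usually means releasing extra permits to grow the limit or holding some back to shrink it.
        c.Smooth transition
            Change parameters gradually to avoid sudden jumps in traffic.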

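04.Multi-layer rate limiting architecture
    a.Access layer
        a.Global traffic control
            Apply global limits at the gateway or load balancer.
        b.Per-IP limits
            Limit request frequency by client IP address.
        c.Per-user limits
            Apply per-user policies based on user identity.
    b.Application layer
        a.Per-API limits
            Limit specific API endpoints independently.
        b.Per-service limits
            Limit at the service level in a microservice architecture.
        c.Per-module limits
            Apply fine-grained limits to critical functional modules.
    c.Resource layer
        a.Database connections
            Limit concurrent database connections to protect the database.
        b.Cache access
            Limit concurrent access to the cache service.
        c.External service calls
            Limit the call frequency to third-party services.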

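05.Configuring rate limiting policies
    a.Setting thresholds
        a.Capacity assessment
            Derive thresholds from performance test results.
        b.Safety margin
            Leave headroom for unexpected spikes.
        c.Room for adjustment
            Leave room for later tuning.
    b.Choosing granularity
        a.Coarse-grained
            A simple limit over the system as a whole.
        b.Fine-grained
            Precise limits per endpoint, user, or feature.
        c.Hybrid
            A layered policy that combines coarse and fine limits.
    c.Handling limit events
        a.When a limit triggers
            Define a standard response for limited requests, such as a consistent error code.
        b.Degradation strategy
            Design a degradation plan that keeps core functionality available.
        c.Monitoring and alerting
            Monitor and alert on limit triggers.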

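06.Monitoring and tuning
    a.Key metrics
        a.Request counts
            Track total, successful, and limited requests.
        b.Response time distribution
            Track the response time distribution per request type.
        c.Resource utilization
            Track CPU, memory, and network usage.
    b.Performance analysis
        a.Bottleneck identification
            Use profiling to find bottlenecks and optimization targets.
        b.Evaluating the limits
            Assess how the policy affects system performance and user experience.
        c.Parameter tuning
            Tune limit parameters based on monitoring data.
    c.Adaptive optimization
        a.Machine learning
            Use machine learning to optimize the policy.
        b.Real-time adjustment
            Adjust limit parameters dynamically from live monitoring data.
        c.Predictive limiting
            Forecast traffic from historical data and adjust the policy ahead of time.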

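07.Rate limiting in distributed environments
    a.Centralized limiting
        a.Central controller
            A central limiting service manages all rules.
        b.Network overhead
            Network latency reduces the precision of the limit and must be accounted for.
        c.Single point of failure
            Keep the central service from becoming a single point of failure.
    b.Distributed limiting
        a.Local limits
            Each node enforces limits independently, reducing network dependence.
        b.Global coordination
            A distributed coordination service keeps the global limit consistent.
        c.Eventual consistency
            Accept an eventually consistent view of limiter state.
    c.Hybrid mode
        a.Layered limiting
            Combine the strengths of centralized and distributed limiting.
        b.Failover
            Fail over automatically when the limiting service goes down.
        c.Load balancing
            Balance load across multiple limiting services.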

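08.Best practices and common problems
    a.Design principles
        a.Incremental rollout
            Start simple and refine the policy step by step.
        b.Observability
            Make limiter behavior observable and debuggable.
        c.Protecting user experience
            Protect the system while keeping the impact on users as small as possible.
    b.Implementation advice
        a.Layered enforcement
            Apply a suitable policy at each layer.
        b.Externalized configuration
            Keep limit configuration outside the code so it can be changed dynamically.
        c.Degradation plan
            Prepare thorough degradation and emergency plans.
    c.Common problems and solutions
        a.Limit precision
            Use algorithms such as sliding window to improve precision.
        b.Performance impact
            Optimize the limiter implementation to reduce overhead.
        c.Configuration complexity
            Provide a simplified configuration interface and default templates.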

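5. Event Objects

5.1 threading.Event

01.Event fundamentals
    a.Definition and role
        Event is a synchronization primitive in Python's threading module for simple communication between threads.
        It uses an internal flag to make threads wait and to notify them, and lets one or more threads wait for something to happen.
    b.Core properties
        a.Flag mechanism
            An internal boolean flag starts as False, meaning the event is not set.
        b.Persistent state
            Once set, the event stays set until it is explicitly cleared.
        c.Multiple waiters
            Any number of threads can wait on the same event at once.
    c.Use cases
        a.Thread coordination
            Coordinate the order and timing of execution across threads.
        b.State notification
            Tell threads that a condition now holds or an operation has finished.
        c.Waiting for initialization
            Wait for a system or component to finish initializing before starting work.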

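02.Creating and using Event objects
    a.Constructor
        threading.Event()
        Creates a new event object whose initial state is unset (False).
    b.Core methods at a glance
        a.set()
            Sets the flag to True and wakes every waiting thread.
        b.clear()
            Resets the flag to False so that later wait() calls block.
        c.wait()
            Blocks the calling thread until the flag becomes True.
        d.is_set()
            Returns the current flag state (True or False).
    c.Basic usage patterns
        a.Set and wait
            One thread sets the event while other threads wait for it.
        b.Reset and wait
            Clear the event, then wait for the next set().
        c.Query
            Check the event state without blocking.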
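        The three patterns can be seen together in a short sketch. The names `ready`, `results`, and `waiter` are chosen for this example only.

```python
import threading

ready = threading.Event()
results = []


def waiter():
    # Set-and-wait: block until another thread calls set().
    # The timeout keeps the demo from hanging if set() never happens.
    results.append(ready.wait(timeout=5.0))


thread = threading.Thread(target=waiter)
thread.start()

was_set_before = ready.is_set()      # query: returns at once, never blocks
ready.set()                          # wakes the waiter
thread.join()

ready.clear()                        # reset: later wait() calls block again
timed_out_result = ready.wait(timeout=0.05)   # nobody sets it, so this times out
```

        After this runs, `results` holds `True` (the waiter saw the event set), `was_set_before` is `False`, and `timed_out_result` is `False` because the wait ended by timeout.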

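03.Core Event methods in detail
    a.set()
        a.What it does
            Sets the flag to True and wakes every thread currently waiting.
        b.Wake-up mechanism
            In CPython, set() calls notify_all() on an internal condition variable to wake the threads blocked in wait().
        c.Idempotence
            Calling set() repeatedly is safe; the flag simply stays True.
        d.Code example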
            ---
            # Event.set()方法示例
            import threading
            import time
            import logging

            # 配置日志
            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

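            class TaskManager:
                def __init__(self):
                    self.start_event = threading.Event()
                    self.workers = []
                    self.completed_tasks = 0
                    # One lock shared by all workers guards completed_tasks.
                    # A Lock() created at the point of use would be a new,
                    # private object on every call and would exclude nobody.
                    self.lock = threading.Lock()

                def start_workers(self, num_workers=3):
                    """Start the worker threads."""
                    for i in range(num_workers):
                        worker = threading.Thread(
                            target=self.worker_task,
                            name=f"Worker-{i+1}",
                            args=(i+1,)
                        )
                        self.workers.append(worker)
                        worker.start()
                        logger.info(f"Started worker thread: {worker.name}")

                def worker_task(self, worker_id):
                    """Body of each worker thread."""
                    logger.info(f"Worker-{worker_id}: waiting for the start signal...")

                    # Block until the main thread sets the start event
                    self.start_event.wait()

                    logger.info(f"Worker-{worker_id}: start signal received, running task")

                    # Simulate the work
                    for i in range(5):
                        time.sleep(0.5)
                        logger.info(f"Worker-{worker_id}: step {i+1}/5")

                    # Record completion under the shared lock
                    with self.lock:
                        self.completed_tasks += 1
                        logger.info(f"Worker-{worker_id}: done, completed tasks so far: {self.completed_tasks}")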

                def start_all_tasks(self):
                    """开始所有任务"""
                    logger.info("主线程准备开始所有任务...")
                    time.sleep(1)  # 给工作线程一些启动时间

                    # 设置事件,唤醒所有等待的线程
                    logger.info("主线程发送开始信号!")
                    self.start_event.set()

                    # 等待所有任务完成
                    for worker in self.workers:
                        worker.join()

                    logger.info(f"所有任务完成!总完成任务数: {self.completed_tasks}")

            # 使用示例
            if __name__ == "__main__":
                task_manager = TaskManager()
                task_manager.start_workers(num_workers=3)
                task_manager.start_all_tasks()
            ---
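    b.clear()
        a.What it does
            Resets the flag to False so that later wait() calls block.
        b.State reset
            Undoes the state established by an earlier set().
        c.Reuse
            An event can be set and cleared repeatedly, which makes it reusable across rounds.
        d.Code example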
            ---
            # Event.clear()方法示例 - 可重复使用的事件
            import threading
            import time
            import logging

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class ReusableEventSystem:
                def __init__(self):
                    self.round_event = threading.Event()
                    self.current_round = 0
                    self.workers = []

                def start_round(self, round_num, num_workers=3):
                    """开始新一轮工作"""
                    self.current_round = round_num

                    # 清除之前的事件状态
                    self.round_event.clear()
                    logger.info(f"=== 第{round_num}轮工作准备开始 ===")

                    # 创建并启动工作线程
                    self.workers = []
                    for i in range(num_workers):
                        worker = threading.Thread(
                            target=self.worker_in_round,
                            name=f"Round{round_num}-Worker{i+1}",
                            args=(i+1,)
                        )
                        self.workers.append(worker)
                        worker.start()

                    # 等待一会让线程准备好
                    time.sleep(0.5)

                    # 触发事件开始工作
                    logger.info(f"第{round_num}轮工作开始!")
                    self.round_event.set()

                    # 等待所有工作线程完成
                    for worker in self.workers:
                        worker.join()

                    logger.info(f"=== 第{round_num}轮工作完成 ===")

                def worker_in_round(self, worker_id):
                    """在特定轮次中工作"""
                    logger.info(f"第{self.current_round}轮-Worker{worker_id}: 准备就绪,等待开始信号")

                    # 等待本轮次的事件
                    self.round_event.wait()

                    logger.info(f"第{self.current_round}轮-Worker{worker_id}: 开始工作")

                    # 执行工作任务
                    for step in range(3):
                        time.sleep(0.3)
                        logger.info(f"第{self.current_round}轮-Worker{worker_id}: 工作步骤 {step+1}/3")

                    logger.info(f"第{self.current_round}轮-Worker{worker_id}: 工作完成")

            # 使用示例
            if __name__ == "__main__":
                event_system = ReusableEventSystem()

                # 执行3轮工作
                for round_num in range(1, 4):
                    event_system.start_round(round_num, num_workers=2)
                    time.sleep(0.5)  # 轮次间间隔
            ---
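    c.wait()
        a.What it does
            Blocks the calling thread until the flag becomes True.
        b.Blocking behavior
            Returns immediately if the event is already set; otherwise blocks until it is set.
        c.Timeout
            Accepts an optional timeout in seconds (a float) to avoid waiting forever.
        d.Return value
            Returns True if the event is set, and False if the timeout elapsed first.
        e.Code example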
            ---
            # Event.wait()方法示例 - 带超时的等待
            import threading
            import time
            import random
            import logging

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class EventWaitDemo:
                def __init__(self):
                    self.data_ready = threading.Event()
                    self.processors = []
                    self.data_producer = None

                def start_data_processing(self):
                    """启动数据处理流程"""
                    # 启动数据生产者
                    self.data_producer = threading.Thread(target=self.produce_data)
                    self.data_producer.start()

                    # 启动多个数据处理器
                    for i in range(3):
                        processor = threading.Thread(
                            target=self.consume_data,
                            name=f"Processor-{i+1}",
                            args=(f"数据集{i+1}",)
                        )
                        self.processors.append(processor)
                        processor.start()

                    # 等待所有线程完成
                    self.data_producer.join()
                    for processor in self.processors:
                        processor.join()

                    logger.info("所有数据处理流程完成")

                def produce_data(self):
                    """数据生产者"""
                    logger.info("数据生产者开始工作...")

                    for batch in range(3):
                        # 模拟数据生产时间
                        production_time = random.uniform(1.0, 3.0)
                        logger.info(f"正在生产数据批次 {batch+1},预计需要 {production_time:.2f} 秒")
                        time.sleep(production_time)

                        # 数据生产完成,设置事件
                        logger.info(f"数据批次 {batch+1} 生产完成!")
                        self.data_ready.set()

                        # 等待消费者处理完数据
                        time.sleep(0.5)

                        # 清除事件,为下一批数据做准备
                        self.data_ready.clear()
                        logger.info(f"准备生产下一批数据...")
                        time.sleep(0.5)

                    logger.info("数据生产者工作完成")

                def consume_data(self, dataset_name):
                    """数据消费者"""
                    logger.info(f"{dataset_name} 消费者启动,等待数据...")

                    batch_count = 0
                    while batch_count < 3:  # 最多处理3批数据
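                        # Wait up to 5 s. The producer's longest gap between batches
                        # is about 4 s (3.0 s production plus 1.0 s of pauses), so a
                        # timeout here means the producer has really stopped.
                        logger.info(f"{dataset_name}: waiting for data...")
                        if self.data_ready.wait(timeout=5.0):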
                            logger.info(f"{dataset_name}: 数据准备就绪,开始处理")

                            # 模拟数据处理
                            processing_time = random.uniform(0.5, 1.5)
                            time.sleep(processing_time)
                            batch_count += 1

                            logger.info(f"{dataset_name}: 数据处理完成(批次{batch_count})")
                        else:
                            logger.warning(f"{dataset_name}: 等待数据超时,停止处理")
                            break

                    logger.info(f"{dataset_name} 消费者结束,共处理 {batch_count} 批数据")

            # 使用示例
            if __name__ == "__main__":
                demo = EventWaitDemo()
                demo.start_data_processing()
            ---
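    d.is_set()
        a.What it does
            Returns the current flag state without blocking.
        b.Non-blocking
            Returns True or False immediately and never waits for the state to change.
        c.State query
            Checks whether the event has been set without changing its state.
        d.Code example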
            ---
            # Event.is_set()方法示例 - 状态检查
            import threading
            import time
            import logging

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class EventStatusMonitor:
                def __init__(self):
                    self.work_complete = threading.Event()
                    self.monitor_thread = None

                def start_monitoring(self):
                    """启动状态监控"""
                    self.monitor_thread = threading.Thread(target=self.monitor_status)
                    self.monitor_thread.start()
                    logger.info("状态监控线程已启动")

                def monitor_status(self):
                    """监控事件状态"""
                    logger.info("开始监控工作完成状态...")

                    while not self.work_complete.is_set():
                        logger.info("Work not finished yet, still monitoring...")

                        # Pause up to 1 s between checks; unlike time.sleep(1),
                        # wait() returns as soon as the event is set
                        self.work_complete.wait(timeout=1)

                    logger.info("监控结束:工作已完成")

                def do_work(self, duration=5):
                    """模拟工作过程"""
                    logger.info(f"开始执行工作,预计 {duration} 秒完成")

                    # 在工作过程中定期检查状态
                    for i in range(duration):
                        time.sleep(1)
                        logger.info(f"工作进度: {i+1}/{duration}")

                        # 检查是否有人提前设置了完成事件
                        if self.work_complete.is_set():
                            logger.info("检测到工作被标记为完成")
                            break

                    # 标记工作完成
                    logger.info("工作执行完成")
                    self.work_complete.set()

                def force_complete(self):
                    """强制标记工作完成"""
                    logger.info("强制标记工作为完成状态")
                    self.work_complete.set()

                def stop_monitoring(self):
                    """停止监控"""
                    if self.monitor_thread and self.monitor_thread.is_alive():
                        self.monitor_thread.join(timeout=2)
                        logger.info("状态监控线程已停止")

            # 使用示例
            if __name__ == "__main__":
                monitor = EventStatusMonitor()
                monitor.start_monitoring()

                # 启动工作线程
                work_thread = threading.Thread(
                    target=monitor.do_work,
                    args=(4,)  # 4秒的工作
                )
                work_thread.start()

                # 等待工作完成
                work_thread.join()
                monitor.stop_monitoring()

                print("\n" + "="*50)
                print("演示强制完成功能")
                print("="*50)

                # 再次演示强制完成功能
                monitor2 = EventStatusMonitor()
                monitor2.start_monitoring()

                # 启动一个较长的任务
                long_work_thread = threading.Thread(
                    target=monitor2.do_work,
                    args=(10,)  # 10秒的工作
                )
                long_work_thread.start()

                # 3秒后强制完成
                time.sleep(3)
                monitor2.force_complete()

                long_work_thread.join()
                monitor2.stop_monitoring()
            ---

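04.Advanced Event usage
    a.Multi-thread coordination patterns
        a.Coordinator and workers
            A coordinator thread controls when worker threads move from one stage to the next.
        b.Producer and consumer
            An event tells consumers that data is ready.
        c.Barrier-style synchronization
            Several threads wait for a condition and then continue together.
        d.Code example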
            ---
            # Event高级用法 - 多线程协调
            import threading
            import time
            import logging
            import random

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class PipelineSystem:
                def __init__(self):
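                    # Stage gate events. Despite the *_complete names, the coordinator
                    # sets each one to let that stage START; completion of a stage is
                    # only approximated by the coordinator's sleeps.
                    self.stage1_complete = threading.Event()
                    self.stage2_complete = threading.Event()
                    self.stage3_complete = threading.Event()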

                    # 工作线程
                    self.threads = []
                    self.results = {}

                def run_pipeline(self):
                    """运行完整的三阶段管道"""
                    logger.info("=== 开始三阶段处理管道 ===")

                    # 启动工作线程
                    self._start_workers()

                    # 启动阶段协调器
                    coordinator = threading.Thread(target=self.coordinate_stages)
                    coordinator.start()

                    # 等待所有工作完成
                    coordinator.join()
                    for thread in self.threads:
                        thread.join()

                    logger.info("=== 处理管道完成 ===")
                    self._print_results()

                def _start_workers(self):
                    """启动各种工作线程"""
                    # 阶段1:数据收集线程
                    for i in range(2):
                        thread = threading.Thread(
                            target=self.data_collector,
                            name=f"Collector-{i+1}",
                            args=(f"数据源{i+1}",)
                        )
                        self.threads.append(thread)
                        thread.start()

                    # 阶段2:数据处理线程
                    for i in range(3):
                        thread = threading.Thread(
                            target=self.data_processor,
                            name=f"Processor-{i+1}",
                            args=(f"处理器{i+1}",)
                        )
                        self.threads.append(thread)
                        thread.start()

                    # 阶段3:结果汇总线程
                    for i in range(2):
                        thread = threading.Thread(
                            target=self.result_aggregator,
                            name=f"Aggregator-{i+1}",
                            args=(f"汇总器{i+1}",)
                        )
                        self.threads.append(thread)
                        thread.start()

                def data_collector(self, source_name):
                    """数据收集者"""
                    logger.info(f"{source_name}: 开始收集数据")

                    # 等待阶段1可以开始
                    self.stage1_complete.wait()

                    # 收集数据
                    data_items = []
                    for i in range(3):
                        time.sleep(random.uniform(0.2, 0.8))
                        item = f"{source_name}-数据项{i+1}"
                        data_items.append(item)
                        logger.info(f"{source_name}: 收集到 {item}")

                    # 存储结果
                    self.results[f"collector_{source_name}"] = data_items
                    logger.info(f"{source_name}: 数据收集完成")

                def data_processor(self, processor_name):
                    """数据处理器"""
                    logger.info(f"{processor_name}: 等待数据处理")

                    # 等待阶段1完成
                    self.stage1_complete.wait()
                    logger.info(f"{processor_name}: 阶段1完成,准备处理数据")

                    # 等待阶段2可以开始
                    self.stage2_complete.wait()

                    # 处理数据
                    processed_items = []
                    for i in range(4):
                        time.sleep(random.uniform(0.3, 0.7))
                        item = f"{processor_name}-处理结果{i+1}"
                        processed_items.append(item)
                        logger.info(f"{processor_name}: 处理完成 {item}")

                    # 存储结果
                    self.results[f"processor_{processor_name}"] = processed_items
                    logger.info(f"{processor_name}: 数据处理完成")

                def result_aggregator(self, aggregator_name):
                    """结果汇总者"""
                    logger.info(f"{aggregator_name}: 等待结果汇总")

                    # 等待阶段2完成
                    self.stage2_complete.wait()
                    logger.info(f"{aggregator_name}: 阶段2完成,准备汇总结果")

                    # 等待阶段3可以开始
                    self.stage3_complete.wait()

                    # 汇总结果
                    summary_items = []
                    for i in range(2):
                        time.sleep(random.uniform(0.4, 0.6))
                        item = f"{aggregator_name}-汇总项{i+1}"
                        summary_items.append(item)
                        logger.info(f"{aggregator_name}: 汇总完成 {item}")

                    # 存储结果
                    self.results[f"aggregator_{aggregator_name}"] = summary_items
                    logger.info(f"{aggregator_name}: 结果汇总完成")

                def coordinate_stages(self):
                    """协调各个阶段"""
                    # 阶段1:等待数据收集准备就绪
                    time.sleep(1)
                    logger.info("协调器:阶段1开始 - 数据收集")
                    self.stage1_complete.set()

                    # 等待数据收集完成
                    time.sleep(3)
                    logger.info("协调器:阶段1完成,开始阶段2 - 数据处理")
                    self.stage2_complete.set()

                    # 等待数据处理完成
                    time.sleep(4)
                    logger.info("协调器:阶段2完成,开始阶段3 - 结果汇总")
                    self.stage3_complete.set()

                    # 等待结果汇总完成
                    time.sleep(2)
                    logger.info("协调器:阶段3完成")

                def _print_results(self):
                    """打印处理结果"""
                    logger.info("=== 处理结果汇总 ===")
                    for key, value in self.results.items():
                        logger.info(f"{key}: {len(value)} 项")
                        for item in value:
                            logger.info(f"  - {item}")

            # 使用示例
            if __name__ == "__main__":
                pipeline = PipelineSystem()
                pipeline.run_pipeline()
            ---

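05.Event performance and caveats
    a.Performance characteristics
        a.Memory overhead
            An Event object is small: essentially a flag plus a list of waiters.
        b.Wake-up cost
            set() wakes every waiting thread, so its cost grows with the number of waiters.
        c.Context switches
            When many threads wait on one event, waking them causes many context switches.
    b.Caveats
        a.Remember to clear
            When an event is reused, clear it before the next round.
        b.Timeouts
            Avoid unbounded waits; choose a sensible timeout.
        c.Stale events
            An event left set makes every later wait() return immediately, which can silently skip synchronization.
        d.Code example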
            ---
            # Event性能和注意事项示例
            import threading
            import time
            import logging

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class EventBestPractices:
                def __init__(self):
                    self.start_event = threading.Event()
                    self.stop_event = threading.Event()
                    self.workers = []
                    self.worker_count = 0
                    # Shared lock guarding worker_count. Creating a Lock() inline
                    # at the point of use would give every call its own lock and
                    # protect nothing.
                    self.count_lock = threading.Lock()

                def safe_worker_pattern(self, worker_id):
                    """A worker that can always be stopped."""
                    try:
                        logger.info(f"Worker-{worker_id}: ready")

                        # Wait with a timeout so the stop signal is noticed
                        while not self.start_event.wait(timeout=1.0):
                            logger.info(f"Worker-{worker_id}: still waiting for the start signal...")
                            if self.stop_event.is_set():
                                logger.info(f"Worker-{worker_id}: stop signal received, exiting")
                                return

                        logger.info(f"Worker-{worker_id}: starting work")

                        # Simulate work, checking the stop event between steps
                        for i in range(10):
                            if self.stop_event.is_set():
                                logger.info(f"Worker-{worker_id}: stop signal received mid-work, exiting cleanly")
                                return

                            # Do one unit of work
                            time.sleep(0.5)
                            logger.info(f"Worker-{worker_id}: progress {i+1}/10")

                        logger.info(f"Worker-{worker_id}: work finished")

                    except Exception as e:
                        logger.error(f"Worker-{worker_id}: exception: {e}")
                    finally:
                        with self.count_lock:
                            self.worker_count -= 1

                def demonstrate_best_practices(self):
                    """演示最佳实践"""
                    logger.info("=== Event最佳实践演示 ===")

                    # 启动工作线程
                    num_workers = 3
                    self.worker_count = num_workers

                    for i in range(num_workers):
                        worker = threading.Thread(
                            target=self.safe_worker_pattern,
                            args=(i+1,)
                        )
                        self.workers.append(worker)
                        worker.start()

                    # 等待所有线程准备就绪
                    time.sleep(2)

                    # 开始工作
                    logger.info("主线程:发送开始信号")
                    self.start_event.set()

                    # 工作一段时间后停止
                    time.sleep(3)
                    logger.info("主线程:发送停止信号")
                    self.stop_event.set()

                    # 等待所有线程完成
                    for worker in self.workers:
                        worker.join(timeout=2)
                        if worker.is_alive():
                            logger.warning(f"线程 {worker.name} 未能在超时内完成")

                    logger.info(f"所有线程已完成,剩余工作线程数: {self.worker_count}")

                def reset_and_reuse(self):
                    """重置并重用事件"""
                    logger.info("=== 事件重置和重用演示 ===")

                    # 清除事件状态
                    self.start_event.clear()
                    self.stop_event.clear()

                    # 重置工作线程列表
                    self.workers = []

                    logger.info("事件状态已重置,可以重新使用")

            # 使用示例
            if __name__ == "__main__":
                practices = EventBestPractices()
                practices.demonstrate_best_practices()

                time.sleep(1)
                practices.reset_and_reuse()
            ---

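5.2 The set, clear, and wait methods

01.Event methods in depth
    a.Overview
        An Event object provides four core methods: set(), clear(), wait(), and is_set().
        Together they implement synchronization and communication between threads.
        set() and clear() control the event state, and wait() waits for it to change.
    b.How the methods relate
        a.State control
            set() makes the state True and clear() resets it to False.
        b.State query
            is_set() reads the current state and wait() waits for it to become True.
        c.Working together
            Typically one thread calls set() while other threads call wait().
    c.Lifecycle patterns
        a.One-shot event
            Set once and never cleared; signals a one-time condition such as completion or shutdown.
        b.Reusable event
            A set and clear cycle, used to synchronize over multiple rounds.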
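        A common one-shot use is a stop flag, where wait(timeout=...) doubles as an interruptible sleep. The sketch below is illustrative; `run_until_stopped` and `demo` are hypothetical names.

```python
import threading


def run_until_stopped(stop_event, interval, ticks):
    """Append to ticks every `interval` seconds until stop_event is set."""
    # wait() returns False on timeout and True as soon as the event is set,
    # so the loop ends promptly instead of finishing a full sleep.
    while not stop_event.wait(timeout=interval):
        ticks.append(1)


def demo():
    stop_event = threading.Event()   # one-shot: set once, never cleared
    ticks = []
    worker = threading.Thread(target=run_until_stopped,
                              args=(stop_event, 0.01, ticks))
    worker.start()
    stop_event.set()                 # request shutdown
    worker.join(timeout=2.0)
    return worker.is_alive()


if __name__ == "__main__":
    print("worker still alive:", demo())
```

        Because the event is never cleared, any thread that checks it later still sees the shutdown request, which is what makes the one-shot pattern safe against late arrivals.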

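02.set() in detail
    a.Signature and parameters
        def set(self) -> None
        Takes no arguments and returns None; it sets the event flag.
    b.Core behavior
        a.Setting the state
            Sets the internal flag to True, meaning the event has happened.
        b.Waking threads
            Wakes every thread currently blocked in wait().
        c.Idempotent
            Repeated calls have no further effect; the state stays True.
    c.How it works internally
        a.Updating the flag
            The flag is updated while holding the event's internal lock, so the update and the notification happen as one step.
        b.Managing waiters
            The internal condition variable keeps the list of waiting threads and notifies each of them.
        c.Notification mechanism
            CPython implements the notification with a threading.Condition built on a lock.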
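        To make the description above concrete, here is a simplified model of an event built on `threading.Condition`. It mirrors the general structure described in this section but is a teaching sketch, not the standard library source; `SimpleEvent` is a name invented for this example.

```python
import threading


class SimpleEvent:
    """A simplified event: a boolean flag guarded by a condition variable."""

    def __init__(self):
        self._cond = threading.Condition(threading.Lock())
        self._flag = False

    def is_set(self):
        return self._flag

    def set(self):
        with self._cond:             # update and notify under the same lock
            self._flag = True
            self._cond.notify_all()  # wake every waiter

    def clear(self):
        with self._cond:
            self._flag = False

    def wait(self, timeout=None):
        with self._cond:
            if not self._flag:
                # Releases the lock while blocked, reacquires before returning
                self._cond.wait(timeout)
            return self._flag
```

        Returning the flag from `wait()` is what yields True when the event was set and False when the timeout elapsed first.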
    d.Code example
        ---
        # Event.set()方法深度示例 - 多阶段通知系统
        import threading
        import time
        import logging
        import queue
        from enum import Enum

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class SystemPhase(Enum):
            """系统阶段枚举"""
            INITIALIZING = "初始化中"
            READY = "就绪"
            PROCESSING = "处理中"
            COMPLETING = "完成中"
            COMPLETED = "已完成"
            ERROR = "错误"

        class PhaseManager:
            """阶段管理器 - 演示set()方法的多种使用模式"""
            def __init__(self):
                # 各阶段的事件对象
                self.events = {
                    SystemPhase.INITIALIZING: threading.Event(),
                    SystemPhase.READY: threading.Event(),
                    SystemPhase.PROCESSING: threading.Event(),
                    SystemPhase.COMPLETING: threading.Event(),
                    SystemPhase.COMPLETED: threading.Event(),
                    SystemPhase.ERROR: threading.Event()
                }

                # 当前阶段
                self.current_phase = None
                self.phase_history = []

                # 工作线程池
                self.workers = []
                self.worker_results = queue.Queue()

            def initialize_system(self):
                """系统初始化阶段"""
                self.current_phase = SystemPhase.INITIALIZING
                logger.info(f"=== {self.current_phase.value} ===")

                # 设置初始化事件,通知系统开始初始化
                self.events[SystemPhase.INITIALIZING].set()
                logger.info("初始化事件已设置,通知各组件开始初始化")

                # 模拟初始化过程
                time.sleep(1)

                # 准备就绪
                self.transition_to_phase(SystemPhase.READY)

            def transition_to_phase(self, new_phase):
                """转换到新阶段"""
                old_phase = self.current_phase
                self.current_phase = new_phase
                self.phase_history.append((old_phase, new_phase, time.time()))

                logger.info(f"阶段转换: {old_phase.value if old_phase else 'None'} -> {new_phase.value}")

                # 设置新阶段事件
                self.events[new_phase].set()
                logger.info(f"阶段事件已设置: {new_phase.value}")

                # 如果不是错误阶段,清除之前的非错误阶段事件
                if new_phase != SystemPhase.ERROR:
                    self._clear_previous_events(new_phase)

            def _clear_previous_events(self, current_phase):
                """清除之前的事件状态"""
                for phase, event in self.events.items():
                    if phase != current_phase and phase != SystemPhase.ERROR:
                        if event.is_set():
                            event.clear()
                            logger.info(f"清除事件: {phase.value}")

            def start_workers(self, num_workers=3):
                """启动工作线程"""
                for i in range(num_workers):
                    worker = threading.Thread(
                        target=self.worker_process,
                        name=f"Worker-{i+1}",
                        args=(i+1,)
                    )
                    self.workers.append(worker)
                    worker.start()
                    logger.info(f"启动工作线程: {worker.name}")

            def worker_process(self, worker_id):
                """工作线程处理逻辑"""
                logger.info(f"{worker_id}: 启动,等待系统就绪")

                # 等待系统就绪
                self.events[SystemPhase.READY].wait()
                logger.info(f"{worker_id}: 系统就绪,开始工作")

                # 等待处理阶段
                self.events[SystemPhase.PROCESSING].wait()
                logger.info(f"{worker_id}: 开始处理任务")

                # 模拟工作处理
                processing_time = 2
                for i in range(processing_time):
                    time.sleep(0.5)
                    logger.info(f"{worker_id}: 处理进度 {i+1}/{processing_time}")

                    # 检查是否有错误事件
                    if self.events[SystemPhase.ERROR].is_set():
                        logger.warning(f"{worker_id}: 检测到错误事件,停止处理")
                        return

                # 工作完成
                result = f"Worker-{worker_id}-Result"
                self.worker_results.put(result)
                logger.info(f"{worker_id}: 工作完成,结果: {result}")

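                # Wait for the completing phase, but keep checking for ERROR:
                # in the error flow COMPLETING is never set, so a bare wait()
                # would block this non-daemon thread forever and keep the
                # program from exiting
                while not self.events[SystemPhase.COMPLETING].wait(timeout=0.5):
                    if self.events[SystemPhase.ERROR].is_set():
                        logger.warning(f"{worker_id}: error event detected, exiting")
                        return
                logger.info(f"{worker_id}: system is in the completing phase")

                # Wait for final completion
                self.events[SystemPhase.COMPLETED].wait()
                logger.info(f"{worker_id}: system fully completed")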

            def process_with_error_simulation(self, simulate_error=False):
                """模拟处理过程,可选择是否模拟错误"""
                logger.info("开始处理阶段")

                # 转换到处理阶段
                self.transition_to_phase(SystemPhase.PROCESSING)

                # 等待一些工作完成
                time.sleep(3)

                if simulate_error:
                    # 模拟错误情况
                    logger.warning("检测到错误情况!")
                    self.transition_to_phase(SystemPhase.ERROR)
                    return

                # 转换到完成阶段
                self.transition_to_phase(SystemPhase.COMPLETING)
                time.sleep(1)

                # 最终完成
                self.transition_to_phase(SystemPhase.COMPLETED)

            def shutdown_system(self):
                """关闭系统"""
                logger.info("开始关闭系统")

                # 等待所有工作线程完成
                for worker in self.workers:
                    worker.join(timeout=2)
                    if worker.is_alive():
                        logger.warning(f"工作线程 {worker.name} 未能在超时内完成")

                # 收集结果
                results = []
                while not self.worker_results.empty():
                    try:
                        result = self.worker_results.get_nowait()
                        results.append(result)
                    except queue.Empty:
                        break

                logger.info(f"系统关闭完成,收集到 {len(results)} 个结果")
                for result in results:
                    logger.info(f"  - {result}")

        # 使用示例
        if __name__ == "__main__":
            print("="*60)
            print("演示1: 正常处理流程")
            print("="*60)

            manager = PhaseManager()
            manager.start_workers(num_workers=2)

            # 初始化系统
            manager.initialize_system()

            # 正常处理
            manager.process_with_error_simulation(simulate_error=False)

            # 关闭系统
            manager.shutdown_system()

            print("\n" + "="*60)
            print("演示2: 错误处理流程")
            print("="*60)

            manager2 = PhaseManager()
            manager2.start_workers(num_workers=2)

            # 初始化系统
            manager2.initialize_system()

            # 模拟错误处理
            manager2.process_with_error_simulation(simulate_error=True)

            # 关闭系统
            manager2.shutdown_system()
        ---

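03.The clear() method in detail
    a.Method signature and parameters
        def clear(self) -> None
        Takes no arguments and returns None; it resets the event's internal flag.
    b.Core behavior
        a.State reset
            Resets the internal flag to False, meaning "the event has not occurred".
        b.Subsequent blocking
            Any wait() call made after clear() blocks until set() is called again or its timeout expires. Threads that an earlier set() has already woken are not affected.
        c.Reuse
            The same Event object can go through any number of set()/clear() cycles.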
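        The three behaviors listed for clear() can be checked with a minimal sketch. The names `gate` and `results` are illustrative and do not appear in the larger examples of this section.

```python
import threading

gate = threading.Event()
results = []

gate.set()
results.append(gate.wait(timeout=0.1))   # flag is True: returns at once with True

gate.clear()                             # reset the flag to False
results.append(gate.is_set())            # False after clear()
results.append(gate.wait(timeout=0.1))   # blocks for the timeout, then returns False

gate.set()                               # the same object can be set again
results.append(gate.wait(timeout=0.1))   # True

print(results)  # [True, False, False, True]
```

        The timeouts only keep the sketch from hanging; a bare wait() after clear() would block until some other thread calls set().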
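    c.Use cases
        a.Multi-round synchronization
            Reuse one Event object across several rounds of processing.
        b.State reset
            Clear the previous round's signal before a new round starts.
        c.Loop control
            Use the event as the synchronization point that threads wait on in each loop iteration.
    d.Code example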
        ---
        # Event.clear()方法深度示例 - 循环任务调度器
        import threading
        import time
        import logging
        import random
        from collections import defaultdict

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class RoundRobinScheduler:
            """循环任务调度器 - 演示clear()方法的重复使用模式"""
            def __init__(self, num_workers=3):
                self.num_workers = num_workers
                self.round_event = threading.Event()
                self.round_complete_event = threading.Event()

                # 工作线程
                self.workers = []
                self.worker_status = {}
                self.round_results = defaultdict(list)

                # 调度控制
                self.current_round = 0
                self.max_rounds = 5
                self.running = True

            def start_scheduler(self):
                """启动调度器"""
                logger.info("=== 启动循环任务调度器 ===")

                # 创建并启动工作线程
                for i in range(self.num_workers):
                    worker_name = f"Worker-{i+1}"
                    self.worker_status[worker_name] = {
                        'round': 0,
                        'completed': 0,
                        'status': 'idle'
                    }

                    worker = threading.Thread(
                        target=self.worker_loop,
                        name=worker_name,
                        args=(i+1,)
                    )
                    self.workers.append(worker)
                    worker.start()

                # 启动调度线程
                scheduler_thread = threading.Thread(target=self.schedule_rounds)
                scheduler_thread.start()

                # 启动监控线程
                monitor_thread = threading.Thread(target=self.monitor_status)
                monitor_thread.start()

                return scheduler_thread, monitor_thread

            def schedule_rounds(self):
                """调度多轮任务"""
                logger.info("调度器开始工作")

                for round_num in range(1, self.max_rounds + 1):
                    if not self.running:
                        break

                    self.current_round = round_num
                    logger.info(f"\n=== 第 {round_num} 轮任务开始 ===")

                    # 清除之前的事件状态
                    self.round_event.clear()
                    self.round_complete_event.clear()
                    logger.info(f"事件状态已清除,准备第 {round_num} 轮")

                    # 设置本轮次事件,唤醒工作线程
                    time.sleep(0.5)  # 给监控线程一些时间
                    logger.info(f"发送第 {round_num} 轮开始信号")
                    self.round_event.set()

                    # 等待所有工作线程完成本轮任务
                    logger.info("等待工作线程完成本轮任务...")
                    self.round_complete_event.wait(timeout=10)

                    # 检查本轮完成情况
                    completed_count = self._count_completed_workers(round_num)
                    logger.info(f"第 {round_num} 轮完成,{completed_count}/{self.num_workers} 个工作线程完成")

                    # 轮次间休息
                    time.sleep(1)

                # 所有轮次完成
                logger.info("\n=== 所有轮次任务完成 ===")
                self.running = False

                # 最后设置事件确保工作线程退出
                self.round_event.set()
                time.sleep(1)

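            def worker_loop(self, worker_id):
                """Worker thread loop"""
                worker_name = threading.current_thread().name
                logger.info(f"{worker_name}: started, waiting for work")
                last_round = 0  # last round this worker has already handled

                while self.running:
                    # Wait for the round-start signal; the timeout lets the loop re-check self.running
                    if not self.round_event.wait(timeout=1):
                        continue
                    if not self.running:
                        break

                    # Snapshot the round number once so every step below refers to the same round
                    round_num = self.current_round
                    if round_num == last_round:
                        # round_event stays set until the scheduler clears it for the next
                        # round; without this guard the worker would redo the same round
                        time.sleep(0.05)
                        continue

                    # Update the worker status
                    self.worker_status[worker_name]['round'] = round_num
                    self.worker_status[worker_name]['status'] = 'working'
                    logger.info(f"{worker_name}: starting round {round_num}")

                    # Do the work for this round
                    result = self._perform_round_work(worker_name, round_num)
                    self.round_results[round_num].append(result)

                    # Mark the round as done for this worker
                    self.worker_status[worker_name]['completed'] += 1
                    self.worker_status[worker_name]['status'] = 'completed'
                    last_round = round_num
                    logger.info(f"{worker_name}: round {round_num} finished")

                    # The last worker to finish signals the scheduler
                    if self._check_round_completion(round_num):
                        logger.info(f"{worker_name}: all workers finished round {round_num}")
                        self.round_complete_event.set()

                self.worker_status[worker_name]['status'] = 'stopped'
                logger.info(f"{worker_name}: worker thread exiting")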

            def _perform_round_work(self, worker_name, round_num):
                """执行一轮工作"""
                # 模拟工作时间
                work_time = random.uniform(0.5, 2.0)
                time.sleep(work_time)

                # 生成工作结果
                result = {
                    'worker': worker_name,
                    'round': round_num,
                    'duration': work_time,
                    'timestamp': time.time(),
                    'status': 'completed'
                }

                return result

            def _count_completed_workers(self, round_num):
                """统计完成指定轮次的工人数"""
                completed_count = 0
                for status in self.worker_status.values():
                    if status['round'] == round_num and status['status'] == 'completed':
                        completed_count += 1
                return completed_count

            def _check_round_completion(self, round_num):
                """检查指定轮次是否所有工人都完成"""
                completed_count = self._count_completed_workers(round_num)
                return completed_count >= self.num_workers

            def monitor_status(self):
                """监控线程状态"""
                while self.running:
                    time.sleep(2)

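                    # Each round starts with clear(), so is_set() is False until the round is released;
                    # without clear() this check would keep seeing the previous round's signal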
                    if self.round_event.is_set():
                        logger.info(f"监控: 事件已设置 (当前轮次: {self.current_round})")
                    else:
                        logger.info("监控: 事件未设置 (等待下一轮)")

                    # 显示工人的当前状态
                    active_workers = sum(1 for s in self.worker_status.values() if s['status'] == 'working')
                    completed_workers = sum(1 for s in self.worker_status.values() if s['status'] == 'completed')

                    logger.info(f"监控: 活跃工作线程 {active_workers}, 已完成 {completed_workers}")

                logger.info("监控线程退出")

            def shutdown(self):
                """关闭调度器"""
                logger.info("关闭调度器...")
                self.running = False

                # 等待所有工作线程完成
                for worker in self.workers:
                    worker.join(timeout=2)
                    if worker.is_alive():
                        logger.warning(f"工作线程 {worker.name} 未能在超时内完成")

                # 打印统计信息
                self._print_statistics()

            def _print_statistics(self):
                """打印统计信息"""
                logger.info("\n=== 工作统计 ===")
                for worker_name, status in self.worker_status.items():
                    logger.info(f"{worker_name}: 完成 {status['completed']} 轮任务")

                logger.info("\n=== 各轮次结果 ===")
                for round_num, results in self.round_results.items():
                    logger.info(f"第 {round_num} 轮: {len(results)} 个结果")
                    for result in results:
                        logger.info(f"  - {result['worker']}: {result['duration']:.2f}s")

        # 使用示例
        if __name__ == "__main__":
            scheduler = RoundRobinScheduler(num_workers=3)
            scheduler_thread, monitor_thread = scheduler.start_scheduler()

            # 等待调度完成
            scheduler_thread.join()
            monitor_thread.join()

            # 关闭调度器
            scheduler.shutdown()
        ---

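04.The wait() method in detail
    a.Method signature and parameters
        def wait(self, timeout: Optional[float] = None) -> bool
        timeout is optional and gives the maximum time to block, in seconds (a float, so fractions are allowed); None means wait indefinitely. The return value reports whether the event was set.
    b.Core behavior
        a.State check
            wait() first checks the flag; if it is already set, the call returns immediately.
        b.Blocking wait
            If the flag is not set, the calling thread blocks until another thread calls set().
        c.Timeout control
            A timeout bounds the wait so the thread cannot block forever.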
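        One common use of the timeout is an interruptible sleep: a periodic loop that wakes on every timeout but exits as soon as a stop event is set. The sketch below assumes a hypothetical `heartbeat` function and a `stop` event that are not part of the examples in this section.

```python
import threading
import time

stop = threading.Event()
ticks = []

def heartbeat(interval=0.05):
    # wait() returns False on each timeout (time for another tick)
    # and True as soon as stop is set (time to exit the loop).
    while not stop.wait(timeout=interval):
        ticks.append(time.monotonic())

worker = threading.Thread(target=heartbeat)
worker.start()

time.sleep(0.3)          # let the loop tick a few times
stop.set()               # wakes the worker without waiting for the interval to elapse
worker.join(timeout=1)

print(f"ticks recorded: {len(ticks)}, worker alive: {worker.is_alive()}")
```

        Compared with time.sleep(interval) followed by a flag check, the thread reacts to the stop request immediately instead of after the current sleep finishes.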
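    c.Meaning of the return value
        a.True: the event is set
            The flag was already set when wait() was called, or it was set while the thread was waiting.
        b.False: timed out
            The flag was still not set when the timeout expired. Without a timeout, wait() can only return True.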
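        The return values can be observed directly. This is a small sketch with illustrative names (`ready` and the three result variables); it uses threading.Timer only to call set() from another thread after a delay.

```python
import threading
import time

ready = threading.Event()

# Case 1: nobody sets the event, so wait() gives up after the timeout.
start = time.monotonic()
timed_out_result = ready.wait(timeout=0.2)
elapsed = time.monotonic() - start

# Case 2: another thread sets the event while this thread is waiting.
threading.Timer(0.1, ready.set).start()
signalled_result = ready.wait(timeout=2.0)

# Case 3: the event is already set, so wait() returns at once.
already_set_result = ready.wait()

print(timed_out_result, signalled_result, already_set_result)  # False True True
```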
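    d.Internal implementation
        a.Condition variable
            In CPython, Event is written in Python on top of threading.Condition: a boolean flag guarded by the condition's lock.
        b.Operating-system primitives
            The condition is in turn built on low-level locks, which the interpreter maps to the platform's native synchronization primitives.
        c.Waiter management
            The condition keeps track of the waiting threads. set() calls notify_all(), so every waiter is released and no wake-up order is guaranteed; which thread runs first is up to the scheduler.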
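        To make the mechanism concrete, here is a simplified model of an event built on threading.Condition. `SketchEvent` is a hypothetical teaching class modelled on the flag-plus-condition structure described above; it is not the standard library's code and should not replace threading.Event.

```python
import threading

class SketchEvent:
    """Simplified model of an event flag (illustration only)."""

    def __init__(self):
        self._cond = threading.Condition(threading.Lock())
        self._flag = False

    def is_set(self):
        return self._flag

    def set(self):
        with self._cond:
            self._flag = True
            self._cond.notify_all()      # release every waiting thread

    def clear(self):
        with self._cond:
            self._flag = False

    def wait(self, timeout=None):
        with self._cond:
            signaled = self._flag
            if not signaled:
                # Condition.wait returns False only when the timeout expires
                signaled = self._cond.wait(timeout)
            return signaled

ev = SketchEvent()
woken = []

def waiter(index):
    if ev.wait(timeout=2):
        woken.append(index)

threads = [threading.Thread(target=waiter, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
ev.set()                                  # one set() wakes all three waiters
for t in threads:
    t.join()

timed_out = SketchEvent().wait(timeout=0.05)
print(sorted(woken), timed_out)  # [0, 1, 2] False
```

        The order in which the waiters append to `woken` is not deterministic, which is why the sketch sorts the list before printing it.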
    e.Code example
        ---
        # Event.wait()方法深度示例 - 多条件等待与超时处理
        import threading
        import time
        import logging
        import random
        from typing import Dict, List, Optional
        from dataclasses import dataclass
        from enum import Enum

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class ServiceStatus(Enum):
            """服务状态枚举"""
            STOPPED = "停止"
            STARTING = "启动中"
            RUNNING = "运行中"
            STOPPING = "停止中"
            ERROR = "错误"

        @dataclass
        class ServiceConfig:
            """服务配置"""
            name: str
            startup_time: float  # 启动所需时间(秒)
            failure_rate: float  # 故障率(0-1)
            max_wait_time: float  # 最大等待时间

        class ServiceManager:
            """服务管理器 - 演示wait()方法的多种等待模式"""
            def __init__(self):
                # 服务状态事件
                self.service_events: Dict[str, threading.Event] = {}
                self.service_status: Dict[str, ServiceStatus] = {}
                self.service_configs: Dict[str, ServiceConfig] = {}

                # 管理器状态
                self.shutdown_event = threading.Event()
                self.health_check_event = threading.Event()

                # 监控线程
                self.monitor_thread = None
                self.health_checker_thread = None

            def add_service(self, config: ServiceConfig):
                """添加服务"""
                service_name = config.name
                self.service_events[service_name] = threading.Event()
                self.service_status[service_name] = ServiceStatus.STOPPED
                self.service_configs[service_name] = config

                logger.info(f"添加服务: {service_name}")

            def start_service(self, service_name: str) -> bool:
                """启动指定服务"""
                if service_name not in self.service_configs:
                    logger.error(f"服务不存在: {service_name}")
                    return False

                config = self.service_configs[service_name]
                logger.info(f"启动服务: {service_name}")

                # 创建服务线程
                service_thread = threading.Thread(
                    target=self._service_worker,
                    name=f"Service-{service_name}",
                    args=(service_name,)
                )
                service_thread.start()

                # 等待服务启动完成,使用多种等待策略
                success = self._wait_for_service_start(service_name)
                return success

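            def _wait_for_service_start(self, service_name: str) -> bool:
                """Wait for a service to start, with or without a deadline"""
                config = self.service_configs[service_name]
                start_event = self.service_events[service_name]

                logger.info(f"waiting for service {service_name} to start...")

                if config.max_wait_time <= 0:
                    # Strategy 1: no deadline. Wait in short slices instead of a bare wait():
                    # a failed startup never sets the event, so a bare wait() would hang
                    logger.info(f"{service_name}: waiting without a deadline")
                    while not start_event.wait(timeout=0.5):
                        if (self.service_status[service_name] == ServiceStatus.ERROR
                                or self.shutdown_event.is_set()):
                            logger.warning(f"{service_name}: startup failed or was cancelled")
                            return False
                else:
                    # Strategy 2: a single wait bounded by max_wait_time
                    logger.info(f"{service_name}: waiting with a timeout (max {config.max_wait_time}s)")
                    if not start_event.wait(timeout=config.max_wait_time):
                        logger.warning(f"{service_name}: startup timed out")
                        return False

                # The worker also sets the event when it hits an exception,
                # so confirm the status before reporting success
                started = self.service_status[service_name] == ServiceStatus.RUNNING
                if started:
                    logger.info(f"{service_name}: started successfully")
                else:
                    logger.warning(f"{service_name}: event was set but the service is not running")
                return started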

            def _service_worker(self, service_name: str):
                """服务工作线程"""
                config = self.service_configs[service_name]
                start_event = self.service_events[service_name]

                try:
                    # 更新状态为启动中
                    self.service_status[service_name] = ServiceStatus.STARTING
                    logger.info(f"{service_name}: 进入启动状态")

                    # 模拟启动过程
                    startup_steps = int(config.startup_time * 2)  # 每0.5秒一个步骤
                    for step in range(startup_steps):
                        if self.shutdown_event.is_set():
                            logger.info(f"{service_name}: 收到关闭信号,停止启动")
                            return

                        time.sleep(0.5)
                        progress = (step + 1) / startup_steps * 100
                        logger.info(f"{service_name}: 启动进度 {progress:.1f}%")

                        # 模拟启动故障
                        if random.random() < config.failure_rate / startup_steps:
                            logger.error(f"{service_name}: 启动过程中发生故障")
                            self.service_status[service_name] = ServiceStatus.ERROR
                            return

                    # 启动成功
                    self.service_status[service_name] = ServiceStatus.RUNNING
                    logger.info(f"{service_name}: 启动完成,进入运行状态")
                    start_event.set()

                    # 模拟服务运行
                    self._run_service_loop(service_name)

                except Exception as e:
                    logger.error(f"{service_name}: 发生异常: {e}")
                    self.service_status[service_name] = ServiceStatus.ERROR
                    start_event.set()  # 即使失败也要设置事件,避免无限等待

            def _run_service_loop(self, service_name: str):
                """服务运行循环"""
                try:
                    while not self.shutdown_event.is_set():
                        # 等待健康检查信号
                        self.health_check_event.wait(timeout=5)

                        if self.shutdown_event.is_set():
                            break

                        # 模拟服务运行
                        time.sleep(1)

                        # 模拟运行时故障
                        config = self.service_configs[service_name]
                        if random.random() < config.failure_rate / 100:  # 降低运行时故障率
                            logger.error(f"{service_name}: 运行时发生故障")
                            self.service_status[service_name] = ServiceStatus.ERROR
                            break

                finally:
                    # 服务停止
                    self.service_status[service_name] = ServiceStatus.STOPPED
                    logger.info(f"{service_name}: 服务已停止")

            def wait_for_multiple_services(self, service_names: List[str], timeout: float = 30) -> Dict[str, bool]:
                """Wait for several services: they are waited on one at a time, all sharing a single overall deadline"""
                logger.info(f"等待多个服务启动: {service_names}")

                results = {}
                start_time = time.time()

                for service_name in service_names:
                    if service_name not in self.service_events:
                        results[service_name] = False
                        continue

                    event = self.service_events[service_name]
                    remaining_timeout = max(0, timeout - (time.time() - start_time))

                    if remaining_timeout <= 0:
                        logger.warning(f"等待 {service_name} 时总超时")
                        results[service_name] = False
                        continue

                    # 等待服务启动
                    if event.wait(timeout=remaining_timeout):
                        results[service_name] = True
                        logger.info(f"{service_name}: 启动成功")
                    else:
                        results[service_name] = False
                        logger.warning(f"{service_name}: 启动超时")

                return results

            def wait_with_polling(self, service_name: str, interval: float = 1.0, max_attempts: int = 10) -> bool:
                """使用轮询方式等待服务 - 演示非阻塞等待模式"""
                logger.info(f"使用轮询模式等待服务 {service_name}")

                for attempt in range(max_attempts):
                    # 检查事件状态(非阻塞)
                    if self.service_events[service_name].is_set():
                        logger.info(f"{service_name}: 轮询第 {attempt + 1} 次检测到事件")
                        return True

                    logger.info(f"{service_name}: 轮询第 {attempt + 1} 次,事件未设置")

                    # 检查服务状态
                    status = self.service_status.get(service_name)
                    if status == ServiceStatus.ERROR:
                        logger.error(f"{service_name}: 检测到错误状态")
                        return False

                    # 等待下一次轮询
                    time.sleep(interval)

                logger.warning(f"{service_name}: 轮询 {max_attempts} 次后仍未启动")
                return False

            def start_monitoring(self):
                """启动监控"""
                self.monitor_thread = threading.Thread(target=self._monitor_services, name="Monitor")
                self.monitor_thread.start()

                self.health_checker_thread = threading.Thread(target=self._health_checker, name="HealthChecker")
                self.health_checker_thread.start()

                logger.info("监控线程已启动")

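            def _monitor_services(self):
                """Log the status of every service every 2 seconds until shutdown"""
                # shutdown_event.wait(timeout) acts as an interruptible sleep: it returns
                # False on timeout and True the moment shutdown is requested
                while not self.shutdown_event.wait(timeout=2):
                    logger.info("=== service status ===")
                    # Iterate over a copy: services may be added while the monitor is running
                    for service_name, status in list(self.service_status.items()):
                        event_set = self.service_events[service_name].is_set()
                        logger.info(f"{service_name}: {status.value}, Event: {event_set}")

            def _health_checker(self):
                """Trigger a health-check pulse every 5 seconds until shutdown"""
                while not self.shutdown_event.wait(timeout=5):
                    # Pulse the event: only threads already blocked in wait()
                    # during this short window are woken
                    self.health_check_event.set()
                    time.sleep(0.1)
                    self.health_check_event.clear()

                    logger.debug("health-check event triggered")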

            def shutdown_all(self, timeout: float = 10):
                """关闭所有服务"""
                logger.info("开始关闭所有服务...")

                # 发送关闭信号
                self.shutdown_event.set()

                # 等待服务停止
                start_time = time.time()
                while time.time() - start_time < timeout:
                    all_stopped = all(
                        status == ServiceStatus.STOPPED
                        for status in self.service_status.values()
                    )
                    if all_stopped:
                        break
                    time.sleep(0.5)

                # 强制等待监控线程
                if self.monitor_thread:
                    self.monitor_thread.join(timeout=2)
                if self.health_checker_thread:
                    self.health_checker_thread.join(timeout=2)

                logger.info("所有服务已关闭")

        # 使用示例
        if __name__ == "__main__":
            manager = ServiceManager()
            manager.start_monitoring()

            # 添加多个服务
            services = [
                ServiceConfig("database", startup_time=3.0, failure_rate=0.1, max_wait_time=5.0),
                ServiceConfig("cache", startup_time=2.0, failure_rate=0.05, max_wait_time=4.0),
                ServiceConfig("api_server", startup_time=4.0, failure_rate=0.15, max_wait_time=6.0),
            ]

            for config in services:
                manager.add_service(config)

            # 启动服务
            logger.info("\n=== 启动服务 ===")
            for config in services:
                success = manager.start_service(config.name)
                logger.info(f"服务 {config.name} 启动结果: {'成功' if success else '失败'}")

            # 演示多种等待模式
            logger.info("\n=== 演示多种等待模式 ===")

            # 1. 等待多个服务
            service_names = ["database", "cache"]
            results = manager.wait_for_multiple_services(service_names, timeout=8)
            logger.info(f"多服务等待结果: {results}")

            # 2. 轮询等待
            poll_result = manager.wait_with_polling("api_server", interval=0.5, max_attempts=8)
            logger.info(f"轮询等待结果: {poll_result}")

            # 让服务运行一段时间
            time.sleep(5)

            # 关闭所有服务
            logger.info("\n=== 关闭服务 ===")
            manager.shutdown_all()
        ---

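05.Patterns for combining the methods
    a.set() + wait() combination
        a.One-shot notification
            The producer sets the event once; the consumer waits for it.
        b.Multiple consumers
            Several threads wait on the same event, and a single set() wakes all of them.
        c.Code example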
            ---
            # set() + wait() 组合示例 - 多消费者模式
            import threading
            import time
            import logging

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class MultiConsumerSystem:
                """多消费者系统演示"""
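                def __init__(self, num_consumers=5):
                    self.num_consumers = num_consumers
                    self.data_ready = threading.Event()
                    self.consumers = []
                    # list.append is atomic in CPython, so the consumers share this list without a lock
                    self.consumer_results = []

                def producer(self):
                    """Producer thread"""
                    logger.info("producer: preparing data")

                    # Simulate data preparation
                    for i in range(3):
                        time.sleep(1)
                        logger.info(f"producer: preparation progress {i+1}/3")

                    # Data is ready: one set() wakes every waiting consumer
                    logger.info("producer: data ready, notifying all consumers")
                    self.data_ready.set()

                    # Wait for all consumers to finish
                    for consumer in self.consumers:
                        consumer.join()

                    logger.info(f"producer: all consumers finished, {len(self.consumer_results)} results received")

                def consumer(self, consumer_id):
                    """Consumer thread"""
                    logger.info(f"consumer {consumer_id}: waiting for data")

                    # Block until the producer signals that the data is ready
                    self.data_ready.wait()
                    logger.info(f"consumer {consumer_id}: notified, starting to process")

                    # Process the data
                    processing_time = 2
                    for i in range(processing_time):
                        time.sleep(0.5)
                        logger.info(f"consumer {consumer_id}: progress {i+1}/{processing_time}")

                    # Produce a result
                    result = f"consumer-{consumer_id}-result"
                    self.consumer_results.append(result)
                    logger.info(f"consumer {consumer_id}: done, result: {result}")

                def run_demo(self):
                    """Run the demo"""
                    # Start the consumers first so they are all waiting when the producer signals;
                    # the count comes from the constructor argument instead of a hard-coded 5
                    for i in range(self.num_consumers):
                        consumer = threading.Thread(
                            target=self.consumer,
                            args=(i+1,)
                        )
                        self.consumers.append(consumer)
                        consumer.start()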

                    # 启动生产者线程
                    producer = threading.Thread(target=self.producer)
                    producer.start()

                    # 等待所有线程完成
                    producer.join()

                    logger.info("演示完成")

            if __name__ == "__main__":
                demo = MultiConsumerSystem(num_consumers=5)
                demo.run_demo()
            ---

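    b.clear() + wait() combination
        a.Repeated synchronization
            Clear the event, then wait for it to be set again.
        b.Loop control
            Use the event inside a loop to gate each iteration.
        c.Code example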
            ---
            # clear() + wait() 组合示例 - 循环同步
            import threading
            import time
            import logging

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

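            class RoundSyncSystem:
                """Round-based synchronization demo"""
                def __init__(self, num_workers=3):
                    self.num_workers = num_workers
                    self.round_event = threading.Event()
                    self.current_round = 0  # published by the coordinator just before each set()
                    self.workers = []

                def coordinator(self):
                    """Coordinator thread"""
                    for round_num in range(1, 4):
                        logger.info(f"=== round {round_num} starting ===")

                        # Clear the event so workers block in wait() until this round is released
                        self.round_event.clear()
                        logger.info("coordinator: event cleared")

                        # Give the workers time to reach wait()
                        time.sleep(0.5)

                        # Publish the round number, then release the workers
                        self.current_round = round_num
                        logger.info("coordinator: sending start signal")
                        self.round_event.set()

                        # Wait for the round to finish (each worker needs about 2 s)
                        time.sleep(3)
                        logger.info(f"coordinator: round {round_num} finished\n")

                def worker(self, worker_id):
                    """Worker thread (daemon: it ends when the program exits)"""
                    last_round = 0  # last round this worker has already handled
                    while True:
                        # Wait for the start signal
                        self.round_event.wait()

                        round_num = self.current_round
                        if round_num == last_round:
                            # The event stays set until the coordinator clears it for the next
                            # round; skip so that each round's work runs only once
                            time.sleep(0.1)
                            continue

                        logger.info(f"worker {worker_id}: got the signal, starting round {round_num}")

                        # Simulate work
                        time.sleep(2)
                        last_round = round_num
                        logger.info(f"worker {worker_id}: round {round_num} done, waiting for the next round")

                def run_demo(self):
                    """Run the demo"""
                    # Start the worker threads
                    for i in range(self.num_workers):
                        worker = threading.Thread(
                            target=self.worker,
                            args=(i+1,),
                            daemon=True
                        )
                        self.workers.append(worker)
                        worker.start()

                    # Start the coordinator
                    coordinator = threading.Thread(target=self.coordinator)
                    coordinator.start()

                    # Wait for the coordinator to finish
                    coordinator.join()

                    logger.info("demo finished")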

            if __name__ == "__main__":
                demo = RoundSyncSystem(num_workers=3)
                demo.run_demo()
            ---

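    c.is_set() + wait() combination
        a.State check
            is_set() reads the flag without blocking, so a thread can take a different code path when the event is not yet set. The answer can be stale by the time it is used, because another thread may set or clear the event right after the check.
        b.Conditional wait
            Decide from the flag whether to proceed immediately or to block in wait(); wait() itself also returns at once when the flag is already set.
        c.Code example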
            ---
            # is_set() + wait() 组合示例 - 条件等待
            import threading
            import time
            import logging

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class ConditionalWaitSystem:
                """条件等待系统演示"""
                def __init__(self):
                    self.data_event = threading.Event()
                    self.stop_event = threading.Event()

                def data_producer(self):
                    """数据生产者"""
                    for batch in range(1, 4):
                        logger.info(f"生产者: 生产批次 {batch}")

                        # 模拟数据生产
                        time.sleep(2)

                        # 设置数据事件
                        logger.info(f"生产者: 批次 {batch} 完成,设置事件")
                        self.data_event.set()

                        # 短暂清除事件
                        time.sleep(1)
                        self.data_event.clear()

                    logger.info("生产者: 所有批次完成")

                def conditional_consumer(self, consumer_id):
                    """条件消费者"""
                    while not self.stop_event.is_set():
                        # 先检查事件状态
                        if self.data_event.is_set():
                            logger.info(f"消费者{consumer_id}: 检测到数据可用,开始处理")

                            # 直接处理数据,不需要wait()
                            time.sleep(1)
                            logger.info(f"消费者{consumer_id}: 数据处理完成")
                        else:
                            logger.info(f"消费者{consumer_id}: 无数据可用,等待...")

                            # 等待数据可用
                            if self.data_event.wait(timeout=3):
                                logger.info(f"消费者{consumer_id}: 等待到数据,开始处理")
                                time.sleep(1)
                                logger.info(f"消费者{consumer_id}: 处理完成")
                            else:
                                logger.info(f"消费者{consumer_id}: 等待超时,继续检查")

                        # 短暂休息
                        time.sleep(0.5)

                    logger.info(f"消费者{consumer_id}: 收到停止信号,退出")

                def run_demo(self):
                    """运行演示"""
                    # 启动生产者
                    producer = threading.Thread(target=self.data_producer)
                    producer.start()

                    # 启动消费者
                    consumers = []
                    for i in range(2):
                        consumer = threading.Thread(
                            target=self.conditional_consumer,
                            args=(i+1,)
                        )
                        consumers.append(consumer)
                        consumer.start()

                    # 等待生产者完成
                    producer.join()

                    # 等待一段时间让消费者处理完
                    time.sleep(2)

                    # 停止消费者
                    self.stop_event.set()
                    for consumer in consumers:
                        consumer.join()

                    logger.info("演示完成")

            if __name__ == "__main__":
                demo = ConditionalWaitSystem()
                demo.run_demo()
            ---
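
The pre-check with is_set() in the example above is an optimization rather than a requirement: wait() returns True immediately when the flag is already set, and returns False only when the timeout expires first. The following is a minimal sketch of that behavior; the helper name wait_for_data and the short timeouts are illustrative assumptions, not part of the original example.

```python
import threading

def wait_for_data(event: threading.Event, timeout: float) -> bool:
    """Return True if the event was (or became) set within `timeout` seconds."""
    # wait() returns at once with True when the flag is already set,
    # so no separate is_set() check is needed for correctness.
    return event.wait(timeout=timeout)

data_event = threading.Event()

# Case 1: nobody sets the flag, so the call times out and returns False.
first = wait_for_data(data_event, timeout=0.05)

# Case 2: another thread sets the flag while we are waiting.
timer = threading.Timer(0.05, data_event.set)
timer.start()
second = wait_for_data(data_event, timeout=2.0)
timer.join()

# Case 3: the flag is already set, so even a zero timeout succeeds.
third = wait_for_data(data_event, timeout=0.0)

print(first, second, third)
```

Checking is_set() first is still useful when the "already set" and "had to wait" cases should be logged or handled differently, as the consumer above does.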

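5.3 Inter-Thread Communication

01.Basic concepts of inter-thread communication
    a.Definition and importance
        Inter-thread communication is the process of passing information, synchronizing state, and coordinating behavior between threads.
        Effective communication between threads is key to the correctness and efficiency of a concurrent program.
    b.Communication types
        a.Signal notification
            Pass simple state-change signals through event objects.
        b.Data passing
            Pass complex data structures between threads.
        c.Control-flow coordination
            Coordinate the order and timing of thread execution.
        d.Resource sharing
            Share access to system resources safely.
    c.Role of Event in thread communication
        a.Lightweight notification mechanism
            Provides a simple on/off signal.
        b.State synchronization tool
            Synchronizes state changes between threads.
        c.Condition trigger
            Triggers thread behavior when a specific condition holds.
        d.Coordination control point
            Serves as a control point for coordinating multiple threads.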
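
An Event carries only an on/off flag, so signal notification and data passing are usually handled by different objects. The sketch below is one possible pairing, assuming the standard-library queue.Queue for the payload (the names done, payloads, producer, and consumer are illustrative): the queue carries the data, and the Event signals that the producer has finished.

```python
import queue
import threading

done = threading.Event()   # signal notification: "the producer has finished"
payloads = queue.Queue()   # data passing: carries the actual objects

def producer():
    for i in range(3):
        payloads.put({"batch": i, "rows": [i, i * 2]})
    # Publish the data first, then switch the signal on.
    done.set()

def consumer(out):
    # Block until the producer signals completion, then drain the queue.
    done.wait()
    while True:
        try:
            out.append(payloads.get_nowait())
        except queue.Empty:
            break

collected = []
threads = [
    threading.Thread(target=consumer, args=(collected,)),
    threading.Thread(target=producer),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([item["batch"] for item in collected])
```

Setting the Event only after the data has been published means a thread that wakes from wait() always finds the data in place.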

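02.Basic Event-based communication patterns
    a.One-to-one notification
        a.Pattern description
            One producer thread notifies one consumer thread.
        b.Use cases
            Simple task-completion and state-change notifications.
        c.Implementation notes
            Use a single Event object: the producer calls set() and the consumer calls wait().
        d.Code example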
            ---
            # Event一对一通知模式 - 简单任务完成通知
            import threading
            import time
            import logging
            from typing import Optional

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class TaskNotificationSystem:
                """任务通知系统演示一对一通信"""
                def __init__(self):
                    self.task_complete = threading.Event()
                    self.task_result: Optional[str] = None

                def worker_thread(self, task_data: str) -> str:
                    """工作线程执行任务"""
                    logger.info(f"工作线程: 开始处理任务 - {task_data}")

                    # 模拟任务处理
                    processing_steps = 3
                    for i in range(processing_steps):
                        time.sleep(0.8)
                        logger.info(f"工作线程: 处理进度 {i+1}/{processing_steps}")

                    # 任务完成,生成结果
                    result = f"处理结果-{task_data}-{int(time.time())}"
                    self.task_result = result
                    logger.info(f"工作线程: 任务完成,结果 - {result}")

                    # 通知主线程任务完成
                    self.task_complete.set()
                    logger.info("工作线程: 已通知主线程")

                    return result

                def coordinator_thread(self, task_data: str) -> None:
                    """协调器线程等待任务完成"""
                    logger.info("协调器: 启动工作线程并等待完成")

                    # 启动工作线程
                    worker = threading.Thread(
                        target=self.worker_thread,
                        args=(task_data,)
                    )
                    worker.start()

                    # 等待任务完成通知
                    logger.info("协调器: 等待任务完成通知...")
                    self.task_complete.wait()

                    logger.info("协调器: 收到任务完成通知")
                    logger.info(f"协调器: 任务结果 - {self.task_result}")

                    # 等待工作线程结束
                    worker.join()
                    logger.info("协调器: 工作线程已结束")

                def run_demo(self, task_data: str = "示例数据"):
                    """运行演示"""
                    logger.info("=== 一对一通知模式演示 ===")

                    # 重置状态
                    self.task_complete.clear()
                    self.task_result = None

                    # 启动协调器
                    coordinator = threading.Thread(
                        target=self.coordinator_thread,
                        args=(task_data,)
                    )
                    coordinator.start()

                    # 等待协调器完成
                    coordinator.join()

                    logger.info("演示完成")

            # 使用示例
            if __name__ == "__main__":
                notification_system = TaskNotificationSystem()
                notification_system.run_demo("重要数据文件")
            ---

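    b.One-to-many notification
        a.Pattern description
            One producer thread notifies several consumer threads at once.
        b.Use cases
            Broadcast notifications, starting tasks in parallel, and system state-change notifications.
        c.Implementation notes
            Several consumer threads wait on the same Event object; the producer calls set() to wake all of them.
        d.Code example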
            ---
            # Event一对多通知模式 - 广播通知系统
            import threading
            import time
            import logging
            from typing import List, Dict
            from enum import Enum

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class NotificationType(Enum):
                """通知类型枚举"""
                SYSTEM_START = "系统启动"
                SYSTEM_SHUTDOWN = "系统关闭"
                DATA_READY = "数据就绪"
                EMERGENCY = "紧急情况"

            class BroadcastNotifier:
                """广播通知器演示一对多通信"""
                def __init__(self):
                    # 不同类型的事件对象
                    self.events = {
                        NotificationType.SYSTEM_START: threading.Event(),
                        NotificationType.SYSTEM_SHUTDOWN: threading.Event(),
                        NotificationType.DATA_READY: threading.Event(),
                        NotificationType.EMERGENCY: threading.Event()
                    }

                    # 订阅者列表
                    self.subscribers: Dict[NotificationType, List[str]] = {}
                    for notification_type in NotificationType:
                        self.subscribers[notification_type] = []

                    # 消息存储
                    self.current_message = None

                def subscribe(self, subscriber_name: str, notification_type: NotificationType):
                    """订阅特定类型的通知"""
                    self.subscribers[notification_type].append(subscriber_name)
                    logger.info(f"{subscriber_name} 订阅了 {notification_type.value} 通知")

                def broadcast(self, notification_type: NotificationType, message: str = ""):
                    """广播通知给所有订阅者"""
                    self.current_message = message
                    logger.info(f"广播 {notification_type.value} 通知: {message}")

                    # 设置事件,唤醒所有等待的订阅者
                    self.events[notification_type].set()
                    logger.info(f"已通知 {len(self.subscribers[notification_type])} 个订阅者")

                def subscriber_worker(self, subscriber_name: str, notification_type: NotificationType):
                    """订阅者工作线程"""
                    logger.info(f"订阅者 {subscriber_name} 启动,等待 {notification_type.value} 通知")

                    # 等待特定类型的通知
                    self.events[notification_type].wait()
                    logger.info(f"订阅者 {subscriber_name} 收到 {notification_type.value} 通知")

                    # 处理通知
                    if notification_type == NotificationType.SYSTEM_START:
                        self._handle_system_start(subscriber_name)
                    elif notification_type == NotificationType.DATA_READY:
                        self._handle_data_ready(subscriber_name)
                    elif notification_type == NotificationType.EMERGENCY:
                        self._handle_emergency(subscriber_name)
                    elif notification_type == NotificationType.SYSTEM_SHUTDOWN:
                        self._handle_system_shutdown(subscriber_name)

                def _handle_system_start(self, subscriber_name: str):
                    """处理系统启动通知"""
                    logger.info(f"订阅者 {subscriber_name}: 开始系统初始化")
                    time.sleep(1)
                    logger.info(f"订阅者 {subscriber_name}: 系统初始化完成")

                def _handle_data_ready(self, subscriber_name: str):
                    """处理数据就绪通知"""
                    logger.info(f"订阅者 {subscriber_name}: 开始处理数据")
                    time.sleep(1.5)
                    logger.info(f"订阅者 {subscriber_name}: 数据处理完成")

                def _handle_emergency(self, subscriber_name: str):
                    """处理紧急情况通知"""
                    logger.warning(f"订阅者 {subscriber_name}: 收到紧急通知,执行应急程序")
                    time.sleep(0.5)
                    logger.warning(f"订阅者 {subscriber_name}: 应急程序执行完成")

                def _handle_system_shutdown(self, subscriber_name: str):
                    """处理系统关闭通知"""
                    logger.info(f"订阅者 {subscriber_name}: 开始清理工作")
                    time.sleep(0.8)
                    logger.info(f"订阅者 {subscriber_name}: 清理工作完成,准备退出")

                def demo_broadcast_notification(self):
                    """演示广播通知"""
                    logger.info("=== 一对多广播通知演示 ===")

                    # 创建订阅者线程
                    subscriber_threads = []

                    # 系统启动通知订阅者
                    for i in range(2):
                        subscriber_name = f"启动服务{i+1}"
                        self.subscribe(subscriber_name, NotificationType.SYSTEM_START)

                        thread = threading.Thread(
                            target=self.subscriber_worker,
                            args=(subscriber_name, NotificationType.SYSTEM_START)
                        )
                        subscriber_threads.append(thread)
                        thread.start()

                    # 数据就绪通知订阅者
                    for i in range(3):
                        subscriber_name = f"数据处理{i+1}"
                        self.subscribe(subscriber_name, NotificationType.DATA_READY)

                        thread = threading.Thread(
                            target=self.subscriber_worker,
                            args=(subscriber_name, NotificationType.DATA_READY)
                        )
                        subscriber_threads.append(thread)
                        thread.start()

                    # 紧急通知订阅者
                    for i in range(2):
                        subscriber_name = f"监控服务{i+1}"
                        self.subscribe(subscriber_name, NotificationType.EMERGENCY)

                        thread = threading.Thread(
                            target=self.subscriber_worker,
                            args=(subscriber_name, NotificationType.EMERGENCY)
                        )
                        subscriber_threads.append(thread)
                        thread.start()

                    # 等待所有订阅者准备好
                    time.sleep(0.5)

                    # 依次发送不同类型的通知
                    logger.info("\n--- 发送系统启动通知 ---")
                    self.broadcast(NotificationType.SYSTEM_START, "系统即将启动")
                    time.sleep(3)  # 等待启动服务完成

                    # 清除启动事件,准备下一轮
                    self.events[NotificationType.SYSTEM_START].clear()

                    logger.info("\n--- 发送数据就绪通知 ---")
                    self.broadcast(NotificationType.DATA_READY, "新数据已准备就绪")
                    time.sleep(4)  # 等待数据处理完成

                    logger.info("\n--- 发送紧急通知 ---")
                    self.broadcast(NotificationType.EMERGENCY, "检测到系统异常")
                    time.sleep(2)  # 等待应急处理完成

                    # 等待所有订阅者线程完成
                    for thread in subscriber_threads:
                        thread.join(timeout=3)
                        if thread.is_alive():
                            logger.warning(f"订阅者线程 {thread.name} 未能在超时内完成")

                    logger.info("所有订阅者处理完成")

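                def demo_emergency_broadcast(self):
                    """Demonstrate an emergency broadcast received by every subscriber"""
                    logger.info("=== 紧急广播演示 ===")

                    # The previous demo leaves the emergency event set; clear it,
                    # otherwise wait() below would return before the broadcast.
                    self.events[NotificationType.EMERGENCY].clear()
                    # Drop subscribers from the previous demo so the logged count is accurate
                    self.subscribers[NotificationType.EMERGENCY].clear()

                    # Simplified demo: the general-purpose services subscribe
                    # to emergency notifications only
                    universal_threads = []
                    for i in range(3):
                        subscriber_name = f"通用服务{i+1}"
                        self.subscribe(subscriber_name, NotificationType.EMERGENCY)

                        thread = threading.Thread(
                            target=self.subscriber_worker,
                            args=(subscriber_name, NotificationType.EMERGENCY)
                        )
                        universal_threads.append(thread)
                        thread.start()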

                    time.sleep(0.5)

                    # 发送紧急广播
                    self.broadcast(NotificationType.EMERGENCY, "系统严重故障,立即响应!")

                    # 等待处理完成
                    for thread in universal_threads:
                        thread.join()

                    logger.info("紧急广播处理完成")

            # 使用示例
            if __name__ == "__main__":
                notifier = BroadcastNotifier()

                # 演示1: 分类广播通知
                notifier.demo_broadcast_notification()

                print("\n" + "="*60)
                print("演示2: 紧急广播")
                print("="*60)

                # 演示2: 紧急广播
                notifier.demo_emergency_broadcast()
            ---
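
The core of the one-to-many pattern can be reduced to a "start gate": several workers block on one Event and all proceed after a single set() call. The following is a minimal sketch under that assumption; run_start_gate, start_gate, and the worker ids are hypothetical names introduced for illustration.

```python
import threading

def run_start_gate(num_workers: int = 4) -> list:
    """Release `num_workers` waiting threads with one set() call."""
    start_gate = threading.Event()
    lock = threading.Lock()
    started = []

    def worker(worker_id: int) -> None:
        # Every worker blocks here until the gate is opened.
        start_gate.wait()
        with lock:
            started.append(worker_id)

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_workers)
    ]
    for t in threads:
        t.start()

    # A single set() wakes all threads waiting on the event.
    start_gate.set()
    for t in threads:
        t.join()
    return sorted(started)

print(run_start_gate(4))
```

Because the flag stays set until clear() is called, a worker that reaches wait() after the broadcast also passes straight through, which is why an event that is reused for a later round has to be cleared first.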

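    c.Many-to-one notification
        a.Pattern description
            Several producer threads notify the same consumer thread.
        b.Use cases
            Notifying the main thread after several tasks finish; aggregating the status of several services.
        c.Implementation notes
            Use one Event object per producer, or a single shared Event combined with a counter.
        d.Code example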
            ---
            # Event多对一通知模式 - 任务完成汇总
            import threading
            import time
            import logging
            from typing import List, Dict
            from dataclasses import dataclass
            from enum import Enum

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class TaskStatus(Enum):
                """任务状态枚举"""
                PENDING = "等待中"
                RUNNING = "运行中"
                COMPLETED = "已完成"
                FAILED = "失败"

            @dataclass
            class TaskResult:
                """任务结果"""
                task_id: int
                task_name: str
                status: TaskStatus
                result_data: str
                duration: float
                timestamp: float

            class TaskCollector:
                """任务收集器演示多对一通信"""
                def __init__(self, num_tasks: int = 5):
                    self.num_tasks = num_tasks
                    self.task_events: List[threading.Event] = []
                    self.task_results: Dict[int, TaskResult] = {}
                    self.collection_event = threading.Event()
                    self.worker_threads: List[threading.Thread] = []

                    # 为每个任务创建事件对象
                    for i in range(num_tasks):
                        self.task_events.append(threading.Event())

                def task_worker(self, task_id: int, task_name: str, difficulty: str = "normal"):
                    """任务工作线程"""
                    start_time = time.time()
                    logger.info(f"任务{task_id} ({task_name}): 开始执行")

                    try:
                        # 根据难度设置执行时间
                        if difficulty == "easy":
                            execution_time = 1.0
                        elif difficulty == "hard":
                            execution_time = 4.0
                        else:
                            execution_time = 2.5

                        # 模拟任务执行
                        steps = int(execution_time * 2)
                        for step in range(steps):
                            time.sleep(0.5)
                            progress = (step + 1) / steps * 100
                            logger.info(f"任务{task_id}: 进度 {progress:.1f}%")

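                            # Simulate an occasional failure (hard tasks only, at the midpoint step).
                            # It fires when the wall clock falls in the first second of a
                            # 10-second window, so the chance is roughly 10%, not exact.
                            if difficulty == "hard" and step == steps // 2 and time.time() % 10 < 1:
                                raise RuntimeError(f"任务{task_id}: 执行过程中出现错误")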

                        # 任务完成
                        duration = time.time() - start_time
                        result = TaskResult(
                            task_id=task_id,
                            task_name=task_name,
                            status=TaskStatus.COMPLETED,
                            result_data=f"任务{task_id}-结果-{int(time.time())}",
                            duration=duration,
                            timestamp=time.time()
                        )

                        logger.info(f"任务{task_id}: 执行成功,耗时 {duration:.2f}s")

                    except Exception as e:
                        # 任务失败
                        duration = time.time() - start_time
                        result = TaskResult(
                            task_id=task_id,
                            task_name=task_name,
                            status=TaskStatus.FAILED,
                            result_data=f"错误信息: {str(e)}",
                            duration=duration,
                            timestamp=time.time()
                        )
                        logger.error(f"任务{task_id}: 执行失败 - {e}")

                    # 存储结果
                    self.task_results[task_id] = result

                    # 通知收集器任务完成
                    self.task_events[task_id - 1].set()
                    logger.info(f"任务{task_id}: 已通知收集器")

                def result_collector(self):
                    """结果收集器线程"""
                    logger.info("结果收集器: 启动,等待任务完成通知")

                    processed = set()  # ids of tasks whose results were already collected
                    start_time = time.time()

                    while len(processed) < self.num_tasks:
                        # Poll every task event and collect newly finished tasks
                        for i, event in enumerate(self.task_events):
                            task_id = i + 1
                            if event.is_set() and task_id not in processed:
                                # The worker stores its result before calling set(),
                                # so the result is available once the event is set
                                result = self.task_results[task_id]
                                processed.add(task_id)
                                logger.info(f"收集器: 收到任务{task_id}完成通知")
                                logger.info(f"收集器: 任务{task_id}状态: {result.status.value}")
                                logger.info(f"收集器: 任务{task_id}结果: {result.result_data}")

                        # Stop as soon as every task has been collected
                        if len(processed) >= self.num_tasks:
                            break

                        # Sleep briefly to avoid a busy loop
                        time.sleep(0.1)

                    # 所有任务完成,设置收集完成事件
                    total_duration = time.time() - start_time
                    logger.info(f"收集器: 所有任务收集完成,总耗时 {total_duration:.2f}s")
                    self.collection_event.set()

                def run_demo(self):
                    """运行演示"""
                    logger.info("=== 多对一通知模式演示 ===")

                    # 定义任务列表
                    tasks = [
                        (1, "数据预处理", "easy"),
                        (2, "模型训练", "hard"),
                        (3, "结果验证", "normal"),
                        (4, "报告生成", "easy"),
                        (5, "数据备份", "normal")
                    ]

                    # 启动结果收集器
                    collector_thread = threading.Thread(target=self.result_collector)
                    collector_thread.start()

                    # 启动所有任务工作线程
                    for task_id, task_name, difficulty in tasks:
                        worker = threading.Thread(
                            target=self.task_worker,
                            args=(task_id, task_name, difficulty)
                        )
                        self.worker_threads.append(worker)
                        worker.start()

                    # 等待收集器完成
                    self.collection_event.wait()
                    collector_thread.join()

                    # 等待所有工作线程完成
                    for worker in self.worker_threads:
                        worker.join()

                    # 打印汇总结果
                    self._print_summary()

                def _print_summary(self):
                    """打印任务执行汇总"""
                    logger.info("\n=== 任务执行汇总 ===")

                    completed_count = 0
                    failed_count = 0
                    total_duration = 0

                    for task_id in range(1, self.num_tasks + 1):
                        result = self.task_results.get(task_id)
                        if result:
                            if result.status == TaskStatus.COMPLETED:
                                completed_count += 1
                            elif result.status == TaskStatus.FAILED:
                                failed_count += 1

                            total_duration += result.duration

                            logger.info(f"任务{task_id} ({result.task_name}): "
                                      f"{result.status.value}, "
                                      f"耗时 {result.duration:.2f}s, "
                                      f"结果: {result.result_data}")

                    logger.info(f"\n汇总统计:")
                    logger.info(f"总任务数: {self.num_tasks}")
                    logger.info(f"成功完成: {completed_count}")
                    logger.info(f"失败: {failed_count}")
                    logger.info(f"总耗时: {total_duration:.2f}s")
                    logger.info(f"平均耗时: {total_duration/self.num_tasks:.2f}s")

            # 使用示例
            if __name__ == "__main__":
                collector = TaskCollector(num_tasks=5)
                collector.run_demo()
            ---
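
The example above uses one Event per task. The alternative named in the implementation notes, a single Event combined with a counter, might look like the sketch below. The class CompletionCounter and the function run_tasks are hypothetical names, and the squared task id is a stand-in for real work.

```python
import threading

class CompletionCounter:
    """Sets one shared Event after `expected` workers have reported."""

    def __init__(self, expected: int):
        self._remaining = expected
        self._lock = threading.Lock()
        self.all_done = threading.Event()
        self.results = {}

    def report(self, task_id: int, result: int) -> None:
        # The lock protects the counter and the results dict together.
        with self._lock:
            self.results[task_id] = result
            self._remaining -= 1
            if self._remaining == 0:
                # The last worker to report wakes the collector.
                self.all_done.set()

def run_tasks(num_tasks: int = 5):
    counter = CompletionCounter(num_tasks)

    def worker(task_id: int) -> None:
        counter.report(task_id, task_id * task_id)  # placeholder for real work

    threads = [
        threading.Thread(target=worker, args=(i,))
        for i in range(1, num_tasks + 1)
    ]
    for t in threads:
        t.start()

    # The collector blocks once instead of polling each task's event.
    finished = counter.all_done.wait(timeout=5)
    for t in threads:
        t.join()
    return finished, counter.results

finished, results = run_tasks(5)
print(finished, results)
```

Compared with polling a list of events, the collector here sleeps until the last worker reports, at the cost of not seeing individual completions as they happen. The sketch assumes at least one task; with zero tasks the event would never be set.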

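03.Advanced Event communication patterns
    a.Condition-triggered communication
        a.Pattern description
            Communication between threads is triggered when specific conditions are met.
        b.Use cases
            Complex business-logic checks; triggers that combine several conditions.
        c.Implementation notes
            Combine the condition check with setting the Event.
        d.Code example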
            ---
            # Event条件触发通信 - 智能监控系统
            import threading
            import time
            import logging
            import random
            from typing import Callable, Dict, List, Optional, Tuple
            from dataclasses import dataclass
            from enum import Enum

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class AlertLevel(Enum):
                """Alert severity levels"""
                INFO = "信息"
                WARNING = "警告"
                ERROR = "错误"
                CRITICAL = "严重"

            @dataclass
            class MetricData:
                """A snapshot of system metrics"""
                cpu_usage: float  # CPU usage (0-100)
                memory_usage: float  # memory usage (0-100)
                disk_usage: float  # disk usage (0-100)
                network_io: float  # network I/O (MB/s)
                response_time: float  # response time (ms)
                error_rate: float  # error rate (0-1)
                timestamp: float

            @dataclass
            class AlertCondition:
                """An alert condition"""
                name: str
                level: AlertLevel
                check_function: Callable[[MetricData], bool]  # returns True when the alert should fire
                description: str

            class IntelligentMonitor:
                """Intelligent monitoring system demonstrating condition-triggered communication"""
                def __init__(self):
                    # Monitored metrics (None until the first sample is collected)
                    self.current_metrics: Optional[MetricData] = None
                    self.metrics_history: List[MetricData] = []
                    self.max_history_size = 10

                    # 告警事件
                    self.alert_events: Dict[AlertLevel, threading.Event] = {
                        AlertLevel.INFO: threading.Event(),
                        AlertLevel.WARNING: threading.Event(),
                        AlertLevel.ERROR: threading.Event(),
                        AlertLevel.CRITICAL: threading.Event()
                    }

                    # 系统控制事件
                    self.system_stop_event = threading.Event()
                    self.maintenance_mode_event = threading.Event()

                    # 告警处理器
                    self.alert_handlers: Dict[AlertLevel, List[threading.Thread]] = {}
                    self.active_handlers: List[threading.Thread] = []

                    # 告警条件定义
                    self.alert_conditions = self._define_alert_conditions()

                def _define_alert_conditions(self) -> List[AlertCondition]:
                    """定义告警条件"""
                    conditions = [
                        AlertCondition(
                            name="高CPU使用率",
                            level=AlertLevel.WARNING,
                            check_function=lambda m: m.cpu_usage > 80,
                            description="CPU使用率超过80%"
                        ),
                        AlertCondition(
                            name="极高CPU使用率",
                            level=AlertLevel.ERROR,
                            check_function=lambda m: m.cpu_usage > 95,
                            description="CPU使用率超过95%"
                        ),
                        AlertCondition(
                            name="内存不足",
                            level=AlertLevel.WARNING,
                            check_function=lambda m: m.memory_usage > 85,
                            description="内存使用率超过85%"
                        ),
                        AlertCondition(
                            name="内存严重不足",
                            level=AlertLevel.ERROR,
                            check_function=lambda m: m.memory_usage > 95,
                            description="内存使用率超过95%"
                        ),
                        AlertCondition(
                            name="响应时间过长",
                            level=AlertLevel.WARNING,
                            check_function=lambda m: m.response_time > 5000,
                            description="响应时间超过5秒"
                        ),
                        AlertCondition(
                            name="响应时间严重过长",
                            level=AlertLevel.ERROR,
                            check_function=lambda m: m.response_time > 10000,
                            description="响应时间超过10秒"
                        ),
                        AlertCondition(
                            name="错误率过高",
                            level=AlertLevel.WARNING,
                            check_function=lambda m: m.error_rate > 0.05,
                            description="错误率超过5%"
                        ),
                        AlertCondition(
                            name="错误率严重过高",
                            level=AlertLevel.CRITICAL,
                            check_function=lambda m: m.error_rate > 0.1,
                            description="错误率超过10%"
                        ),
                        AlertCondition(
                            name="系统过载",
                            level=AlertLevel.CRITICAL,
                            check_function=self._check_system_overload,
                            description="系统整体过载"
                        )
                    ]
                    return conditions

                def _check_system_overload(self, metrics: MetricData) -> bool:
                    """检查系统是否过载"""
                    # 综合多个指标判断系统过载
                    cpu_overload = metrics.cpu_usage > 90
                    memory_overload = metrics.memory_usage > 90
                    response_slow = metrics.response_time > 8000
                    error_high = metrics.error_rate > 0.08

                    # 至少3个条件满足才认为系统过载
                    overload_count = sum([cpu_overload, memory_overload, response_slow, error_high])
                    return overload_count >= 3

                def metrics_collector(self):
                    """指标收集线程"""
                    logger.info("指标收集器: 启动")

                    while not self.system_stop_event.is_set():
                        # 检查维护模式
                        if self.maintenance_mode_event.is_set():
                            logger.info("指标收集器: 进入维护模式,暂停收集")
                            time.sleep(5)
                            continue

                        # 生成模拟指标数据
                        metrics = self._generate_metrics()

                        # 更新当前指标
                        self.current_metrics = metrics

                        # 添加到历史记录
                        self.metrics_history.append(metrics)
                        if len(self.metrics_history) > self.max_history_size:
                            self.metrics_history.pop(0)

                        logger.debug(f"指标收集: CPU={metrics.cpu_usage:.1f}%, "
                                   f"内存={metrics.memory_usage:.1f}%, "
                                   f"响应时间={metrics.response_time:.0f}ms")

                        # 检查告警条件
                        triggered_alerts = self._check_alert_conditions(metrics)

                        # 触发相应级别的告警事件
                        for alert_level, condition in triggered_alerts:
                            self._trigger_alert(alert_level, condition, metrics)

                        # 收集间隔
                        time.sleep(2)

                    logger.info("指标收集器: 停止")

                def _generate_metrics(self) -> MetricData:
                    """生成模拟指标数据"""
                    # 基础值 + 随机波动
                    base_cpu = 50
                    base_memory = 60
                    base_response = 2000
                    base_error_rate = 0.02

                    # 添加随机波动,偶尔产生异常值
                    if random.random() < 0.1:  # 10%概率产生异常
                        spike_factor = random.uniform(1.5, 3.0)
                        cpu_spike = min(100, base_cpu * spike_factor + random.uniform(-10, 10))
                        memory_spike = min(100, base_memory * spike_factor + random.uniform(-5, 5))
                    else:
                        cpu_spike = base_cpu + random.uniform(-20, 30)
                        memory_spike = base_memory + random.uniform(-15, 20)

                    metrics = MetricData(
                        cpu_usage=max(0, min(100, cpu_spike)),
                        memory_usage=max(0, min(100, memory_spike)),
                        disk_usage=70 + random.uniform(-10, 15),
                        network_io=random.uniform(10, 100),
                        response_time=max(100, base_response + random.uniform(-500, 3000)),
                        error_rate=max(0, min(1, base_error_rate + random.uniform(-0.02, 0.08))),
                        timestamp=time.time()
                    )

                    return metrics

                def _check_alert_conditions(self, metrics: MetricData) -> List[Tuple[AlertLevel, AlertCondition]]:
                    """检查告警条件"""
                    triggered_alerts = []

                    for condition in self.alert_conditions:
                        if condition.check_function(metrics):
                            triggered_alerts.append((condition.level, condition))

                    return triggered_alerts

                def _trigger_alert(self, level: AlertLevel, condition: AlertCondition, metrics: MetricData):
                    """触发告警"""
                    logger.warning(f"触发告警: {level.value} - {condition.name}")

                    # 设置相应级别的事件
                    self.alert_events[level].set()

                    # Severe alerts also set the next lower level, so the handlers for that level respond too
                    if level == AlertLevel.CRITICAL:
                        self.alert_events[AlertLevel.ERROR].set()
                    elif level == AlertLevel.ERROR:
                        self.alert_events[AlertLevel.WARNING].set()

                def alert_handler(self, handler_id: int, alert_level: AlertLevel):
                    """告警处理器"""
                    handler_name = f"{alert_level.value}处理器-{handler_id}"
                    logger.info(f"{handler_name}: 启动")

                    while not self.system_stop_event.is_set():
                        # 等待告警事件
                        if self.alert_events[alert_level].wait(timeout=1):
                            if self.system_stop_event.is_set():
                                break

                            logger.info(f"{handler_name}: 收到{alert_level.value}告警,开始处理")

                            # 获取当前指标
                            metrics = self.current_metrics
                            if metrics:
                                self._handle_alert(handler_name, alert_level, metrics)

                            # 根据告警级别决定处理策略
                            if alert_level == AlertLevel.CRITICAL:
                                self._handle_critical_alert(handler_name)
                            elif alert_level == AlertLevel.ERROR:
                                self._handle_error_alert(handler_name)

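                            # Reset the event. Note: an alert raised while this handler
                            # was still busy is cleared here too, so it is not handled separately.
                            self.alert_events[alert_level].clear()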

                    logger.info(f"{handler_name}: 停止")

                def _handle_alert(self, handler_name: str, level: AlertLevel, metrics: MetricData):
                    """处理具体告警"""
                    if level == AlertLevel.WARNING:
                        action = self._determine_warning_action(metrics)
                        logger.info(f"{handler_name}: 警告处理 - {action}")
                        time.sleep(1)  # 模拟处理时间
                    elif level == AlertLevel.ERROR:
                        action = self._determine_error_action(metrics)
                        logger.warning(f"{handler_name}: 错误处理 - {action}")
                        time.sleep(2)  # 模拟处理时间
                    elif level == AlertLevel.CRITICAL:
                        action = self._determine_critical_action(metrics)
                        logger.error(f"{handler_name}: 严重告警处理 - {action}")
                        time.sleep(3)  # 模拟处理时间

                def _determine_warning_action(self, metrics: MetricData) -> str:
                    """确定警告级别的处理动作"""
                    if metrics.cpu_usage > 80:
                        return "增加CPU资源或优化进程"
                    elif metrics.memory_usage > 85:
                        return "清理内存缓存或增加内存"
                    elif metrics.response_time > 5000:
                        return "优化数据库查询或增加缓存"
                    elif metrics.error_rate > 0.05:
                        return "检查应用日志,修复错误"
                    else:
                        return "监控系统状态"

                def _determine_error_action(self, metrics: MetricData) -> str:
                    """确定错误级别的处理动作"""
                    if metrics.cpu_usage > 95:
                        return "紧急扩容或负载均衡"
                    elif metrics.memory_usage > 95:
                        return "强制清理内存或重启服务"
                    elif metrics.response_time > 10000:
                        return "启动降级服务或熔断机制"
                    else:
                        return "启动应急预案"

                def _determine_critical_action(self, metrics: MetricData) -> str:
                    """确定严重告警的处理动作"""
                    return "启动系统级应急响应,可能需要停机维护"

                def _handle_critical_alert(self, handler_name: str):
                    """处理严重告警的特殊逻辑"""
                    logger.error(f"{handler_name}: 启动系统维护模式")
                    self.maintenance_mode_event.set()

                    # 模拟维护过程
                    time.sleep(5)

                    logger.error(f"{handler_name}: 维护完成,恢复正常模式")
                    self.maintenance_mode_event.clear()

                def _handle_error_alert(self, handler_name: str):
                    """处理错误告警的特殊逻辑"""
                    logger.warning(f"{handler_name}: 记录错误告警到数据库")
                    # 这里可以添加更多错误处理逻辑

                def start_monitoring(self):
                    """启动监控系统"""
                    logger.info("=== 启动智能监控系统 ===")

                    # 启动指标收集器
                    collector_thread = threading.Thread(target=self.metrics_collector, name="指标收集器")
                    collector_thread.start()
                    self.active_handlers.append(collector_thread)

                    # 启动各级别告警处理器
                    alert_levels = [AlertLevel.WARNING, AlertLevel.ERROR, AlertLevel.CRITICAL]
                    for level in alert_levels:
                        for i in range(2):  # 每个级别启动2个处理器
                            handler_thread = threading.Thread(
                                target=self.alert_handler,
                                args=(i+1, level),
                                name=f"{level.value}处理器{i+1}"
                            )
                            handler_thread.start()
                            self.active_handlers.append(handler_thread)

                    return collector_thread

                def stop_monitoring(self):
                    """停止监控系统"""
                    logger.info("停止监控系统...")

                    # 设置停止事件
                    self.system_stop_event.set()

                    # 触发所有告警事件,唤醒处理器
                    for event in self.alert_events.values():
                        event.set()

                    # 等待所有线程完成
                    for thread in self.active_handlers:
                        thread.join(timeout=3)
                        if thread.is_alive():
                            logger.warning(f"线程 {thread.name} 未能在超时内停止")

                    logger.info("监控系统已停止")

                def run_demo(self, duration: int = 30):
                    """运行演示"""
                    logger.info(f"=== 智能监控系统演示,运行时间: {duration}秒 ===")

                    # 启动监控
                    self.start_monitoring()

                    # 运行指定时间
                    time.sleep(duration)

                    # 停止监控
                    self.stop_monitoring()

                    # 打印监控统计
                    self._print_monitoring_stats()

                def _print_monitoring_stats(self):
                    """打印监控统计信息"""
                    logger.info("\n=== 监控统计信息 ===")

                    if self.metrics_history:
                        avg_cpu = sum(m.cpu_usage for m in self.metrics_history) / len(self.metrics_history)
                        avg_memory = sum(m.memory_usage for m in self.metrics_history) / len(self.metrics_history)
                        avg_response = sum(m.response_time for m in self.metrics_history) / len(self.metrics_history)
                        avg_error_rate = sum(m.error_rate for m in self.metrics_history) / len(self.metrics_history)

                        max_cpu = max(m.cpu_usage for m in self.metrics_history)
                        max_memory = max(m.memory_usage for m in self.metrics_history)
                        max_response = max(m.response_time for m in self.metrics_history)
                        max_error_rate = max(m.error_rate for m in self.metrics_history)

                        logger.info(f"监控样本数: {len(self.metrics_history)}")
                        logger.info(f"平均CPU使用率: {avg_cpu:.1f}% (最高: {max_cpu:.1f}%)")
                        logger.info(f"平均内存使用率: {avg_memory:.1f}% (最高: {max_memory:.1f}%)")
                        logger.info(f"平均响应时间: {avg_response:.0f}ms (最高: {max_response:.0f}ms)")
                        logger.info(f"平均错误率: {avg_error_rate*100:.2f}% (最高: {max_error_rate*100:.2f}%)")

            # 使用示例
            if __name__ == "__main__":
                monitor = IntelligentMonitor()
                monitor.run_demo(duration=25)  # 运行25秒演示
            ---

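    b.State machine communication
        a.Pattern description
            Use Event objects to drive the state transitions of a finite state machine (FSM).
        b.Use cases
            Workflow management, protocol state tracking, and control of complex business processes.
        c.Implementation notes
            Each state has its own Event object. When the machine enters a state, it sets that state's Event so that any thread waiting on it is woken.
        d.Code example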
            ---
            # Event状态机通信 - 工作流管理系统
            import threading
            import time
            import logging
            from typing import Dict, List, Callable, Optional
            from enum import Enum
            from dataclasses import dataclass

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class WorkflowState(Enum):
                """工作流状态枚举"""
                INIT = "初始化"
                LOADING = "加载中"
                PROCESSING = "处理中"
                VALIDATING = "验证中"
                COMPLETING = "完成中"
                COMPLETED = "已完成"
                FAILED = "失败"
                CANCELLED = "已取消"

            class WorkflowEvent(Enum):
                """工作流事件枚举"""
                START = "开始"
                LOAD_COMPLETE = "加载完成"
                PROCESS_COMPLETE = "处理完成"
                VALIDATE_SUCCESS = "验证成功"
                VALIDATE_FAILED = "验证失败"
                ERROR = "错误"
                CANCEL = "取消"

            @dataclass
            class WorkflowData:
                """工作流数据"""
                workflow_id: str
                input_data: str
                processed_data: Optional[str] = None
                validated_data: Optional[str] = None
                error_message: Optional[str] = None
                timestamp: float = 0

            class WorkflowStateMachine:
                """工作流状态机演示Event状态机通信"""
                def __init__(self, workflow_id: str):
                    self.workflow_id = workflow_id
                    self.current_state = WorkflowState.INIT
                    self.workflow_data = WorkflowData(
                        workflow_id=workflow_id,
                        input_data=f"输入数据-{workflow_id}"
                    )

                    # 状态事件 - 每个状态对应一个Event
                    self.state_events: Dict[WorkflowState, threading.Event] = {
                        state: threading.Event() for state in WorkflowState
                    }

                    # 事件队列
                    self.event_queue = []
                    self.event_queue_lock = threading.Lock()
                    self.event_queue_event = threading.Event()

                    # 状态转换表
                    self.state_transitions = self._define_state_transitions()

                    # 状态处理器
                    self.state_handlers = self._define_state_handlers()

                    # 控制标志
                    self.running = False
                    self.worker_thread: Optional[threading.Thread] = None

                def _define_state_transitions(self) -> Dict[WorkflowState, Dict[WorkflowEvent, WorkflowState]]:
                    """定义状态转换表"""
                    return {
                        WorkflowState.INIT: {
                            WorkflowEvent.START: WorkflowState.LOADING,
                            WorkflowEvent.CANCEL: WorkflowState.CANCELLED,
                            WorkflowEvent.ERROR: WorkflowState.FAILED
                        },
                        WorkflowState.LOADING: {
                            WorkflowEvent.LOAD_COMPLETE: WorkflowState.PROCESSING,
                            WorkflowEvent.ERROR: WorkflowState.FAILED,
                            WorkflowEvent.CANCEL: WorkflowState.CANCELLED
                        },
                        WorkflowState.PROCESSING: {
                            WorkflowEvent.PROCESS_COMPLETE: WorkflowState.VALIDATING,
                            WorkflowEvent.ERROR: WorkflowState.FAILED,
                            WorkflowEvent.CANCEL: WorkflowState.CANCELLED
                        },
                        WorkflowState.VALIDATING: {
                            WorkflowEvent.VALIDATE_SUCCESS: WorkflowState.COMPLETING,
                            WorkflowEvent.VALIDATE_FAILED: WorkflowState.PROCESSING,
                            WorkflowEvent.ERROR: WorkflowState.FAILED,
                            WorkflowEvent.CANCEL: WorkflowState.CANCELLED
                        },
                        WorkflowState.COMPLETING: {
                            WorkflowEvent.START: WorkflowState.COMPLETED,
                            WorkflowEvent.ERROR: WorkflowState.FAILED
                        },
                        WorkflowState.COMPLETED: {
                            # 终态,无转换
                        },
                        WorkflowState.FAILED: {
                            # 终态,无转换
                        },
                        WorkflowState.CANCELLED: {
                            # 终态,无转换
                        }
                    }

                def _define_state_handlers(self) -> Dict[WorkflowState, Callable]:
                    """定义状态处理器"""
                    return {
                        WorkflowState.INIT: self._handle_init_state,
                        WorkflowState.LOADING: self._handle_loading_state,
                        WorkflowState.PROCESSING: self._handle_processing_state,
                        WorkflowState.VALIDATING: self._handle_validating_state,
                        WorkflowState.COMPLETING: self._handle_completing_state,
                        WorkflowState.COMPLETED: self._handle_completed_state,
                        WorkflowState.FAILED: self._handle_failed_state,
                        WorkflowState.CANCELLED: self._handle_cancelled_state
                    }

                def start(self):
                    """启动状态机"""
                    logger.info(f"工作流{self.workflow_id}: 启动状态机")
                    self.running = True

                    # 启动工作线程
                    self.worker_thread = threading.Thread(
                        target=self._state_machine_worker,
                        name=f"Workflow-{self.workflow_id}"
                    )
                    self.worker_thread.start()

                    # 发送开始事件
                    self.send_event(WorkflowEvent.START)

                def send_event(self, event: WorkflowEvent, data: Optional[Dict] = None):
                    """发送事件到状态机"""
                    with self.event_queue_lock:
                        self.event_queue.append((event, data, time.time()))
                        logger.info(f"工作流{self.workflow_id}: 发送事件 {event.value}")
                        self.event_queue_event.set()

                def _state_machine_worker(self):
                    """状态机工作线程"""
                    logger.info(f"工作流{self.workflow_id}: 状态机线程启动")

                    while self.running:
                        # 等待事件
                        if self.event_queue_event.wait(timeout=1):
                            if not self.running:
                                break

                            # 获取事件
                            event, data, timestamp = self._get_next_event()
                            if event is None:
                                continue

                            logger.info(f"工作流{self.workflow_id}: 处理事件 {event.value}, "
                                       f"当前状态 {self.current_state.value}")

                            # 处理状态转换
                            self._process_state_transition(event, data)

                        # 执行当前状态的处理器
                        if self.running:
                            self._execute_current_state_handler()

                    logger.info(f"工作流{self.workflow_id}: 状态机线程停止")

                def _get_next_event(self) -> tuple:
                    """获取下一个事件"""
                    with self.event_queue_lock:
                        if self.event_queue:
                            return self.event_queue.pop(0)
                        else:
                            self.event_queue_event.clear()
                            return (None, None, None)

                def _process_state_transition(self, event: WorkflowEvent, data: Optional[Dict]):
                    """处理状态转换"""
                    # 检查当前状态是否允许该事件
                    if self.current_state in self.state_transitions:
                        transitions = self.state_transitions[self.current_state]
                        if event in transitions:
                            # 执行状态转换
                            old_state = self.current_state
                            self.current_state = transitions[event]
                            logger.info(f"工作流{self.workflow_id}: 状态转换 "
                                       f"{old_state.value} -> {self.current_state.value}")

                            # 触发状态事件
                            self.state_events[self.current_state].set()

                            # 处理事件数据
                            if data:
                                self._process_event_data(event, data)
                        else:
                            logger.warning(f"工作流{self.workflow_id}: 事件 {event.value} "
                                        f"在状态 {self.current_state.value} 下不允许")
                    else:
                        logger.warning(f"工作流{self.workflow_id}: 当前状态 {self.current_state.value} "
                                    f"没有定义转换规则")

                def _process_event_data(self, event: WorkflowEvent, data: Dict):
                    """处理事件数据"""
                    if event == WorkflowEvent.LOAD_COMPLETE:
                        self.workflow_data.processed_data = data.get('processed_data')
                    elif event == WorkflowEvent.VALIDATE_SUCCESS:
                        self.workflow_data.validated_data = data.get('validated_data')
                    elif event == WorkflowEvent.ERROR:
                        self.workflow_data.error_message = data.get('error_message')

                def _execute_current_state_handler(self):
                    """执行当前状态的处理器"""
                    if self.current_state in self.state_handlers:
                        handler = self.state_handlers[self.current_state]
                        try:
                            handler()
                        except Exception as e:
                            logger.error(f"工作流{self.workflow_id}: 状态处理器异常: {e}")
                            self.send_event(WorkflowEvent.ERROR, {'error_message': str(e)})

                def _handle_init_state(self):
                    """处理初始化状态"""
                    logger.info(f"工作流{self.workflow_id}: 初始化状态处理")
                    # 初始化逻辑在START事件中处理
                    time.sleep(0.1)

                def _handle_loading_state(self):
                    """处理加载状态"""
                    logger.info(f"工作流{self.workflow_id}: 加载数据...")
                    time.sleep(2)  # 模拟加载时间

                    # 模拟加载完成
                    processed_data = f"处理后的数据-{self.workflow_id}-{int(time.time())}"
                    self.send_event(WorkflowEvent.LOAD_COMPLETE, {'processed_data': processed_data})

                def _handle_processing_state(self):
                    """Handle the PROCESSING state"""
                    logger.info(f"工作流{self.workflow_id}: 处理数据...")
                    time.sleep(1.5)  # 模拟处理时间

                    # 模拟处理完成
                    self.send_event(WorkflowEvent.PROCESS_COMPLETE)

                def _handle_validating_state(self):
                    """处理验证状态"""
                    logger.info(f"工作流{self.workflow_id}: 验证数据...")
                    time.sleep(1)  # 模拟验证时间

                    # 模拟验证结果 (90%成功率)
                    import random
                    if random.random() < 0.9:
                        validated_data = f"验证通过的数据-{self.workflow_id}"
                        self.send_event(WorkflowEvent.VALIDATE_SUCCESS, {'validated_data': validated_data})
                    else:
                        self.send_event(WorkflowEvent.VALIDATE_FAILED)

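                def _handle_completing_state(self):
                    """Handle the COMPLETING state"""
                    logger.info(f"工作流{self.workflow_id}: 完成工作流...")
                    time.sleep(0.5)
                    # START is reused as the "finish" trigger here: the transition table
                    # maps COMPLETING + START to COMPLETED
                    self.send_event(WorkflowEvent.START)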

                def _handle_completed_state(self):
                    """处理已完成状态"""
                    logger.info(f"工作流{self.workflow_id}: 工作流已完成")
                    logger.info(f"最终数据: 输入={self.workflow_data.input_data}, "
                               f"处理={self.workflow_data.processed_data}, "
                               f"验证={self.workflow_data.validated_data}")
                    self.running = False

                def _handle_failed_state(self):
                    """处理失败状态"""
                    error_msg = self.workflow_data.error_message or "未知错误"
                    logger.error(f"工作流{self.workflow_id}: 工作流失败 - {error_msg}")
                    self.running = False

                def _handle_cancelled_state(self):
                    """处理取消状态"""
                    logger.info(f"工作流{self.workflow_id}: 工作流已取消")
                    self.running = False

                def wait_for_state(self, state: WorkflowState, timeout: Optional[float] = None) -> bool:
                    """等待特定状态"""
                    return self.state_events[state].wait(timeout=timeout)

                def cancel(self):
                    """取消工作流"""
                    self.send_event(WorkflowEvent.CANCEL)

                def stop(self):
                    """停止状态机"""
                    self.running = False
                    self.event_queue_event.set()

                    if self.worker_thread:
                        self.worker_thread.join(timeout=2)
                        if self.worker_thread.is_alive():
                            logger.warning(f"工作流{self.workflow_id}: 状态机线程未能及时停止")

            class WorkflowManager:
                """工作流管理器"""
                def __init__(self):
                    self.workflows: Dict[str, WorkflowStateMachine] = {}

                def create_workflow(self, workflow_id: str) -> WorkflowStateMachine:
                    """创建工作流"""
                    workflow = WorkflowStateMachine(workflow_id)
                    self.workflows[workflow_id] = workflow
                    return workflow

                def start_workflow(self, workflow_id: str):
                    """启动工作流"""
                    if workflow_id in self.workflows:
                        self.workflows[workflow_id].start()

                def cancel_workflow(self, workflow_id: str):
                    """取消工作流"""
                    if workflow_id in self.workflows:
                        self.workflows[workflow_id].cancel()

                def wait_for_workflow(self, workflow_id: str, state: WorkflowState, timeout: float = 30) -> bool:
                    """等待工作流达到特定状态"""
                    if workflow_id in self.workflows:
                        return self.workflows[workflow_id].wait_for_state(state, timeout)
                    return False

                def run_demo(self):
                    """运行演示"""
                    logger.info("=== 工作流状态机演示 ===")

                    # 创建多个工作流
                    workflows = []
                    for i in range(3):
                        workflow_id = f"WF-{i+1:03d}"
                        workflow = self.create_workflow(workflow_id)
                        workflows.append(workflow)
                        self.start_workflow(workflow_id)

                    # 等待所有工作流完成
                    for workflow in workflows:
                        if workflow.worker_thread:
                            workflow.worker_thread.join()

                    logger.info("所有工作流处理完成")

            # 使用示例
            if __name__ == "__main__":
                manager = WorkflowManager()
                manager.run_demo()
            ---
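Stripped of the workflow details, the core of the pattern above (one Event per state, set when the state is entered) can be sketched as follows. The names `MiniStateMachine`, `State`, and `transition_to` are hypothetical and exist only for this illustration. As in the example above, events are never cleared, so a set Event means "this state has been reached at least once".

```python
import threading
from enum import Enum


class State(Enum):
    INIT = "init"
    RUNNING = "running"
    DONE = "done"


class MiniStateMachine:
    """Hypothetical minimal FSM: one threading.Event per state."""

    def __init__(self):
        self._lock = threading.Lock()
        self.state = State.INIT
        self.state_events = {s: threading.Event() for s in State}
        self.state_events[State.INIT].set()

    def transition_to(self, new_state):
        # Update the state and signal its Event under one lock so that
        # observers never see the Event set before the state has changed.
        with self._lock:
            self.state = new_state
            self.state_events[new_state].set()

    def wait_for_state(self, state, timeout=None):
        # Returns True once the state has been reached, False on timeout.
        return self.state_events[state].wait(timeout=timeout)


def worker(machine):
    machine.transition_to(State.RUNNING)
    machine.transition_to(State.DONE)


machine = MiniStateMachine()
thread = threading.Thread(target=worker, args=(machine,))
thread.start()
reached = machine.wait_for_state(State.DONE, timeout=5)
thread.join()
print(reached, machine.state)
```

Because the events stay set, a caller can wait for a short-lived intermediate state without racing against the next transition. The trade-off is that an Event cannot tell whether the machine is still in that state.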

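5.4 Event-Driven Pattern

01.Fundamentals of event-driven architecture
    a.Definition and principle
        Event-driven architecture is a software architecture pattern in which system behavior is driven by the occurrence and handling of events.
        Components communicate through events and stay loosely coupled; each event handler responds to specific event types.
    b.Core components
        a.Event producer
            A component that creates events and publishes them into the system.
        b.Event consumer
            A component that subscribes to specific event types and handles them.
        c.Event bus
            The central component responsible for dispatching and routing events.
        d.Event store
            Persists the event history and supports replay and querying.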
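The event store is the component least visible in typical examples, so here is a minimal sketch of one. It is an assumption-laden illustration: `StoredEvent` and `InMemoryEventStore` are hypothetical names, and the log lives only in memory, whereas a real store would write to a file or database to be truly persistent.

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass(frozen=True)
class StoredEvent:
    event_type: str
    data: Any
    timestamp: float = field(default_factory=time.time)


class InMemoryEventStore:
    """Hypothetical append-only event log (in memory only)."""

    def __init__(self):
        self._events: List[StoredEvent] = []
        self._lock = threading.Lock()

    def append(self, event: StoredEvent) -> None:
        with self._lock:
            self._events.append(event)

    def query(self, event_type: str) -> List[StoredEvent]:
        with self._lock:
            return [e for e in self._events if e.event_type == event_type]

    def replay(self, handler: Callable[[StoredEvent], None]) -> int:
        # Copy under the lock, then call the handler outside it so that a
        # slow handler does not block producers appending new events.
        with self._lock:
            snapshot = list(self._events)
        for event in snapshot:
            handler(event)
        return len(snapshot)


store = InMemoryEventStore()
store.append(StoredEvent("user.login", {"user": "alice"}))
store.append(StoredEvent("data.received", {"size": 42}))
store.append(StoredEvent("user.login", {"user": "bob"}))

replayed = []
count = store.replay(replayed.append)
logins = store.query("user.login")
print(count, [e.data["user"] for e in logins])  # 3 ['alice', 'bob']
```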
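    c.Role of threading.Event in an event-driven design
        a.Notification mechanism
            Serves as the basic primitive for signaling that something has happened.
        b.Synchronization and coordination tool
            Coordinates producers and consumers.
        c.State-change signal
            Indicates that the system state has changed.
        d.Trigger for asynchronous processing
            Kicks off an asynchronous processing flow.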
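The roles listed above can be seen in a few lines. In this minimal sketch a producer publishes a payload and then sets a `threading.Event`; two consumers block on `wait()` until the signal arrives. The names `data_ready`, `shared`, `producer`, and `consumer` are chosen for illustration only.

```python
import threading

data_ready = threading.Event()   # notification primitive: "payload is available"
shared = {}                      # written by the producer before set()
results = []


def producer():
    shared["payload"] = "order-1001"
    data_ready.set()             # state-change signal; wakes every waiting thread


def consumer(name):
    # wait() blocks until the event is set and returns True,
    # or returns False if the timeout expires first.
    if data_ready.wait(timeout=5):
        results.append((name, shared["payload"]))


threads = [threading.Thread(target=consumer, args=(f"consumer-{i}",)) for i in range(2)]
threads.append(threading.Thread(target=producer))
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))
```

Note that an Event carries no data of its own. The payload travels through `shared`, and the Event only says when it is safe to read it.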

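02.An Event-based event-driven framework
    a.Design principles
        a.Loose coupling
            Producers and consumers do not depend on each other directly.
        b.Asynchronous processing
            Events can be handled asynchronously to improve throughput.
        c.Extensibility
            New event types and handlers are easy to add.
        d.Reliability
            Events must not be lost and must be handled in the correct order.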
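The ordering requirement deserves a concrete illustration when events carry priorities. A common approach, sketched here under the assumption that a larger number means a more urgent event, is to queue `(negated priority, sequence number, payload)` tuples in a `queue.PriorityQueue`. The `publish` helper and `_sequence` counter are hypothetical names for this sketch.

```python
import itertools
import queue

_sequence = itertools.count()      # monotonically increasing tie-breaker
events = queue.PriorityQueue()


def publish(priority: int, payload: str) -> None:
    # PriorityQueue pops the smallest entry first, so the priority is
    # negated. The sequence number keeps FIFO order among events of equal
    # priority and ensures the payloads themselves are never compared.
    events.put((-priority, next(_sequence), payload))


publish(2, "normal-1")
publish(4, "urgent-1")
publish(2, "normal-2")
publish(4, "urgent-2")

order = []
while not events.empty():
    _, _, payload = events.get()
    order.append(payload)

print(order)  # ['urgent-1', 'urgent-2', 'normal-1', 'normal-2']
```

Without a tie-breaker, two entries with the same priority fall back to comparing the payload objects, which raises `TypeError` for objects that define no ordering.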
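    b.Core implementation
        a.Event registration
            Supports registering and unregistering handlers for event types.
        b.Event dispatcher
            Dispatches and routes events efficiently.
        c.Handler management
            Manages and schedules event handlers dynamically.
    c.Code example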
        ---
        # Event事件驱动框架 - 轻量级事件系统
        import threading
        import time
        import logging
        import weakref
        from typing import Dict, List, Callable, Any, Optional, Type
        from dataclasses import dataclass
        from enum import Enum
        from abc import ABC, abstractmethod
        import queue
        import uuid

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class EventPriority(Enum):
            """事件优先级"""
            LOW = 1
            NORMAL = 2
            HIGH = 3
            URGENT = 4

        @dataclass
        class Event:
            """基础事件类"""
            event_id: str
            event_type: str
            source: str
            timestamp: float
            data: Any
            priority: EventPriority = EventPriority.NORMAL

        class EventHandler(ABC):
            """事件处理器抽象基类"""
            def __init__(self, handler_name: str):
                self.handler_name = handler_name
                self.is_active = True

            @abstractmethod
            def handle_event(self, event: Event) -> None:
                """处理事件"""
                pass

            def can_handle(self, event_type: str) -> bool:
                """检查是否能处理特定类型的事件"""
                return True

            def __str__(self):
                return f"EventHandler({self.handler_name})"

        class EventBus:
            """事件总线 - 核心事件分发器"""
            def __init__(self):
                # 事件处理器注册表 {event_type: [handlers]}
                self.handlers: Dict[str, List[weakref.ref]] = {}

                # 优先级队列用于事件分发
                self.event_queue = queue.PriorityQueue()

                # 分发器线程
                self.dispatcher_thread = None
                self.is_running = False

                # 统计信息
                self.stats = {
                    'events_published': 0,
                    'events_processed': 0,
                    'events_failed': 0,
                    'handlers_registered': 0
                }

                # 控制事件
                self.stop_event = threading.Event()

            def start(self):
                """启动事件总线"""
                if self.is_running:
                    logger.warning("事件总线已经在运行中")
                    return

                self.is_running = True
                self.dispatcher_thread = threading.Thread(
                    target=self._dispatch_loop,
                    name="EventBus-Dispatcher"
                )
                self.dispatcher_thread.start()
                logger.info("事件总线已启动")

            def stop(self):
                """停止事件总线"""
                if not self.is_running:
                    return

                logger.info("正在停止事件总线...")
                self.is_running = False
                self.stop_event.set()

                if self.dispatcher_thread:
                    self.dispatcher_thread.join(timeout=5)
                    if self.dispatcher_thread.is_alive():
                        logger.warning("事件分发器未能在超时内停止")

                logger.info("事件总线已停止")

            def register_handler(self, event_type: str, handler: EventHandler):
                """注册事件处理器"""
                if event_type not in self.handlers:
                    self.handlers[event_type] = []

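                # Hold only a weak reference so the bus does not keep handlers alive.
                # The caller must keep its own strong reference; otherwise the handler
                # is garbage-collected and silently dropped from the registry.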
                handler_ref = weakref.ref(handler)
                self.handlers[event_type].append(handler_ref)
                self.stats['handlers_registered'] += 1

                logger.info(f"注册处理器 {handler} 处理事件类型 {event_type}")

            def unregister_handler(self, event_type: str, handler: EventHandler):
                """注销事件处理器"""
                if event_type in self.handlers:
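                    # Compare by identity: remove exactly this handler object,
                    # and drop dead weak references along the way
                    self.handlers[event_type] = [
                        ref for ref in self.handlers[event_type]
                        if ref() is not None and ref() is not handler
                    ]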

                    # 如果处理器列表为空,删除该事件类型
                    if not self.handlers[event_type]:
                        del self.handlers[event_type]

                    logger.info(f"注销处理器 {handler} 事件类型 {event_type}")

            def publish(self, event_type: str, source: str, data: Any,
                       priority: EventPriority = EventPriority.NORMAL) -> str:
                """发布事件"""
                event = Event(
                    event_id=str(uuid.uuid4()),
                    event_type=event_type,
                    source=source,
                    timestamp=time.time(),
                    data=data,
                    priority=priority
                )

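                # Enqueue the event. PriorityQueue pops the smallest entry first, so the
                # priority is negated. The timestamp and the unique event_id act as
                # tie-breakers: without them, two events with equal priority would be
                # compared directly and raise TypeError, because Event defines no ordering.
                self.event_queue.put((-priority.value, event.timestamp, event.event_id, event))
                self.stats['events_published'] += 1

                logger.debug(f"发布事件 {event.event_id} 类型 {event_type} "
                           f"优先级 {priority.name}")
                return event.event_id

            def _dispatch_loop(self):
                """Event dispatch loop"""
                logger.info("事件分发器启动")

                while self.is_running and not self.stop_event.is_set():
                    try:
                        # Fetch the next event; the timeout keeps the loop responsive to stop()
                        try:
                            *_, event = self.event_queue.get(timeout=1)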
                        except queue.Empty:
                            continue

                        # 分发事件
                        self._dispatch_event(event)

                        # 标记任务完成
                        self.event_queue.task_done()

                    except Exception as e:
                        logger.error(f"事件分发循环异常: {e}")
                        self.stats['events_failed'] += 1

                logger.info("事件分发器停止")

            def _dispatch_event(self, event: Event):
                """分发单个事件"""
                logger.debug(f"分发事件 {event.event_id} 类型 {event.event_type}")

                # 获取事件类型的处理器
                handlers = self._get_active_handlers(event.event_type)
                if not handlers:
                    logger.warning(f"事件 {event.event_type} 没有注册的处理器")
                    return

                # Invoke the handlers one after another on the dispatcher thread
                self._process_event_with_handlers(event, handlers)

            def _get_active_handlers(self, event_type: str) -> List[EventHandler]:
                """获取活跃的事件处理器"""
                active_handlers = []

                if event_type in self.handlers:
                    # 清理失效的弱引用
                    valid_refs = []
                    for handler_ref in self.handlers[event_type]:
                        handler = handler_ref()
                        if handler is not None and handler.is_active:
                            valid_refs.append(handler_ref)
                            active_handlers.append(handler)

                    # 更新处理器列表
                    self.handlers[event_type] = valid_refs

                return active_handlers

            def _process_event_with_handlers(self, event: Event, handlers: List[EventHandler]):
                """使用处理器处理事件"""
                successful_handlers = 0
                failed_handlers = 0

                for handler in handlers:
                    if not handler.can_handle(event.event_type):
                        continue

                    try:
                        handler.handle_event(event)
                        successful_handlers += 1
                        logger.debug(f"处理器 {handler} 成功处理事件 {event.event_id}")

                    except Exception as e:
                        failed_handlers += 1
                        logger.error(f"处理器 {handler} 处理事件 {event.event_id} 失败: {e}")

                logger.debug(f"事件 {event.event_id} 处理完成: "
                           f"成功 {successful_handlers}, 失败 {failed_handlers}")

                if failed_handlers == 0:
                    self.stats['events_processed'] += 1
                else:
                    self.stats['events_failed'] += 1

            def get_stats(self) -> Dict[str, int]:
                """获取统计信息"""
                return self.stats.copy()

        # 具体事件类型
        class SystemEvent:
            """系统事件类型"""
            STARTUP = "system.startup"
            SHUTDOWN = "system.shutdown"
            ERROR = "system.error"
            RESOURCE_WARNING = "system.resource_warning"

        class DataEvent:
            """数据事件类型"""
            DATA_RECEIVED = "data.received"
            DATA_PROCESSED = "data.processed"
            DATA_VALIDATED = "data.validated"
            DATA_ERROR = "data.error"

        class UserEvent:
            """用户事件类型"""
            USER_LOGIN = "user.login"
            USER_LOGOUT = "user.logout"
            USER_ACTION = "user.action"

        # 具体事件处理器
        class SystemEventHandler(EventHandler):
            """系统事件处理器"""
            def __init__(self):
                super().__init__("SystemEventHandler")

            def handle_event(self, event: Event) -> None:
                if event.event_type == SystemEvent.STARTUP:
                    self._handle_startup(event)
                elif event.event_type == SystemEvent.SHUTDOWN:
                    self._handle_shutdown(event)
                elif event.event_type == SystemEvent.ERROR:
                    self._handle_error(event)
                elif event.event_type == SystemEvent.RESOURCE_WARNING:
                    self._handle_resource_warning(event)

            def _handle_startup(self, event: Event):
                logger.info(f"系统启动: {event.data}")
                time.sleep(0.5)  # 模拟启动过程

            def _handle_shutdown(self, event: Event):
                logger.info(f"系统关闭: {event.data}")
                time.sleep(0.3)  # 模拟关闭过程

            def _handle_error(self, event: Event):
                logger.error(f"系统错误: {event.data}")

            def _handle_resource_warning(self, event: Event):
                logger.warning(f"资源警告: {event.data}")

        class DataEventHandler(EventHandler):
            """数据事件处理器"""
            def __init__(self):
                super().__init__("DataEventHandler")
                self.processed_data = []

            def handle_event(self, event: Event) -> None:
                if event.event_type == DataEvent.DATA_RECEIVED:
                    self._handle_data_received(event)
                elif event.event_type == DataEvent.DATA_PROCESSED:
                    self._handle_data_processed(event)
                elif event.event_type == DataEvent.DATA_VALIDATED:
                    self._handle_data_validated(event)
                elif event.event_type == DataEvent.DATA_ERROR:
                    self._handle_data_error(event)

            def _handle_data_received(self, event: Event):
                logger.info(f"接收数据: {event.data}")
                # 模拟数据接收后处理
                time.sleep(0.2)

            def _handle_data_processed(self, event: Event):
                logger.info(f"处理数据: {event.data}")
                self.processed_data.append(event.data)

            def _handle_data_validated(self, event: Event):
                logger.info(f"验证数据: {event.data}")

            def _handle_data_error(self, event: Event):
                logger.error(f"数据错误: {event.data}")

        class UserEventHandler(EventHandler):
            """用户事件处理器"""
            def __init__(self):
                super().__init__("UserEventHandler")
                self.active_users = set()

            def handle_event(self, event: Event) -> None:
                if event.event_type == UserEvent.USER_LOGIN:
                    self._handle_user_login(event)
                elif event.event_type == UserEvent.USER_LOGOUT:
                    self._handle_user_logout(event)
                elif event.event_type == UserEvent.USER_ACTION:
                    self._handle_user_action(event)

            def _handle_user_login(self, event: Event):
                user_id = event.data.get('user_id')
                self.active_users.add(user_id)
                logger.info(f"用户登录: {user_id}, 当前活跃用户数: {len(self.active_users)}")

            def _handle_user_logout(self, event: Event):
                user_id = event.data.get('user_id')
                self.active_users.discard(user_id)
                logger.info(f"用户登出: {user_id}, 当前活跃用户数: {len(self.active_users)}")

            def _handle_user_action(self, event: Event):
                user_id = event.data.get('user_id')
                action = event.data.get('action')
                logger.info(f"用户操作: {user_id} 执行了 {action}")

        class EventDrivenFrameworkDemo:
            """事件驱动框架演示"""
            def __init__(self):
                self.event_bus = EventBus()
                self.handlers = []

            def setup(self):
                """设置事件处理器"""
                # 创建事件处理器
                system_handler = SystemEventHandler()
                data_handler = DataEventHandler()
                user_handler = UserEventHandler()

                # 注册处理器
                self.event_bus.register_handler(SystemEvent.STARTUP, system_handler)
                self.event_bus.register_handler(SystemEvent.SHUTDOWN, system_handler)
                self.event_bus.register_handler(SystemEvent.ERROR, system_handler)
                self.event_bus.register_handler(SystemEvent.RESOURCE_WARNING, system_handler)

                self.event_bus.register_handler(DataEvent.DATA_RECEIVED, data_handler)
                self.event_bus.register_handler(DataEvent.DATA_PROCESSED, data_handler)
                self.event_bus.register_handler(DataEvent.DATA_VALIDATED, data_handler)
                self.event_bus.register_handler(DataEvent.DATA_ERROR, data_handler)

                self.event_bus.register_handler(UserEvent.USER_LOGIN, user_handler)
                self.event_bus.register_handler(UserEvent.USER_LOGOUT, user_handler)
                self.event_bus.register_handler(UserEvent.USER_ACTION, user_handler)

                self.handlers = [system_handler, data_handler, user_handler]

                logger.info("事件处理器设置完成")

            def demo_system_events(self):
                """演示系统事件"""
                logger.info("=== 演示系统事件 ===")

                # 系统启动事件
                self.event_bus.publish(
                    SystemEvent.STARTUP,
                    "EventFramework",
                    {"version": "1.0", "components": ["EventBus", "Handlers"]},
                    EventPriority.HIGH
                )

                time.sleep(1)

                # 资源警告事件
                self.event_bus.publish(
                    SystemEvent.RESOURCE_WARNING,
                    "ResourceManager",
                    {"resource": "memory", "usage": "85%"},
                    EventPriority.NORMAL
                )

                time.sleep(0.5)

                # 系统错误事件
                self.event_bus.publish(
                    SystemEvent.ERROR,
                    "DatabaseService",
                    {"error": "Connection timeout", "retry_count": 3},
                    EventPriority.URGENT
                )

                time.sleep(0.5)

                # 系统关闭事件
                self.event_bus.publish(
                    SystemEvent.SHUTDOWN,
                    "EventFramework",
                    {"reason": "Demo completed"},
                    EventPriority.HIGH
                )

            def demo_data_events(self):
                """演示数据事件"""
                logger.info("=== 演示数据事件 ===")

                # 数据接收事件
                for i in range(3):
                    data_id = f"data_{i+1}"
                    self.event_bus.publish(
                        DataEvent.DATA_RECEIVED,
                        "DataService",
                        {"data_id": data_id, "size": 1024 * (i+1)},
                        EventPriority.NORMAL
                    )
                    time.sleep(0.3)

                    # 数据处理事件
                    self.event_bus.publish(
                        DataEvent.DATA_PROCESSED,
                        "DataService",
                        {"data_id": data_id, "result": "success"},
                        EventPriority.NORMAL
                    )

                    # 数据验证事件
                    self.event_bus.publish(
                        DataEvent.DATA_VALIDATED,
                        "DataService",
                        {"data_id": data_id, "valid": True},
                        EventPriority.NORMAL
                    )

                # 模拟数据错误
                self.event_bus.publish(
                    DataEvent.DATA_ERROR,
                    "DataService",
                    {"data_id": "corrupt_data", "error": "Invalid format"},
                    EventPriority.NORMAL
                )

            def demo_user_events(self):
                """演示用户事件"""
                logger.info("=== 演示用户事件 ===")

                users = ["alice", "bob", "charlie"]
                actions = ["view_page", "click_button", "submit_form"]

                # 用户登录
                for user in users:
                    self.event_bus.publish(
                        UserEvent.USER_LOGIN,
                        "AuthService",
                        {"user_id": user, "timestamp": time.time()},
                        EventPriority.NORMAL
                    )
                    time.sleep(0.2)

                # 用户操作
                for i, user in enumerate(users):
                    for j, action in enumerate(actions):
                        if (i + j) % 2 == 0:  # deterministic subset: only (user, action) pairs whose index sum is even
                            self.event_bus.publish(
                                UserEvent.USER_ACTION,
                                "UserActivityTracker",
                                {"user_id": user, "action": action, "timestamp": time.time()},
                                EventPriority.NORMAL
                            )
                            time.sleep(0.1)

                # 用户登出
                for user in users:
                    self.event_bus.publish(
                        UserEvent.USER_LOGOUT,
                        "AuthService",
                        {"user_id": user, "session_duration": 300},
                        EventPriority.NORMAL
                    )
                    time.sleep(0.2)

            def run_demo(self):
                """运行完整演示"""
                logger.info("=== 事件驱动框架演示开始 ===")

                # 启动事件总线
                self.event_bus.start()

                # 设置事件处理器
                self.setup()

                try:
                    # 演示不同类型的事件
                    self.demo_system_events()
                    time.sleep(1)

                    self.demo_data_events()
                    time.sleep(1)

                    self.demo_user_events()
                    time.sleep(1)

                    # 等待所有事件处理完成
                    time.sleep(2)

                finally:
                    # 停止事件总线
                    self.event_bus.stop()

                # Print statistics
                stats = self.event_bus.get_stats()
                logger.info("\n=== 事件统计 ===")
                logger.info(f"发布事件数: {stats['events_published']}")
                logger.info(f"处理事件数: {stats['events_processed']}")
                logger.info(f"失败事件数: {stats['events_failed']}")
                logger.info(f"注册处理器数: {stats['handlers_registered']}")

                logger.info("=== 事件驱动框架演示完成 ===")

        # Usage example
        if __name__ == "__main__":
            demo = EventDrivenFrameworkDemo()
            demo.run_demo()
            ---
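
The dispatcher above stores handlers as weak references and prunes dead ones on every lookup. The sketch below isolates that idea. `HandlerRegistry` and `PrintHandler` are hypothetical names introduced only for illustration and are not part of the framework above. It assumes that an inactive handler should stay registered and that only garbage-collected handlers should be dropped.

```python
import gc
import weakref
from collections import defaultdict


class PrintHandler:
    """Hypothetical handler used only in this sketch."""

    def __init__(self, name):
        self.name = name
        self.is_active = True


class HandlerRegistry:
    """Hypothetical registry that holds handlers through weak references."""

    def __init__(self):
        self._refs = defaultdict(list)

    def register(self, event_type, handler):
        # The registry does not keep the handler alive; the caller must.
        self._refs[event_type].append(weakref.ref(handler))

    def active_handlers(self, event_type):
        live_refs, active = [], []
        for ref in self._refs.get(event_type, []):
            handler = ref()
            if handler is None:
                continue  # referent was garbage collected: drop the reference
            live_refs.append(ref)  # inactive handlers stay registered
            if handler.is_active:
                active.append(handler)
        self._refs[event_type] = live_refs
        return active

    def ref_count(self, event_type):
        return len(self._refs.get(event_type, []))


registry = HandlerRegistry()
kept = PrintHandler("kept")
dropped = PrintHandler("dropped")
registry.register("demo", kept)
registry.register("demo", dropped)

# A deactivated handler is skipped but not forgotten.
kept.is_active = False
names_while_inactive = [h.name for h in registry.active_handlers("demo")]

# Once the last strong reference is gone, the weak reference dies.
kept.is_active = True
del dropped
gc.collect()  # makes collection deterministic on non-refcounting interpreters
names_after_gc = [h.name for h in registry.active_handlers("demo")]
```

Because the registry holds no strong references, the caller has to keep each handler alive for as long as it should receive events. This is why `EventDrivenFrameworkDemo.setup` stores its handlers in `self.handlers`.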

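03.Advanced Event-driven application patterns
    a.Publish-subscribe pattern
        a.Pattern description
            Publishers emit messages without knowing who consumes them. Subscribers register for the message types they care about, and a message bus routes each message to the matching subscribers.
        b.Implementation notes
            A threading.Event wakes the dispatcher thread whenever a message is queued, so publishing never blocks on delivery. Subscribers can subscribe and unsubscribe at runtime.
        c.Code example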
            ---
            # Event发布-订阅模式 - 消息系统
            import threading
            import time
            import logging
            from typing import Dict, List, Set, Any
            from dataclasses import dataclass
            from enum import Enum

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class MessageType(Enum):
                """消息类型枚举"""
                NEWS = "news"
                WEATHER = "weather"
                SPORTS = "sports"
                FINANCE = "finance"
                ALERT = "alert"

            @dataclass
            class Message:
                """消息数据类"""
                message_id: str
                message_type: MessageType
                content: str
                priority: int
                timestamp: float
                metadata: Dict[str, Any]

            class Publisher:
                """消息发布者"""
                def __init__(self, name: str, message_bus: 'MessageBus'):
                    self.name = name
                    self.message_bus = message_bus

                def publish(self, message_type: MessageType, content: str,
                          priority: int = 1, metadata: Dict[str, Any] = None):
                    """发布消息"""
                    message = Message(
                        message_id=f"{self.name}_{int(time.time()*1000)}",
                        message_type=message_type,
                        content=content,
                        priority=priority,
                        timestamp=time.time(),
                        metadata=metadata or {}
                    )

                    self.message_bus.publish(message)
                    logger.info(f"发布者 {self.name}: 发布 {message_type.value} 消息")

            class Subscriber:
                """消息订阅者"""
                def __init__(self, name: str, message_bus: 'MessageBus'):
                    self.name = name
                    self.message_bus = message_bus
                    self.subscribed_types: Set[MessageType] = set()
                    self.received_messages: List[Message] = []

                def subscribe(self, message_type: MessageType):
                    """订阅消息类型"""
                    self.subscribed_types.add(message_type)
                    self.message_bus.subscribe(message_type, self)
                    logger.info(f"订阅者 {self.name}: 订阅 {message_type.value} 消息")

                def unsubscribe(self, message_type: MessageType):
                    """取消订阅消息类型"""
                    self.subscribed_types.discard(message_type)
                    self.message_bus.unsubscribe(message_type, self)
                    logger.info(f"订阅者 {self.name}: 取消订阅 {message_type.value} 消息")

                def receive_message(self, message: Message):
                    """接收消息"""
                    self.received_messages.append(message)
                    logger.info(f"订阅者 {self.name}: 收到 {message.message_type.value} 消息 - {message.content}")

                def get_message_count(self, message_type: MessageType = None) -> int:
                    """获取消息数量"""
                    if message_type is None:
                        return len(self.received_messages)
                    else:
                        return len([msg for msg in self.received_messages
                                 if msg.message_type == message_type])

            class MessageBus:
                """消息总线 - 实现发布-订阅模式"""
                def __init__(self):
                    # 订阅者注册表 {message_type: [subscribers]}
                    self.subscribers: Dict[MessageType, List[Subscriber]] = {}

                    # 消息队列
                    self.message_queue = []
                    self.queue_lock = threading.Lock()
                    self.queue_event = threading.Event()

                    # 分发器线程
                    self.dispatcher_thread = None
                    self.is_running = False

                def start(self):
                    """启动消息总线"""
                    if self.is_running:
                        logger.warning("消息总线已经在运行")
                        return

                    self.is_running = True
                    self.dispatcher_thread = threading.Thread(
                        target=self._dispatch_loop,
                        name="MessageBus-Dispatcher"
                    )
                    self.dispatcher_thread.start()
                    logger.info("消息总线已启动")

                def stop(self):
                    """停止消息总线"""
                    if not self.is_running:
                        return

                    self.is_running = False
                    self.queue_event.set()

                    if self.dispatcher_thread:
                        self.dispatcher_thread.join(timeout=3)
                        if self.dispatcher_thread.is_alive():
                            logger.warning("消息分发器未能及时停止")

                    logger.info("消息总线已停止")

                def subscribe(self, message_type: MessageType, subscriber: Subscriber):
                    """订阅消息类型"""
                    with self.queue_lock:
                        if message_type not in self.subscribers:
                            self.subscribers[message_type] = []
                        self.subscribers[message_type].append(subscriber)

                def unsubscribe(self, message_type: MessageType, subscriber: Subscriber):
                    """取消订阅消息类型"""
                    with self.queue_lock:
                        if message_type in self.subscribers:
                            self.subscribers[message_type] = [
                                sub for sub in self.subscribers[message_type]
                                if sub != subscriber
                            ]
                            # 如果没有订阅者了,删除该类型
                            if not self.subscribers[message_type]:
                                del self.subscribers[message_type]

                def publish(self, message: Message):
                    """发布消息"""
                    with self.queue_lock:
                        self.message_queue.append(message)
                        self.queue_event.set()
                    logger.debug(f"消息已加入队列: {message.message_type.value}")

                def _dispatch_loop(self):
                    """消息分发循环"""
                    logger.info("消息分发器启动")

                    while self.is_running:
                        # 等待消息
                        if self.queue_event.wait(timeout=1):
                            if not self.is_running:
                                break

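                            # Take the whole pending batch and reset the signal under the lock.
                            # Clearing unconditionally avoids a busy loop when the event is set
                            # while the queue is empty (e.g. left set by a previous stop()).
                            with self.queue_lock:
                                messages_to_dispatch = self.message_queue.copy()
                                self.message_queue.clear()
                                self.queue_event.clear()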

                            # 分发消息
                            for message in messages_to_dispatch:
                                self._dispatch_message(message)

                    logger.info("消息分发器停止")

                def _dispatch_message(self, message: Message):
                    """分发单个消息"""
                    logger.debug(f"分发消息: {message.message_type.value}")

                    # Snapshot the list under the lock so delivery iterates a stable copy
                    with self.queue_lock:
                        subscribers = list(self.subscribers.get(message.message_type, []))

                    # 将消息发送给所有订阅者
                    for subscriber in subscribers:
                        try:
                            subscriber.receive_message(message)
                        except Exception as e:
                            logger.error(f"向订阅者 {subscriber.name} 发送消息失败: {e}")

                def get_subscriber_count(self, message_type: MessageType) -> int:
                    """获取订阅者数量"""
                    with self.queue_lock:
                        return len(self.subscribers.get(message_type, []))

            class PubSubDemo:
                """发布-订阅模式演示"""
                def __init__(self):
                    self.message_bus = MessageBus()
                    self.publishers = []
                    self.subscribers = []

                def create_publishers(self):
                    """创建发布者"""
                    # 新闻发布者
                    news_publisher = Publisher("NewsAgency", self.message_bus)
                    self.publishers.append(news_publisher)

                    # 天气发布者
                    weather_publisher = Publisher("WeatherService", self.message_bus)
                    self.publishers.append(weather_publisher)

                    # 体育发布者
                    sports_publisher = Publisher("SportsNews", self.message_bus)
                    self.publishers.append(sports_publisher)

                    # 金融发布者
                    finance_publisher = Publisher("FinanceAPI", self.message_bus)
                    self.publishers.append(finance_publisher)

                    return self.publishers

                def create_subscribers(self):
                    """创建订阅者"""
                    # 综合新闻订阅者
                    general_subscriber = Subscriber("GeneralReader", self.message_bus)
                    general_subscriber.subscribe(MessageType.NEWS)
                    general_subscriber.subscribe(MessageType.ALERT)
                    self.subscribers.append(general_subscriber)

                    # 天气关注者
                    weather_subscriber = Subscriber("WeatherWatcher", self.message_bus)
                    weather_subscriber.subscribe(MessageType.WEATHER)
                    weather_subscriber.subscribe(MessageType.ALERT)
                    self.subscribers.append(weather_subscriber)

                    # 体育迷
                    sports_subscriber = Subscriber("SportsFan", self.message_bus)
                    sports_subscriber.subscribe(MessageType.SPORTS)
                    sports_subscriber.subscribe(MessageType.NEWS)
                    self.subscribers.append(sports_subscriber)

                    # 投资者
                    finance_subscriber = Subscriber("Investor", self.message_bus)
                    finance_subscriber.subscribe(MessageType.FINANCE)
                    finance_subscriber.subscribe(MessageType.ALERT)
                    self.subscribers.append(finance_subscriber)

                    return self.subscribers

                def simulate_news_publishing(self):
                    """模拟新闻发布"""
                    logger.info("=== 模拟新闻发布 ===")

                    news_publisher = self.publishers[0]
                    news_items = [
                        "政府发布新政策",
                        "科技巨头发布新产品",
                        "国际局势最新进展",
                        "本地社区活动报道"
                    ]

                    for i, news in enumerate(news_items):
                        news_publisher.publish(MessageType.NEWS, news, priority=i+1)
                        time.sleep(1)

                def simulate_weather_updates(self):
                    """模拟天气更新"""
                    logger.info("=== 模拟天气更新 ===")

                    weather_publisher = self.publishers[1]
                    weather_updates = [
                        "今日晴天,最高温度28°C",
                        "明日可能有阵雨",
                        "空气质量指数良好",
                        "周末天气转晴"
                    ]

                    for weather in weather_updates:
                        weather_publisher.publish(MessageType.WEATHER, weather)
                        time.sleep(0.8)

                def simulate_sports_news(self):
                    """模拟体育新闻"""
                    logger.info("=== 模拟体育新闻 ===")

                    sports_publisher = self.publishers[2]
                    sports_news = [
                        "国家队赢得重要比赛",
                        "联赛最新积分榜",
                        "球员转会消息",
                        "体育赛事预告"
                    ]

                    for news in sports_news:
                        sports_publisher.publish(MessageType.SPORTS, news, priority=2)
                        time.sleep(1.2)

                def simulate_finance_updates(self):
                    """模拟金融更新"""
                    logger.info("=== 模拟金融更新 ===")

                    finance_publisher = self.publishers[3]
                    finance_updates = [
                        "股市收盘上涨2.5%",
                        "央行发布利率决议",
                        "大宗商品价格波动",
                        "汇率最新变化"
                    ]

                    for update in finance_updates:
                        finance_publisher.publish(MessageType.FINANCE, update, priority=3)
                        time.sleep(0.7)

                def simulate_alerts(self):
                    """模拟紧急警报"""
                    logger.info("=== 模拟紧急警报 ===")

                    # 使用不同的发布者发送警报
                    for i, publisher in enumerate(self.publishers):
                        alert_messages = [
                            "系统维护通知",
                            "紧急新闻快讯",
                            "天气预警",
                            "市场波动警报"
                        ]
                        if i < len(alert_messages):
                            publisher.publish(MessageType.ALERT, alert_messages[i], priority=5)
                        time.sleep(0.5)

                def demonstrate_dynamic_subscription(self):
                    """演示动态订阅"""
                    logger.info("=== 演示动态订阅 ===")

                    # 创建新订阅者
                    temp_subscriber = Subscriber("TempReader", self.message_bus)
                    temp_subscriber.subscribe(MessageType.NEWS)
                    self.subscribers.append(temp_subscriber)

                    # 发布一些新闻
                    self.publishers[0].publish(MessageType.NEWS, "突发新闻:新订阅者加入")
                    time.sleep(1)

                    # 订阅者取消订阅
                    temp_subscriber.unsubscribe(MessageType.NEWS)
                    time.sleep(0.5)

                    # 再次发布新闻
                    self.publishers[0].publish(MessageType.NEWS, "后续新闻更新")

                    # 检查订阅状态
                    logger.info(f"临时订阅者收到消息数: {temp_subscriber.get_message_count()}")

                def run_demo(self):
                    """运行完整演示"""
                    logger.info("=== 发布-订阅模式演示开始 ===")

                    # 启动消息总线
                    self.message_bus.start()

                    # 创建发布者和订阅者
                    self.create_publishers()
                    self.create_subscribers()

                    try:
                        # 模拟各种消息发布
                        self.simulate_news_publishing()
                        time.sleep(2)

                        self.simulate_weather_updates()
                        time.sleep(2)

                        self.simulate_sports_news()
                        time.sleep(2)

                        self.simulate_finance_updates()
                        time.sleep(2)

                        self.simulate_alerts()
                        time.sleep(2)

                        # 演示动态订阅
                        self.demonstrate_dynamic_subscription()

                        # 等待所有消息处理完成
                        time.sleep(3)

                    finally:
                        # 停止消息总线
                        self.message_bus.stop()

                    # 打印统计信息
                    self._print_statistics()

                    logger.info("=== 发布-订阅模式演示完成 ===")

                def _print_statistics(self):
                    """打印统计信息"""
                    logger.info("\n=== 订阅统计 ===")

                    for subscriber in self.subscribers:
                        logger.info(f"订阅者 {subscriber.name}:")
                        for msg_type in MessageType:
                            count = subscriber.get_message_count(msg_type)
                            if count > 0:
                                logger.info(f"  {msg_type.value}: {count} 条消息")

                        total = subscriber.get_message_count()
                        logger.info(f"  总计: {total} 条消息\n")

            # 使用示例
            if __name__ == "__main__":
                demo = PubSubDemo()
                demo.run_demo()
            ---
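
The `MessageBus` above pairs a lock-protected list with a `threading.Event` that wakes the dispatcher. The sketch below reduces that handshake to its core. `BatchQueue`, `wake` and `consume` are hypothetical names, not part of the example above. It also adds a final drain on shutdown, which the original dispatcher does not do. Setting and clearing the event while holding the lock keeps the signal consistent with the contents of the list.

```python
import threading


class BatchQueue:
    """Hypothetical helper: a list guarded by a lock, with an Event as wake-up signal."""

    def __init__(self):
        self._items = []
        self._lock = threading.Lock()
        self._ready = threading.Event()

    def put(self, item):
        with self._lock:
            self._items.append(item)
            self._ready.set()  # set under the lock so it cannot race with clear()

    def wake(self):
        """Wake a waiting consumer without adding an item (used for shutdown)."""
        self._ready.set()

    def take_all(self, timeout=None):
        """Wait for the signal (or the timeout), then return and empty the batch."""
        self._ready.wait(timeout)
        with self._lock:
            batch, self._items = self._items, []
            self._ready.clear()  # always clear, even when the batch is empty
        return batch


def consume(q, stop_flag, out):
    while not stop_flag.is_set():
        out.extend(q.take_all(timeout=0.2))
    out.extend(q.take_all(timeout=0))  # final drain so queued items are not lost


q = BatchQueue()
stop_flag = threading.Event()
received = []
worker = threading.Thread(target=consume, args=(q, stop_flag, received))
worker.start()

for n in range(5):
    q.put(n)

stop_flag.set()
q.wake()
worker.join(timeout=2)
```

With one producer and one consumer the items arrive in publication order. For multiple consumers, `queue.Queue` from the standard library is usually the simpler choice.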

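    b.Observer pattern
        a.Pattern description
            Observers attach to a subject. Whenever the subject's state changes, it notifies every attached observer.
        b.Implementation notes
            The subject calls update() on each active observer with a ChangeEvent describing the change, then sets a threading.Event to signal that a change occurred.
        c.Code example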
            ---
            # Event观察者模式 - 状态监控系统
            import threading
            import time
            import logging
            from typing import List, Dict, Any, Callable
            from dataclasses import dataclass
            from enum import Enum
            from abc import ABC, abstractmethod

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class ChangeType(Enum):
                """变化类型"""
                CREATED = "created"
                UPDATED = "updated"
                DELETED = "deleted"
                ERROR = "error"

            @dataclass
            class ChangeEvent:
                """变化事件"""
                subject_id: str
                change_type: ChangeType
                old_value: Any
                new_value: Any
                timestamp: float
                metadata: Dict[str, Any]

            class Observer(ABC):
                """观察者抽象基类"""
                def __init__(self, name: str):
                    self.name = name
                    self.is_active = True

                @abstractmethod
                def update(self, change_event: ChangeEvent) -> None:
                    """接收变化通知"""
                    pass

                def activate(self):
                    """激活观察者"""
                    self.is_active = True
                    logger.info(f"观察者 {self.name} 已激活")

                def deactivate(self):
                    """停用观察者"""
                    self.is_active = False
                    logger.info(f"观察者 {self.name} 已停用")

            class Subject:
                """被观察者(主题)"""
                def __init__(self, subject_id: str):
                    self.subject_id = subject_id
                    self.observers: List[Observer] = []
                    self.state = {}
                    self.change_event = threading.Event()
                    self.lock = threading.Lock()

                def attach(self, observer: Observer):
                    """添加观察者"""
                    with self.lock:
                        if observer not in self.observers:
                            self.observers.append(observer)
                            logger.info(f"主题 {self.subject_id}: 添加观察者 {observer.name}")

                def detach(self, observer: Observer):
                    """移除观察者"""
                    with self.lock:
                        if observer in self.observers:
                            self.observers.remove(observer)
                            logger.info(f"主题 {self.subject_id}: 移除观察者 {observer.name}")

                def notify_observers(self, change_event: ChangeEvent):
                    """通知所有观察者"""
                    logger.debug(f"主题 {self.subject_id}: 通知观察者状态变化")

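                    # Iterate over a copy so attach()/detach() calls made during
                    # notification cannot skip or repeat observers
                    for observer in list(self.observers):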
                        if observer.is_active:
                            try:
                                observer.update(change_event)
                            except Exception as e:
                                logger.error(f"通知观察者 {observer.name} 失败: {e}")

                def set_state(self, key: str, value: Any, metadata: Dict[str, Any] = None):
                    """设置状态并通知观察者"""
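                    with self.lock:
                        # Test key membership so a stored value of None still counts as an update
                        key_existed = key in self.state
                        old_value = self.state.get(key)
                        self.state[key] = value

                        # Build the change event
                        if key_existed:
                            change_type = ChangeType.UPDATED
                        else:
                            change_type = ChangeType.CREATED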

                        change_event = ChangeEvent(
                            subject_id=self.subject_id,
                            change_type=change_type,
                            old_value=old_value,
                            new_value=value,
                            timestamp=time.time(),
                            metadata=metadata or {}
                        )

                    # 通知观察者
                    self.notify_observers(change_event)

                    # 触发变化事件
                    self.change_event.set()

                def delete_state(self, key: str, metadata: Dict[str, Any] = None):
                    """删除状态并通知观察者"""
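                    with self.lock:
                        if key not in self.state:
                            # Nothing to delete, so no notification is sent
                            return
                        old_value = self.state.pop(key)

                        change_event = ChangeEvent(
                            subject_id=self.subject_id,
                            change_type=ChangeType.DELETED,
                            old_value=old_value,
                            new_value=None,
                            timestamp=time.time(),
                            metadata=metadata or {}
                        )

                    # Notify outside the lock, as set_state() does. threading.Lock is not
                    # reentrant, so an observer calling get_state() while the lock is held
                    # would deadlock
                    self.notify_observers(change_event)

                    # Signal waiters that a change occurred
                    self.change_event.set()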

                def get_state(self, key: str) -> Any:
                    """获取状态"""
                    with self.lock:
                        return self.state.get(key)

                def get_all_state(self) -> Dict[str, Any]:
                    """获取所有状态"""
                    with self.lock:
                        return self.state.copy()

                def wait_for_change(self, timeout: float = None) -> bool:
                    """等待状态变化"""
                    return self.change_event.wait(timeout=timeout)

                def clear_change_event(self):
                    """清除变化事件"""
                    self.change_event.clear()

            # 具体观察者实现
            class LoggingObserver(Observer):
                """日志观察者"""
                def __init__(self, name: str = "LoggingObserver"):
                    super().__init__(name)
                    self.log_entries: List[str] = []

                def update(self, change_event: ChangeEvent) -> None:
                    log_entry = f"[{change_event.timestamp:.2f}] {change_event.subject_id}: " \
                                f"{change_event.change_type.value} " \
                                f"({change_event.old_value} -> {change_event.new_value})"
                    self.log_entries.append(log_entry)
                    logger.info(f"{self.name}: {log_entry}")

            class AlertObserver(Observer):
                """警报观察者"""
                def __init__(self, alert_threshold: float = 80.0):
                    super().__init__("AlertObserver")
                    self.alert_threshold = alert_threshold
                    self.alerts_triggered: List[ChangeEvent] = []

                def update(self, change_event: ChangeEvent) -> None:
                    # 检查是否触发警报
                    if (isinstance(change_event.new_value, (int, float)) and
                        change_event.new_value > self.alert_threshold):
                        self.alerts_triggered.append(change_event)
                        logger.warning(f"{self.name}: 警报触发! "
                                       f"{change_event.subject_id} = {change_event.new_value} "
                                       f"(阈值: {self.alert_threshold})")

            class StatisticsObserver(Observer):
                """统计观察者"""
                def __init__(self, name: str = "StatisticsObserver"):
                    super().__init__(name)
                    self.statistics = {
                        'created': 0,
                        'updated': 0,
                        'deleted': 0,
                        'total_changes': 0
                    }

                def update(self, change_event: ChangeEvent) -> None:
                    self.statistics[change_event.change_type.value] += 1
                    self.statistics['total_changes'] += 1

                    logger.debug(f"{self.name}: 统计更新 - {self.statistics}")

                def get_statistics(self) -> Dict[str, int]:
                    return self.statistics.copy()

            class CachingObserver(Observer):
                """缓存观察者"""
                def __init__(self, cache_size: int = 100):
                    super().__init__("CachingObserver")
                    self.cache_size = cache_size
                    self.cache: Dict[str, ChangeEvent] = {}
                    self.cache_access_order: List[str] = []

                def update(self, change_event: ChangeEvent) -> None:
                    cache_key = f"{change_event.subject_id}_{change_event.timestamp}"

                    # 如果缓存已满,删除最旧的条目
                    if len(self.cache) >= self.cache_size:
                        if self.cache_access_order:
                            oldest_key = self.cache_access_order.pop(0)
                            if oldest_key in self.cache:
                                del self.cache[oldest_key]

                    # 添加新条目
                    self.cache[cache_key] = change_event
                    self.cache_access_order.append(cache_key)

                    logger.debug(f"{self.name}: 缓存更新,当前缓存大小: {len(self.cache)}")

                def get_latest_changes(self, subject_id: str, count: int = 5) -> List[ChangeEvent]:
                    """获取最新的变化"""
                    subject_changes = [
                        change for key, change in self.cache.items()
                        if change.subject_id == subject_id
                    ]

                    # 按时间戳排序,返回最新的count个
                    subject_changes.sort(key=lambda x: x.timestamp, reverse=True)
                    return subject_changes[:count]

            class ObserverPatternDemo:
                """观察者模式演示"""
                def __init__(self):
                    self.subjects = {}
                    self.observers = {}

                def setup_system(self):
                    """设置观察者系统"""
                    # 创建观察者
                    logging_observer = LoggingObserver("SystemLogger")
                    alert_observer = AlertObserver(alert_threshold=90.0)
                    statistics_observer = StatisticsObserver("SystemStats")
                    caching_observer = CachingObserver(cache_size=50)

                    self.observers = {
                        'logger': logging_observer,
                        'alerter': alert_observer,
                        'stats': statistics_observer,
                        'cache': caching_observer
                    }

                    # 创建主题
                    cpu_monitor = Subject("CPU_Monitor")
                    memory_monitor = Subject("Memory_Monitor")
                    disk_monitor = Subject("Disk_Monitor")
                    network_monitor = Subject("Network_Monitor")

                    self.subjects = {
                        'cpu': cpu_monitor,
                        'memory': memory_monitor,
                        'disk': disk_monitor,
                        'network': network_monitor
                    }

                    # 为所有主题添加观察者
                    for subject in self.subjects.values():
                        for observer in self.observers.values():
                            subject.attach(observer)

                    logger.info("观察者系统设置完成")

                def simulate_monitoring(self):
                    """模拟系统监控"""
                    logger.info("=== 开始系统监控模拟 ===")

                    # 模拟CPU监控
                    logger.info("模拟CPU监控...")
                    for i in range(10):
                        cpu_usage = 50 + (i * 5) + (time.time() % 20)
                        self.subjects['cpu'].set_state(
                            'usage',
                            cpu_usage,
                            {'timestamp': time.time(), 'core': 'all'}
                        )
                        time.sleep(0.5)

                    # 模拟内存监控
                    logger.info("模拟内存监控...")
                    for i in range(8):
                        memory_usage = 60 + (i * 4) + (time.time() % 15)
                        self.subjects['memory'].set_state(
                            'usage',
                            memory_usage,
                            {'timestamp': time.time(), 'type': 'RAM'}
                        )
                        time.sleep(0.6)

                    # 模拟磁盘监控
                    logger.info("模拟磁盘监控...")
                    disk_data = [
                        ('/dev/sda1', 45.2),
                        ('/dev/sda2', 78.9),
                        ('/dev/sdb1', 23.1),
                        ('/dev/sdb2', 91.5)  # 这个会触发警报
                    ]

                    for device, usage in disk_data:
                        self.subjects['disk'].set_state(
                            device,
                            usage,
                            {'timestamp': time.time(), 'mount_point': f'/mnt/{device[-1]}'}
                        )
                        time.sleep(0.4)

                    # 模拟网络监控
                    logger.info("模拟网络监控...")
                    for i in range(6):
                        network_speed = 100 + (i * 20) + (time.time() % 50)
                        self.subjects['network'].set_state(
                            'download_speed',
                            network_speed,
                            {'timestamp': time.time(), 'interface': 'eth0'}
                        )
                        time.sleep(0.7)

                    # 模拟一些状态删除
                    logger.info("模拟状态清理...")
                    self.subjects['disk'].delete_state('/dev/sda1')
                    time.sleep(0.3)

                    # 'swap_usage' was never set, so this call is a no-op and notifies no observer
                    self.subjects['memory'].delete_state('swap_usage')
                    time.sleep(0.3)

                def demonstrate_observer_management(self):
                    """演示观察者管理"""
                    logger.info("=== 演示观察者管理 ===")

                    # 临时停用警报观察者
                    logger.info("停用警报观察者")
                    self.observers['alerter'].deactivate()

                    # 继续监控,但不会触发警报
                    high_cpu = 95.0
                    self.subjects['cpu'].set_state('usage', high_cpu)
                    time.sleep(1)

                    # 重新激活警报观察者
                    logger.info("重新激活警报观察者")
                    self.observers['alerter'].activate()

                    # 再次设置高CPU值,这次会触发警报
                    self.subjects['cpu'].set_state('usage', high_cpu + 5)
                    time.sleep(1)

                    # 动态添加新观察者
                    logger.info("添加新的观察者")
                    new_observer = LoggingObserver("TemporaryLogger")
                    self.subjects['memory'].attach(new_observer)

                    self.subjects['memory'].set_state('usage', 88.5)
                    time.sleep(1)

                    # 移除新观察者
                    logger.info("移除新观察者")
                    self.subjects['memory'].detach(new_observer)

                    self.subjects['memory'].set_state('usage', 75.2)
                    time.sleep(1)

                def demonstrate_change_waiting(self):
                    """演示变化等待"""
                    logger.info("=== 演示变化等待 ===")

                    # 在另一个线程中设置状态
                    def delayed_change():
                        time.sleep(2)
                        self.subjects['network'].set_state(
                            'latency',
                            150.5,
                            {'timestamp': time.time(), 'target': 'api.example.com'}
                        )

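                    # Earlier set_state() calls left the Event set; clear it so the wait
                    # below blocks until the delayed change actually arrives
                    self.subjects['network'].clear_change_event()

                    change_thread = threading.Thread(target=delayed_change)
                    change_thread.start()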

                    # 主线程等待变化
                    logger.info("等待网络延迟变化...")
                    if self.subjects['network'].wait_for_change(timeout=3):
                        logger.info("检测到网络延迟变化")
                        latency = self.subjects['network'].get_state('latency')
                        logger.info(f"当前延迟: {latency} ms")
                    else:
                        logger.warning("等待超时")

                    change_thread.join()

                def run_demo(self):
                    """运行完整演示"""
                    logger.info("=== 观察者模式演示开始 ===")

                    # 设置观察者系统
                    self.setup_system()

                    try:
                        # 模拟系统监控
                        self.simulate_monitoring()

                        # 演示观察者管理
                        self.demonstrate_observer_management()

                        # 演示变化等待
                        self.demonstrate_change_waiting()

                        # 等待所有处理完成
                        time.sleep(2)

                    finally:
                        # 打印统计信息
                        self._print_statistics()

                    logger.info("=== 观察者模式演示完成 ===")

                def _print_statistics(self):
                    """打印统计信息"""
                    logger.info("\n=== 观察者统计 ===")

                    # 打印日志观察者的日志条目数
                    if 'logger' in self.observers:
                        logger.info(f"日志观察者记录了 {len(self.observers['logger'].log_entries)} 条日志")

                    # 打印警报观察者触发的警报数
                    if 'alerter' in self.observers:
                        logger.info(f"警报观察者触发了 {len(self.observers['alerter'].alerts_triggered)} 次警报")

                    # 打印统计观察者的统计信息
                    if 'stats' in self.observers:
                        stats = self.observers['stats'].get_statistics()
                        logger.info("变化统计:")
                        for key, value in stats.items():
                            logger.info(f"  {key}: {value}")

                    # 打印缓存观察者的缓存信息
                    if 'cache' in self.observers:
                        cache_info = self.observers['cache']
                        logger.info(f"缓存观察者缓存了 {len(cache_info.cache)} 个变化")

                    # 打印各主题的最终状态
                    logger.info("\n=== 主题最终状态 ===")
                    for name, subject in self.subjects.items():
                        state = subject.get_all_state()
                        if state:
                            logger.info(f"{name}: {state}")
                        else:
                            logger.info(f"{name}: 无状态")

            # 使用示例
            if __name__ == "__main__":
                demo = ObserverPatternDemo()
                demo.run_demo()
            ---
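
            A threading.Event stays set until clear() is called, which is why wait_for_change() only blocks for a new change if clear_change_event() has been called first. The following is a minimal sketch of that behaviour in isolation; the function name event_flag_demo is hypothetical and not part of the example above.

```python
import threading

def event_flag_demo():
    """Return (stale, timed_out, fresh) wait() results for one Event."""
    event = threading.Event()

    # An earlier notification leaves the flag set, so wait() returns at once.
    event.set()
    stale = event.wait(timeout=0.1)

    # After clear(), wait() blocks and times out if nothing calls set().
    event.clear()
    timed_out = not event.wait(timeout=0.05)

    # A set() from another thread wakes the waiter before the timeout.
    timer = threading.Timer(0.1, event.set)
    timer.start()
    fresh = event.wait(timeout=2)
    timer.join()
    return stale, timed_out, fresh

print(event_flag_demo())  # (True, True, True)
```

            The first wait() succeeds only because of the stale flag, so a waiter that needs the next change should clear the Event before it starts waiting.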

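        c.Command pattern
            a.Pattern description
            Each request is wrapped in a command object that exposes execute() and undo(), so an invoker can queue commands, record their results, and undo them later. The full example in this section implements undo only; redo is not implemented.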
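
            The undo/redo idea can be sketched without any threading. This is a minimal illustration under stated assumptions, not the design used by the full example: AppendCommand and History are hypothetical names, and redo is implemented here with a second stack.

```python
from abc import ABC, abstractmethod
from typing import List

class Command(ABC):
    """A request packaged as an object that can be executed and undone."""
    @abstractmethod
    def execute(self) -> None: ...

    @abstractmethod
    def undo(self) -> None: ...

class AppendCommand(Command):
    """Hypothetical command: append one item to a list."""
    def __init__(self, target: List[str], item: str):
        self.target = target
        self.item = item

    def execute(self) -> None:
        self.target.append(self.item)

    def undo(self) -> None:
        self.target.pop()

class History:
    """Keeps executed commands on an undo stack and undone ones on a redo stack."""
    def __init__(self) -> None:
        self._undo: List[Command] = []
        self._redo: List[Command] = []

    def run(self, command: Command) -> None:
        command.execute()
        self._undo.append(command)
        self._redo.clear()  # a new command invalidates the redo chain

    def undo(self) -> bool:
        if not self._undo:
            return False
        command = self._undo.pop()
        command.undo()
        self._redo.append(command)
        return True

    def redo(self) -> bool:
        if not self._redo:
            return False
        command = self._redo.pop()
        command.execute()
        self._undo.append(command)
        return True

items: List[str] = []
history = History()
history.run(AppendCommand(items, "a"))
history.run(AppendCommand(items, "b"))
history.undo()
print(items)  # ['a']
history.redo()
print(items)  # ['a', 'b']
```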
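            b.Implementation notes
            submit_command() puts the command on a queue.Queue and sets a threading.Event; a worker thread waits on that Event, drains the queue, and executes the commands in submission order.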
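
            The Event-triggered worker can be reduced to a few lines. This is a sketch under stated assumptions rather than the invoker itself: run_worker and demo are hypothetical names, plain callables stand in for command objects, and a second Event is used as the stop signal.

```python
import queue
import threading

def run_worker(tasks, wakeup, stop, results):
    """Wait for a wakeup signal, then drain and run every queued task."""
    while not stop.is_set():
        if not wakeup.wait(timeout=0.5):
            continue
        # Clear before draining: a task submitted during the drain sets the
        # Event again, so its wakeup is not lost.
        wakeup.clear()
        while True:
            try:
                task = tasks.get_nowait()
            except queue.Empty:
                break
            results.append(task())
            tasks.task_done()

def demo():
    tasks = queue.Queue()
    wakeup = threading.Event()
    stop = threading.Event()
    results = []

    worker = threading.Thread(target=run_worker, args=(tasks, wakeup, stop, results))
    worker.start()

    for n in (1, 2, 3):
        tasks.put(lambda n=n: n * n)  # submit the task
        wakeup.set()                  # wake the worker

    tasks.join()   # block until every submitted task has been processed
    stop.set()
    wakeup.set()   # wake the worker so it notices the stop flag
    worker.join()
    return results

print(demo())  # [1, 4, 9]
```

            Clearing the Event before draining the queue, rather than after, is the design choice that keeps a submit that races with the drain from being missed.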
            c.Code example
            ---
            # Event命令模式 - 任务执行系统
            import threading
            import time
            import logging
            from typing import List, Dict, Any, Optional
            from dataclasses import dataclass
            from enum import Enum
            from abc import ABC, abstractmethod
            import queue

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class CommandStatus(Enum):
                """命令状态"""
                PENDING = "pending"
                RUNNING = "running"
                COMPLETED = "completed"
                FAILED = "failed"
                CANCELLED = "cancelled"

            @dataclass
            class CommandResult:
                """命令执行结果"""
                command_id: str
                status: CommandStatus
                result: Any
                error_message: Optional[str]
                start_time: float
                end_time: float
                duration: float

            class Command(ABC):
                """命令抽象基类"""
                def __init__(self, command_id: str, description: str):
                    self.command_id = command_id
                    self.description = description
                    self.result = None

                @abstractmethod
                def execute(self) -> CommandResult:
                    """执行命令"""
                    pass

                @abstractmethod
                def undo(self) -> bool:
                    """撤销命令"""
                    pass

                def __str__(self):
                    return f"Command({self.command_id}: {self.description})"

            class CommandInvoker:
                """命令调用器"""
                def __init__(self):
                    self.command_queue = queue.Queue()
                    self.execution_history: List[CommandResult] = []
                    self.undo_stack: List[Command] = []
                    self.is_running = False
                    self.worker_thread = None
                    self.execution_event = threading.Event()

                def start(self):
                    """启动命令执行器"""
                    if self.is_running:
                        logger.warning("命令执行器已经在运行")
                        return

                    self.is_running = True
                    self.worker_thread = threading.Thread(
                        target=self._execution_loop,
                        name="CommandInvoker"
                    )
                    self.worker_thread.start()
                    logger.info("命令执行器已启动")

                def stop(self):
                    """停止命令执行器"""
                    if not self.is_running:
                        return

                    self.is_running = False
                    self.execution_event.set()

                    if self.worker_thread:
                        self.worker_thread.join(timeout=3)
                        if self.worker_thread.is_alive():
                            logger.warning("命令执行器未能及时停止")

                    logger.info("命令执行器已停止")

                def submit_command(self, command: Command) -> str:
                    """提交命令"""
                    self.command_queue.put(command)
                    self.execution_event.set()
                    logger.info(f"提交命令: {command}")
                    return command.command_id

                def get_execution_history(self) -> List[CommandResult]:
                    """获取执行历史"""
                    return self.execution_history.copy()

                def get_last_result(self, command_id: str) -> Optional[CommandResult]:
                    """获取最后一个执行结果"""
                    for result in reversed(self.execution_history):
                        if result.command_id == command_id:
                            return result
                    return None

                def undo_last_command(self) -> bool:
                    """撤销最后一个命令"""
                    if not self.undo_stack:
                        logger.warning("没有可撤销的命令")
                        return False

                    last_command = self.undo_stack.pop()
                    try:
                        success = last_command.undo()
                        if success:
                            logger.info(f"成功撤销命令: {last_command}")
                        else:
                            logger.warning(f"撤销命令失败: {last_command}")
                        return success
                    except Exception as e:
                        logger.error(f"撤销命令异常: {e}")
                        return False

                def _execution_loop(self):
                    """命令执行循环"""
                    logger.info("命令执行循环启动")

                    while self.is_running:
                        # Wait for a submit or stop signal
                        if self.execution_event.wait(timeout=1):
                            if not self.is_running:
                                break

                            # Clear before draining: a command submitted during the drain
                            # sets the Event again, so its wakeup is not lost
                            self.execution_event.clear()

                            # Collect all pending commands
                            commands_to_execute = []
                            while True:
                                try:
                                    commands_to_execute.append(self.command_queue.get_nowait())
                                except queue.Empty:
                                    break

                            # Execute the commands in submission order
                            for command in commands_to_execute:
                                if not self.is_running:
                                    break

                                self._execute_command(command)

                    logger.info("命令执行循环停止")

                def _execute_command(self, command: Command):
                    """执行单个命令"""
                    start_time = time.time()
                    logger.info(f"开始执行命令: {command}")

                    try:
                        # 执行命令
                        result = command.execute()
                        self.execution_history.append(result)

                        # 将命令添加到撤销栈
                        if result.status == CommandStatus.COMPLETED:
                            self.undo_stack.append(command)

                        logger.info(f"命令执行完成: {command} - {result.status.value}")
                        if result.error_message:
                            logger.error(f"命令错误: {result.error_message}")

                    except Exception as e:
                        # The command raised instead of returning a result
                        end_time = time.time()
                        error_result = CommandResult(
                            command_id=command.command_id,
                            status=CommandStatus.FAILED,
                            result=None,
                            error_message=str(e),
                            start_time=start_time,
                            end_time=end_time,
                            duration=end_time - start_time
                        )
                        self.execution_history.append(error_result)
                        logger.error(f"Command raised an exception: {command} - {e}")

            # 具体命令实现
            class PrintCommand(Command):
                """打印命令"""
                def __init__(self, command_id: str, message: str):
                    super().__init__(command_id, f"打印消息: {message}")
                    self.message = message
                    self.printed = False

                def execute(self) -> CommandResult:
                    start_time = time.time()
                    try:
                        print(f"执行打印: {self.message}")
                        self.printed = True
                        end_time = time.time()

                        return CommandResult(
                            command_id=self.command_id,
                            status=CommandStatus.COMPLETED,
                            result=f"已打印: {self.message}",
                            error_message=None,
                            start_time=start_time,
                            end_time=end_time,
                            duration=end_time - start_time
                        )
                    except Exception as e:
                        end_time = time.time()
                        return CommandResult(
                            command_id=self.command_id,
                            status=CommandStatus.FAILED,
                            result=None,
                            error_message=str(e),
                            start_time=start_time,
                            end_time=end_time,
                            duration=end_time - start_time
                        )

                def undo(self) -> bool:
                    if not self.printed:
                        return False

                    try:
                        print(f"撤销打印: {self.message}")
                        self.printed = False
                        return True
                    except Exception as e:
                        logger.error(f"撤销打印命令失败: {e}")
                        return False

            class DelayCommand(Command):
                """延迟命令"""
                def __init__(self, command_id: str, delay: float, message: str = ""):
                    super().__init__(command_id, f"延迟 {delay} 秒: {message}")
                    self.delay = delay
                    self.message = message
                    self.completed = False

                def execute(self) -> CommandResult:
                    start_time = time.time()
                    try:
                        logger.info(f"开始延迟 {self.delay} 秒...")
                        time.sleep(self.delay)
                        self.completed = True
                        end_time = time.time()

                        return CommandResult(
                            command_id=self.command_id,
                            status=CommandStatus.COMPLETED,
                            result=f"延迟完成,实际耗时: {end_time - start_time:.2f}s",
                            error_message=None,
                            start_time=start_time,
                            end_time=end_time,
                            duration=end_time - start_time
                        )
                    except Exception as e:
                        end_time = time.time()
                        return CommandResult(
                            command_id=self.command_id,
                            status=CommandStatus.FAILED,
                            result=None,
                            error_message=str(e),
                            start_time=start_time,
                            end_time=end_time,
                            duration=end_time - start_time
                        )

                def undo(self) -> bool:
                    # 延迟命令无法撤销
                    logger.warning(f"延迟命令无法撤销: {self}")
                    return False

            class CounterCommand(Command):
                """计数器命令"""
                def __init__(self, command_id: str, counter_ref: Dict[str, int], delta: int = 1):
                    super().__init__(command_id, f"计数器增减 {delta}")
                    self.counter_ref = counter_ref
                    self.delta = delta
                    self.initial_value = counter_ref.get('count', 0)

                def execute(self) -> CommandResult:
                    start_time = time.time()
                    try:
                        current_value = self.counter_ref.get('count', 0)
                        new_value = current_value + self.delta
                        self.counter_ref['count'] = new_value

                        end_time = time.time()
                        return CommandResult(
                            command_id=self.command_id,
                            status=CommandStatus.COMPLETED,
                            result=f"计数器从 {current_value} 变为 {new_value}",
                            error_message=None,
                            start_time=start_time,
                            end_time=end_time,
                            duration=end_time - start_time
                        )
                    except Exception as e:
                        end_time = time.time()
                        return CommandResult(
                            command_id=self.command_id,
                            status=CommandStatus.FAILED,
                            result=None,
                            error_message=str(e),
                            start_time=start_time,
                            end_time=end_time,
                            duration=end_time - start_time
                        )

                def undo(self) -> bool:
                    try:
                        current_value = self.counter_ref.get('count', 0)
                        restored_value = current_value - self.delta
                        self.counter_ref['count'] = restored_value
                        logger.info(f"撤销计数器命令: 从 {current_value} 恢复为 {restored_value}")
                        return True
                    except Exception as e:
                        logger.error(f"撤销计数器命令失败: {e}")
                        return False

            class FileCommand(Command):
                """文件操作命令"""
                def __init__(self, command_id: str, filepath: str, content: str):
                    super().__init__(command_id, f"文件写入: {filepath}")
                    self.filepath = filepath
                    self.content = content
                    self.original_content = None
                    self.file_created = False

                def execute(self) -> CommandResult:
                    start_time = time.time()
                    try:
                        # 读取原始内容(如果文件存在)
                        try:
                            with open(self.filepath, 'r', encoding='utf-8') as f:
                                self.original_content = f.read()
                        except FileNotFoundError:
                            self.original_content = None

                        # 写入新内容
                        with open(self.filepath, 'w', encoding='utf-8') as f:
                            f.write(self.content)

                        if self.original_content is None:
                            self.file_created = True

                        end_time = time.time()
                        return CommandResult(
                            command_id=self.command_id,
                            status=CommandStatus.COMPLETED,
                            result=f"文件写入完成: {len(self.content)} 字符",
                            error_message=None,
                            start_time=start_time,
                            end_time=end_time,
                            duration=end_time - start_time
                        )
                    except Exception as e:
                        end_time = time.time()
                        return CommandResult(
                            command_id=self.command_id,
                            status=CommandStatus.FAILED,
                            result=None,
                            error_message=str(e),
                            start_time=start_time,
                            end_time=end_time,
                            duration=end_time - start_time
                        )

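                def undo(self) -> bool:
                    try:
                        if self.file_created:
                            # The file did not exist before execute(), so remove it
                            import os
                            os.remove(self.filepath)
                            self.file_created = False
                            logger.info(f"Undid file creation: removed {self.filepath}")
                        elif self.original_content is not None:
                            # The file existed before execute(), so restore its original content
                            with open(self.filepath, 'w', encoding='utf-8') as f:
                                f.write(self.original_content)
                            logger.info(f"Undid file write: restored {self.filepath}")
                        else:
                            # execute() never completed, so there is nothing to undo
                            logger.warning(f"Nothing to undo for {self.filepath}")
                            return False
                        return True
                    except Exception as e:
                        logger.error(f"Failed to undo file command: {e}")
                        return False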

            class CommandPatternDemo:
                """命令模式演示"""
                def __init__(self):
                    self.invoker = CommandInvoker()
                    self.counter = {'count': 0}
                    self.temp_files = []

                def cleanup(self):
                    """清理临时文件"""
                    import os
                    for filepath in self.temp_files:
                        try:
                            if os.path.exists(filepath):
                                os.remove(filepath)
                        except Exception as e:
                            logger.error(f"清理文件失败 {filepath}: {e}")

                def run_demo(self):
                    """运行完整演示"""
                    logger.info("=== 命令模式演示开始 ===")

                    try:
                        # 启动命令执行器
                        self.invoker.start()

                        # 演示基本命令
                        self.demo_basic_commands()

                        # 演示撤销功能
                        self.demo_undo_functionality()

                        # 演示异步执行
                        self.demo_async_execution()

                        # 演示文件操作
                        self.demo_file_operations()

                        # 等待所有命令完成
                        time.sleep(2)

                    finally:
                        # 停止命令执行器
                        self.invoker.stop()

                        # 清理临时文件
                        self.cleanup()

                    # 打印执行历史
                    self._print_execution_history()

                    logger.info("=== 命令模式演示完成 ===")

                def demo_basic_commands(self):
                    """演示基本命令"""
                    logger.info("=== 演示基本命令 ===")

                    # 提交打印命令
                    print_cmd1 = PrintCommand("print-1", "Hello, World!")
                    print_cmd2 = PrintCommand("print-2", "命令模式演示")
                    print_cmd3 = PrintCommand("print-3", "使用Event触发")

                    self.invoker.submit_command(print_cmd1)
                    self.invoker.submit_command(print_cmd2)
                    self.invoker.submit_command(print_cmd3)

                    time.sleep(1)

                def demo_undo_functionality(self):
                    """演示撤销功能"""
                    logger.info("=== 演示撤销功能 ===")

                    # 提交计数器命令
                    counter_cmd1 = CounterCommand("counter-1", self.counter, 1)
                    counter_cmd2 = CounterCommand("counter-2", self.counter, 5)
                    counter_cmd3 = CounterCommand("counter-3", self.counter, -2)

                    self.invoker.submit_command(counter_cmd1)
                    self.invoker.submit_command(counter_cmd2)
                    self.invoker.submit_command(counter_cmd3)

                    time.sleep(1)

                    logger.info(f"当前计数器值: {self.counter['count']}")

                    # 撤销最后一个命令
                    logger.info("撤销最后一个命令...")
                    if self.invoker.undo_last_command():
                        logger.info(f"撤销后计数器值: {self.counter['count']}")

                    # 再撤销一个命令
                    logger.info("再撤销一个命令...")
                    if self.invoker.undo_last_command():
                        logger.info(f"再次撤销后计数器值: {self.counter['count']}")

                def demo_async_execution(self):
                    """演示异步执行"""
                    logger.info("=== 演示异步执行 ===")

                    # 提交延迟命令
                    delay_cmd1 = DelayCommand("delay-1", 1.0, "1秒延迟")
                    delay_cmd2 = DelayCommand("delay-2", 2.0, "2秒延迟")
                    delay_cmd3 = DelayCommand("delay-3", 0.5, "0.5秒延迟")

                    self.invoker.submit_command(delay_cmd1)
                    self.invoker.submit_command(delay_cmd2)
                    self.invoker.submit_command(delay_cmd3)

                    # 在延迟期间提交其他命令
                    print_cmd = PrintCommand("print-during-delay", "在延迟期间执行")
                    self.invoker.submit_command(print_cmd)

                    time.sleep(4)

                def demo_file_operations(self):
                    """演示文件操作"""
                    logger.info("=== 演示文件操作 ===")

                    import tempfile
                    import os

                    # 创建临时文件
                    temp_file1 = tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt')
                    temp_file1.close()
                    temp_file1_path = temp_file1.name
                    self.temp_files.append(temp_file1_path)

                    temp_file2 = tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.log')
                    temp_file2.close()
                    temp_file2_path = temp_file2.name
                    self.temp_files.append(temp_file2_path)

                    # 提交文件命令
                    file_cmd1 = FileCommand("file-1", temp_file1_path, "这是第一份文件内容\n包含多行文本")
                    file_cmd2 = FileCommand("file-2", temp_file2_path, "日志内容\n2024-01-01 10:00:00 INFO 系统启动")

                    self.invoker.submit_command(file_cmd1)
                    self.invoker.submit_command(file_cmd2)

                    time.sleep(1)

                    # 读取文件内容验证
                    try:
                        with open(temp_file1_path, 'r', encoding='utf-8') as f:
                            content1 = f.read()
                        logger.info(f"文件1内容长度: {len(content1)} 字符")

                        with open(temp_file2_path, 'r', encoding='utf-8') as f:
                            content2 = f.read()
                        logger.info(f"文件2内容长度: {len(content2)} 字符")
                    except Exception as e:
                        logger.error(f"读取文件失败: {e}")

                    # 撤销文件操作
                    logger.info("撤销文件操作...")
                    if self.invoker.undo_last_command():
                        logger.info("文件2操作已撤销")

                    if self.invoker.undo_last_command():
                        logger.info("文件1操作已撤销")

                def _print_execution_history(self):
                    """打印执行历史"""
                    logger.info("\n=== 命令执行历史 ===")

                    history = self.invoker.get_execution_history()
                    for i, result in enumerate(history, 1):
                        logger.info(f"{i}. {result.command_id}: {result.status.value} "
                                   f"(耗时: {result.duration:.3f}s)")
                        if result.error_message:
                            logger.error(f"   错误: {result.error_message}")

                    logger.info(f"\n总计执行命令数: {len(history)}")

                    # 统计各种状态的命令数量
                    status_counts = {}
                    for result in history:
                        status_counts[result.status.value] = status_counts.get(result.status.value, 0) + 1

                    logger.info("状态统计:")
                    for status, count in status_counts.items():
                        logger.info(f"  {status}: {count} 个")

            # 使用示例
            if __name__ == "__main__":
                demo = CommandPatternDemo()
                demo.run_demo()
            ---

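04.Performance optimization for Event-driven designs
    a.Batch event processing
        a.Collect events in batches
            Gather the events that arrive within a short window, then process them together.
        b.Fewer wakeups
            One wakeup per batch instead of one per event reduces thread context-switch overhead.
        c.Higher throughput
            Processing in batches raises overall system throughput.
    b.Event priority management
        a.Priority queue
            Decide the processing order from the importance of each event.
        b.Timeliness
            Make sure high-priority events are handled promptly.
        c.Resource allocation
            Divide processing resources sensibly among events of different priorities.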
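The batching idea above can be sketched with a plain queue.Queue. This is a minimal illustration, not a reference implementation: the helper names collect_batch and consume, the batch size of 4 and the 50 ms window are all assumptions chosen for the example.

```python
import queue
import time


def collect_batch(events: queue.Queue, max_size: int, max_wait: float) -> list:
    """Block for one event, then gather more until the batch is full or the window closes."""
    batch = [events.get()]  # the only unbounded block: one wakeup per batch
    deadline = time.monotonic() + max_wait
    while len(batch) < max_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(events.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


def consume(events: queue.Queue, batches: list, stop: object) -> None:
    """Collect batches until the stop sentinel arrives; record each non-empty batch."""
    while True:
        batch = collect_batch(events, max_size=4, max_wait=0.05)
        payload = [e for e in batch if e is not stop]
        if payload:
            batches.append(payload)  # hypothetical stand-in for real batch processing
        if len(payload) < len(batch):  # the sentinel was in this batch
            return


if __name__ == "__main__":
    q = queue.Queue()
    stop = object()
    for i in range(10):
        q.put(i)
    q.put(stop)
    batches = []
    consume(q, batches, stop)
    print(batches)
```

The consumer blocks without a timeout only for the first event of each batch, so an idle system costs no polling, while a busy one handles up to max_size events per wakeup.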

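05.Best practices for Event-driven designs
    a.Event design principles
        a.Idempotency
            Event handlers should be idempotent: handling the same event twice must leave the same state as handling it once.
        b.Atomicity
            Publishing an event should be atomic: subscribers see the whole event or none of it.
        c.Observability
            An event should carry enough context to be traced and debugged.
    b.Error handling strategy
        a.Retry
            Retry event handling that fails.
        b.Dead-letter queue
            Events that cannot be processed go to a dead-letter queue.
        c.Monitoring and alerting
            Monitor event-handling failures and raise alerts on them.
    c.Scalability
        a.Horizontal scaling
            Support multi-instance deployment and load balancing.
        b.Vertical scaling
            Support increasing the processing capacity of a single instance.
        c.Elastic scaling
            Adjust the number of instances automatically according to load.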
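The idempotency, retry and dead-letter points above can be combined in a short sketch. Everything here is an assumption made for illustration: events are plain dicts with an "id" key, process_with_retry is a hypothetical helper, and the dead-letter queue is just a list.

```python
from typing import Callable, Dict, List, Set, Tuple


def process_with_retry(
    events: List[Dict],
    handler: Callable[[Dict], None],
    max_attempts: int = 3,
) -> Tuple[Set[str], List[Tuple[Dict, str]]]:
    """Handle each event at most once; retry failures; park hopeless events."""
    done: Set[str] = set()
    dead_letters: List[Tuple[Dict, str]] = []
    for event in events:
        if event["id"] in done:  # idempotency: a duplicate delivery is a no-op
            continue
        last_error = ""
        for _attempt in range(max_attempts):
            try:
                handler(event)
                done.add(event["id"])
                break
            except Exception as exc:  # retry on any handler failure
                last_error = repr(exc)
        else:
            # Every attempt failed: keep the event and the reason for later inspection
            dead_letters.append((event, last_error))
    return done, dead_letters


def demo() -> Tuple[List[str], List[Tuple[Dict, str]]]:
    attempts: Dict[str, int] = {}
    applied: List[str] = []

    def handler(event: Dict) -> None:
        attempts[event["id"]] = attempts.get(event["id"], 0) + 1
        if event["id"] == "bad":
            raise ValueError("cannot parse payload")
        if event["id"] == "flaky" and attempts["flaky"] < 2:
            raise ConnectionError("transient failure")
        applied.append(event["id"])

    events = [{"id": "a"}, {"id": "flaky"}, {"id": "a"}, {"id": "bad"}]
    _, dead = process_with_retry(events, handler)
    return applied, dead


if __name__ == "__main__":
    print(demo())
```

A production system would usually add a delay between attempts and emit a metric or alert when an event is dead-lettered; both are left out to keep the sketch short.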

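6. Barriers

6.1 threading.Barrier

01.Barrier basics
    a.Definition and purpose
        threading.Barrier is a synchronization primitive in the Python standard library that makes a fixed group of threads wait for one another at a common point in their execution.
        Each thread blocks in wait() until every party has arrived; all of them are then released together.
    b.Core properties
        a.Fixed number of parties
            The number of participating threads is set when the barrier is created and stays the same for its lifetime.
        b.Synchronization point
            Once the required number of threads have called wait(), all of them are released at the same time.
        c.Automatic reuse
            A barrier is reusable by default: after all parties pass, it returns to its initial state and can serve the next round with the same number of threads. No constructor argument is needed for this.
        d.Timeouts
            wait() accepts a timeout so that threads do not block forever.
    c.Typical uses
        a.Multi-phase computation
            Coordinating the phases of a parallel algorithm.
        b.Data-parallel processing
            Making sure every worker thread finishes the current phase before any starts the next.
        c.Startup synchronization
            Waiting at system startup until all components have finished initializing.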
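A minimal sketch of the multi-phase use case, which also shows the same Barrier object being reused for every phase. The function name run_phases and the worker and phase counts are assumptions for illustration.

```python
import threading
from typing import List, Tuple


def run_phases(num_workers: int = 3, num_phases: int = 2) -> List[Tuple[int, int]]:
    """Run num_phases rounds; no worker starts a phase before all finished the previous one."""
    barrier = threading.Barrier(num_workers)
    log: List[Tuple[int, int]] = []
    log_lock = threading.Lock()

    def worker(worker_id: int) -> None:
        for phase in range(num_phases):
            with log_lock:
                log.append((phase, worker_id))  # stand-in for the phase's real work
            # The same barrier is reused in every round; it resets itself
            barrier.wait(timeout=5.0)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    for phase, worker_id in run_phases():
        print(f"phase {phase}: worker {worker_id}")
```

Because a worker can only log phase 1 after passing the barrier, and the barrier only opens once every worker has logged phase 0, the phase numbers in the log never decrease.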

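02.Creating and initializing a Barrier
    a.Constructor parameters
        a.parties
            Required. The number of threads that must call wait() before any of them is released; it must be a positive integer.
        b.action
            Optional. A callable that one of the threads runs when all parties have arrived, just before they are released.
        c.timeout
            Optional. The default timeout for wait(), used when a call does not pass its own.
    b.Choosing parameter values
        a.parties
            Match it to the number of threads that will actually call wait(), as dictated by the application logic.
        b.action
            Keep the callback short, because every waiting thread stays blocked until it returns.
        c.timeout
            Pick a value that reflects system performance and how long the application can tolerate waiting.
    c.Code example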
        ---
        # Barrier基础创建与初始化示例
        import threading
        import time
        import logging
        from typing import Optional

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

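        def setup_callback():
            """Barrier action: run by one thread each time all parties have arrived."""
            logger.info("=== All threads reached the barrier; starting the next phase ===")
            # Initialization or cleanup work can go here.
            # Barrier ignores the action's return value, so nothing is returned.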

        class BarrierDemo:
            def __init__(self, num_threads: int = 3):
                """
                初始化屏障演示类

                Args:
                    num_threads: 需要同步的线程数量
                """
                self.num_threads = num_threads
                # 创建屏障,设置回调函数和超时时间
                self.barrier = threading.Barrier(
                    parties=num_threads,
                    action=setup_callback,
                    timeout=10.0
                )
                self.threads = []
                self.results = []

            def worker_thread(self, thread_id: int, work_duration: float = 2.0):
                """
                工作线程函数

                Args:
                    thread_id: 线程标识
                    work_duration: 工作持续时间
                """
                try:
                    logger.info(f"线程 {thread_id}: 开始执行第一阶段工作")

                    # 模拟第一阶段的工作
                    time.sleep(work_duration)
                    result1 = f"线程{thread_id}-阶段1结果"
                    logger.info(f"线程 {thread_id}: 第一阶段完成,准备等待其他线程")

                    # 等待所有线程到达屏障点
                    logger.info(f"线程 {thread_id}: 到达屏障点,等待其他线程...")
                    wait_result = self.barrier.wait()
                    logger.info(f"线程 {thread_id}: 通过屏障,等待结果: {wait_result}")

                    # 所有线程都通过屏障后,继续执行第二阶段
                    logger.info(f"线程 {thread_id}: 开始执行第二阶段工作")
                    time.sleep(work_duration * 0.5)
                    result2 = f"线程{thread_id}-阶段2结果"

                    # 保存结果
                    self.results.append({
                        'thread_id': thread_id,
                        'phase1': result1,
                        'phase2': result2,
                        'wait_result': wait_result
                    })

                    logger.info(f"线程 {thread_id}: 所有工作完成")

                except threading.BrokenBarrierError:
                    logger.error(f"线程 {thread_id}: 屏障已损坏,退出执行")
                except Exception as e:
                    logger.error(f"线程 {thread_id}: 执行异常 - {e}")

            def run_demo(self):
                """运行屏障演示"""
                logger.info(f"=== Barrier演示开始,参与者数量: {self.num_threads} ===")

                # 创建并启动工作线程
                for i in range(self.num_threads):
                    thread = threading.Thread(
                        target=self.worker_thread,
                        args=(i + 1, 1.5 + i * 0.3)  # 不同的工作时长
                    )
                    self.threads.append(thread)
                    thread.start()
                    time.sleep(0.2)  # 稍微错开启动时间

                # 等待所有线程完成
                for thread in self.threads:
                    thread.join()

                # 打印结果统计
                self.print_results()

            def print_results(self):
                """打印执行结果"""
                logger.info("\n=== 执行结果统计 ===")
                for result in self.results:
                    logger.info(f"线程{result['thread_id']}: "
                              f"等待结果={result['wait_result']}, "
                              f"阶段1={result['phase1']}, "
                              f"阶段2={result['phase2']}")

        # 使用示例
        if __name__ == "__main__":
            demo = BarrierDemo(num_threads=4)
            demo.run_demo()
        ---

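03.Core Barrier methods
    a.wait()
        a.Behavior
            Blocks the calling thread until the required number of threads have called wait().
            When the last one arrives, all of them are released together.
        b.Return value
            An integer from 0 to parties-1, different for each thread. It is meant for picking one thread to do special housekeeping; the documentation does not promise that it reflects arrival order.
        c.Timeout
            If not all threads arrive within the timeout, the barrier is put into the broken state and wait() raises threading.BrokenBarrierError, both in the thread that timed out and in every other waiter. It does not raise TimeoutError.
        d.Code example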
            ---
            # Barrier.wait()方法详解示例
            import threading
            import time
            import logging
            import random
            from typing import List, Dict

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class BarrierWaitDemo:
                def __init__(self, num_threads: int = 5):
                    self.num_threads = num_threads
                    # 创建屏障,不设置回调函数,专注于wait()方法演示
                    self.barrier = threading.Barrier(parties=num_threads)
                    self.arrival_times: Dict[int, float] = {}
                    self.wait_results: List[int] = []

                def worker_with_timing(self, thread_id: int):
                    """带时间统计的工作线程"""
                    arrival_time = time.time()

                    logger.info(f"线程 {thread_id}: 启动,执行预处理任务")

                    # 模拟不同的预处理时间
                    preprocess_time = random.uniform(0.5, 3.0)
                    time.sleep(preprocess_time)

                    logger.info(f"线程 {thread_id}: 预处理完成 ({preprocess_time:.2f}s),到达屏障")
                    self.arrival_times[thread_id] = time.time()

                    try:
                        # 等待其他线程,获取等待结果
                        start_wait = time.time()
                        wait_result = self.barrier.wait(timeout=5.0)
                        end_wait = time.time()
                        actual_wait_time = end_wait - start_wait

                        self.wait_results.append(wait_result)

                        logger.info(f"线程 {thread_id}: 通过屏障! "
                                  f"等待结果={wait_result}, "
                                  f"实际等待时间={actual_wait_time:.2f}s")

                        # 根据等待结果执行不同的后续任务
                        self._execute_post_barrier_task(thread_id, wait_result)

                    except threading.BrokenBarrierError:
                        logger.error(f"线程 {thread_id}: 屏障损坏异常")
                    except Exception as e:
                        logger.error(f"线程 {thread_id}: 等待异常 - {e}")

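                def _execute_post_barrier_task(self, thread_id: int, wait_result: int):
                    """Pick a follow-up task from the index returned by wait()."""
                    # The index only tells threads apart; all of them leave the barrier together.
                    if wait_result == 0:
                        logger.info(f"Thread {thread_id}: index 0, running the main task")
                        time.sleep(1.0)
                    elif wait_result % 2 == 1:
                        logger.info(f"Thread {thread_id}: odd index, running the helper task")
                        time.sleep(0.5)
                    else:
                        logger.info(f"Thread {thread_id}: even index, running the backup task")
                        time.sleep(0.3)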

                def worker_with_timeout(self, thread_id: int, should_timeout: bool = False):
                    """演示超时机制的工作线程"""
                    logger.info(f"线程 {thread_id}: 启动 (超时演示)")

                    if should_timeout:
                        # 故意延迟,演示超时
                        time.sleep(8.0)
                        logger.info(f"线程 {thread_id}: 延迟到达屏障")
                    else:
                        time.sleep(1.0)
                        logger.info(f"线程 {thread_id}: 正常到达屏障")

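                    try:
                        # Use a short timeout
                        wait_result = self.barrier.wait(timeout=3.0)
                        logger.info(f"Thread {thread_id}: passed the barrier, result={wait_result}")

                    except threading.BrokenBarrierError:
                        # A timed-out wait() breaks the barrier; the exception carries no message.
                        logger.error(f"Thread {thread_id}: timed out or barrier broken (BrokenBarrierError)")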

                def run_wait_demo(self):
                    """运行wait()方法演示"""
                    logger.info(f"=== Barrier.wait()方法演示 ===")
                    logger.info(f"参与者数量: {self.num_threads}")

                    threads = []
                    # 创建正常的工作线程
                    for i in range(self.num_threads):
                        thread = threading.Thread(
                            target=self.worker_with_timing,
                            args=(i + 1,)
                        )
                        threads.append(thread)
                        thread.start()
                        time.sleep(0.1)

                    # 等待所有线程完成
                    for thread in threads:
                        thread.join()

                    # 分析结果
                    self._analyze_wait_results()

                def run_timeout_demo(self):
                    """运行超时演示"""
                    logger.info(f"\n=== Barrier超时机制演示 ===")

                    # 创建新的屏障用于超时演示
                    timeout_barrier = threading.Barrier(parties=self.num_threads)

                    threads = []
                    for i in range(self.num_threads):
                        # 最后一个线程故意超时
                        should_timeout = (i == self.num_threads - 1)
                        thread = threading.Thread(
                            target=self._worker_timeout_with_barrier,
                            args=(i + 1, timeout_barrier, should_timeout)
                        )
                        threads.append(thread)
                        thread.start()
                        time.sleep(0.1)

                    # 等待线程完成(可能有异常)
                    for thread in threads:
                        thread.join(timeout=10)
                        if thread.is_alive():
                            logger.warning(f"线程 {thread.name} 仍在运行")

                def _worker_timeout_with_barrier(self, thread_id: int, barrier: threading.Barrier, should_timeout: bool):
                    """超时演示的工作线程"""
                    logger.info(f"超时演示线程 {thread_id}: 启动")

                    if should_timeout:
                        time.sleep(6.0)  # 故意延迟超过超时时间
                        logger.info(f"超时演示线程 {thread_id}: 延迟到达")
                    else:
                        time.sleep(1.0)
                        logger.info(f"超时演示线程 {thread_id}: 准时到达")

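                    try:
                        result = barrier.wait(timeout=3.0)
                        logger.info(f"Timeout demo thread {thread_id}: passed the barrier, result={result}")
                    except threading.BrokenBarrierError:
                        # A timed-out wait() breaks the barrier for every thread; the exception has no message.
                        logger.error(f"Timeout demo thread {thread_id}: timed out or barrier broken (BrokenBarrierError)")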

                def _analyze_wait_results(self):
                    """分析wait()方法的执行结果"""
                    logger.info("\n=== 等待结果分析 ===")
                    logger.info(f"到达顺序: {sorted(self.arrival_times.keys(), key=lambda x: self.arrival_times[x])}")
                    logger.info(f"等待结果: {sorted(self.wait_results)}")

                    # 计算时间统计
                    if len(self.arrival_times) >= 2:
                        arrival_times = list(self.arrival_times.values())
                        time_span = max(arrival_times) - min(arrival_times)
                        logger.info(f"到达时间跨度: {time_span:.2f}秒")

            # 使用示例
            if __name__ == "__main__":
                demo = BarrierWaitDemo(num_threads=5)

                # 演示正常的wait()方法
                demo.run_wait_demo()

                # 演示超时机制
                demo.run_timeout_demo()
            ---
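To make the timeout behavior concrete, here is a small sketch in which a single thread waits on a two-party barrier that can never fill. The function name wait_alone and the 0.1 s timeout are assumptions for the example; the calls themselves are standard threading API.

```python
import threading


def wait_alone(timeout: float = 0.1) -> dict:
    """Wait on a two-party barrier from a single thread and report what happens."""
    barrier = threading.Barrier(2)  # two parties, but only this thread will arrive
    outcome = {}
    try:
        barrier.wait(timeout=timeout)
        outcome["raised"] = None
    except threading.BrokenBarrierError as exc:
        outcome["raised"] = type(exc).__name__
    # The timed-out wait leaves the barrier in the broken state
    outcome["broken"] = barrier.broken
    # BrokenBarrierError derives from RuntimeError, not from TimeoutError
    outcome["is_timeout_error"] = issubclass(threading.BrokenBarrierError, TimeoutError)
    return outcome


if __name__ == "__main__":
    print(wait_alone())
```

Code that wraps wait() in `except TimeoutError` will therefore never catch a barrier timeout; catch threading.BrokenBarrierError instead.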

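04.Advanced Barrier features
    a.The action callback
        a.The action parameter
            When all threads have arrived, one of them runs the callback before any thread is released.
        b.Which thread runs it
            The documentation only says "one of the threads". In the current CPython implementation it is the last thread to arrive, whose wait() returns parties-1, not the thread that receives 0.
        c.Exceptions
            If the callback raises, the barrier is put into the broken state: the exception propagates out of wait() in the thread that ran the callback, and the other waiting threads get BrokenBarrierError.
        d.Code example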
            ---
            # Barrier回调函数机制示例
            import threading
            import time
            import logging
            import random
            from typing import List, Dict, Any

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class BarrierActionDemo:
                def __init__(self, num_teams: int = 3):
                    self.num_teams = num_teams
                    # 创建带回调函数的屏障
                    self.barrier = threading.Barrier(
                        parties=num_teams,
                        action=self.phase_completion_callback
                    )
                    self.team_results: Dict[str, Any] = {}
                    self.phase_completion_times: List[float] = []

                def phase_completion_callback(self):
                    """阶段完成回调函数"""
                    completion_time = time.time()
                    self.phase_completion_times.append(completion_time)

                    logger.info("=== 阶段完成回调执行 ===")
                    logger.info(f"完成时间: {time.strftime('%H:%M:%S', time.localtime(completion_time))}")

                    # 执行一些资源重置或状态更新工作
                    self._reset_resources()
                    self._update_statistics()

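                    # Simulated processing time; every waiting thread stays blocked for these 0.5s
                    time.sleep(0.5)

                    logger.info("=== Callback finished; all threads continue ===")
                    # Barrier ignores the action's return value, so nothing is returned.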

                def _reset_resources(self):
                    """重置共享资源"""
                    logger.info("回调: 重置共享资源")
                    # 这里可以重置共享变量、清理临时文件等

                def _update_statistics(self):
                    """更新统计信息"""
                    logger.info("回调: 更新统计信息")
                    # 这里可以收集性能指标、更新数据库等

                def team_worker(self, team_name: str, tasks: List[str]):
                    """团队工作线程"""
                    logger.info(f"团队 {team_name}: 开始执行任务")

                    for phase, task in enumerate(tasks, 1):
                        logger.info(f"团队 {team_name}: 执行阶段{phase} - {task}")

                        # 模拟任务执行时间
                        execution_time = random.uniform(1.0, 3.0)
                        time.sleep(execution_time)

                        # 记录阶段结果
                        self.team_results[f"{team_name}_phase{phase}"] = {
                            'task': task,
                            'execution_time': execution_time,
                            'completion_time': time.time()
                        }

                        logger.info(f"团队 {team_name}: 阶段{phase}完成,等待其他团队")

                        try:
                            # 等待所有团队完成当前阶段
                            wait_result = self.barrier.wait()
                            logger.info(f"团队 {team_name}: 进入下一阶段 (等待结果: {wait_result})")

                        except threading.BrokenBarrierError:
                            logger.error(f"团队 {team_name}: 屏障损坏,停止执行")
                            break
                        except Exception as e:
                            logger.error(f"团队 {team_name}: 等待异常 - {e}")
                            break

                    logger.info(f"团队 {team_name}: 所有任务完成")

                def run_multi_phase_demo(self):
                    """运行多阶段协作演示"""
                    logger.info(f"=== 多阶段协作演示开始 ===")
                    logger.info(f"参与团队数: {self.num_teams}")

                    # 定义各团队的任务列表
                    teams_tasks = {
                        "数据采集": ["收集用户数据", "验证数据完整性", "清洗异常数据"],
                        "数据处理": ["转换数据格式", "应用业务规则", "生成处理结果"],
                        "数据分析": ["统计分析", "可视化展示", "生成报告"]
                    }

                    # 创建并启动团队工作线程
                    threads = []
                    for team_name, tasks in teams_tasks.items():
                        if len(threads) < self.num_teams:  # 确保不超过屏障参与数
                            thread = threading.Thread(
                                target=self.team_worker,
                                args=(team_name, tasks)
                            )
                            threads.append(thread)
                            thread.start()

                    # 等待所有团队完成
                    for thread in threads:
                        thread.join()

                    # 打印执行统计
                    self._print_execution_statistics()

                def _print_execution_statistics(self):
                    """打印执行统计信息"""
                    logger.info("\n=== 执行统计 ===")

                    # 按团队分组显示结果
                    for team_result_key in sorted(self.team_results.keys()):
                        if 'phase1' in team_result_key:
                            team_name = team_result_key.split('_phase1')[0]
                            logger.info(f"\n团队: {team_name}")

                            for phase in [1, 2, 3]:
                                key = f"{team_name}_phase{phase}"
                                if key in self.team_results:
                                    result = self.team_results[key]
                                    logger.info(f"  阶段{phase}: {result['task']} "
                                              f"(耗时: {result['execution_time']:.2f}s)")

                    # 显示阶段完成时间
                    if self.phase_completion_times:
                        logger.info(f"\n阶段完成时间点:")
                        for i, completion_time in enumerate(self.phase_completion_times, 1):
                            logger.info(f"  阶段{i}: {time.strftime('%H:%M:%S', time.localtime(completion_time))}")

            # 使用示例
            if __name__ == "__main__":
                demo = BarrierActionDemo(num_teams=3)
                demo.run_multi_phase_demo()
            ---
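What happens when the action itself fails can be checked with a short sketch. The names run_failing_action and failing_action are hypothetical; the barrier has two parties so that exactly one thread runs the action and one thread is a plain waiter.

```python
import threading
from typing import List, Tuple


def run_failing_action() -> Tuple[List[str], bool]:
    """Let the barrier action raise and record what each thread observes."""

    def failing_action() -> None:
        raise ValueError("action failed")

    barrier = threading.Barrier(2, action=failing_action)
    outcomes: List[str] = []
    lock = threading.Lock()

    def worker() -> None:
        try:
            barrier.wait(timeout=5.0)
            name = "passed"
        except Exception as exc:  # record whichever exception this thread sees
            name = type(exc).__name__
        with lock:
            outcomes.append(name)

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(outcomes), barrier.broken


if __name__ == "__main__":
    print(run_failing_action())
```

The thread that ran the action sees the original ValueError, the other thread sees BrokenBarrierError, and neither proceeds as if the phase had completed.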

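    b.Exception handling
        a.BrokenBarrierError
            Raised by wait() when the barrier is reset or broken while the thread is waiting, or when the barrier is already broken on entry.
        b.Timeouts
            A wait() that times out breaks the barrier and raises BrokenBarrierError; there is no separate timeout exception. BrokenBarrierError is a subclass of RuntimeError.
        c.Recovery
            A broken barrier does not have to be recreated: reset() returns it to the empty, usable state, and any thread still waiting at that moment receives BrokenBarrierError. When the state of the other threads is unknown, creating a fresh barrier is often the simpler choice.
        d.Code example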
            ---
            # Barrier异常处理机制示例
            import threading
            import time
            import logging

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class BarrierExceptionDemo:
                def __init__(self, num_threads: int = 4):
                    self.num_threads = num_threads
                    self.barrier = threading.Barrier(parties=num_threads)
                    self.abnormal_thread_id = None

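                def normal_worker(self, thread_id: int):
                    """Normal worker thread"""
                    logger.info(f"Thread {thread_id}: started normally")
                    time.sleep(1.0)

                    try:
                        logger.info(f"Thread {thread_id}: reached the barrier, waiting")
                        # Bounded wait, so a missing party cannot block this thread forever
                        result = self.barrier.wait(timeout=2.0)
                        logger.info(f"Thread {thread_id}: passed the barrier, result={result}")

                    except threading.BrokenBarrierError:
                        logger.error(f"Thread {thread_id}: barrier is broken (aborted or timed out)")
                    except Exception as e:
                        logger.error(f"Thread {thread_id}: other exception - {e}")

                def abnormal_worker(self, thread_id: int):
                    """Worker that fails before reaching the barrier"""
                    logger.info(f"Thread {thread_id}: started (will fail)")

                    # Simulate a failure during work
                    time.sleep(0.5)

                    logger.warning(f"Thread {thread_id}: simulating a failure")
                    self.abnormal_thread_id = thread_id
                    # An exception alone does not break the barrier; abort() does,
                    # so the other threads get BrokenBarrierError instead of waiting in vain.
                    self.barrier.abort()
                    raise Exception("simulated worker failure")

                def timeout_worker(self, thread_id: int, timeout_time: float):
                    """Worker that arrives too late"""
                    logger.info(f"Thread {thread_id}: started (will arrive late)")

                    try:
                        # Deliberately arrive after the other threads' wait() has timed out
                        time.sleep(timeout_time + 1.0)
                        logger.info(f"Thread {thread_id}: reached the barrier late")
                        result = self.barrier.wait(timeout=timeout_time)
                        logger.info(f"Thread {thread_id}: passed the barrier, result={result}")

                    except threading.BrokenBarrierError:
                        # A timed-out wait() breaks the barrier for every thread
                        logger.error(f"Thread {thread_id}: barrier broken by a timeout (BrokenBarrierError)")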

                def recovery_worker(self, thread_id: int, new_barrier: threading.Barrier):
                    """恢复工作线程 - 使用新的屏障"""
                    logger.info(f"恢复线程 {thread_id}: 使用新屏障")
                    time.sleep(0.8)

                    try:
                        result = new_barrier.wait(timeout=5.0)
                        logger.info(f"恢复线程 {thread_id}: 成功通过新屏障,结果={result}")
                    except Exception as e:
                        logger.error(f"恢复线程 {thread_id}: 新屏障异常 - {e}")

                def demonstrate_broken_barrier(self):
                    """演示屏障损坏异常"""
                    logger.info("\n=== 屏障损坏异常演示 ===")

                    threads = []

                    # 创建异常线程
                    abnormal_thread = threading.Thread(
                        target=self.abnormal_worker,
                        args=(99,)  # 使用特殊ID标识异常线程
                    )
                    threads.append(abnormal_thread)

                    # 创建正常线程
                    for i in range(self.num_threads - 1):
                        thread = threading.Thread(
                            target=self.normal_worker,
                            args=(i + 1,)
                        )
                        threads.append(thread)
                        thread.start()
                        time.sleep(0.1)

                    # 启动异常线程
                    abnormal_thread.start()

                    # 等待所有线程完成
                    for thread in threads:
                        try:
                            thread.join(timeout=3)
                            if thread.is_alive():
                                logger.warning(f"线程 {thread.name} 未能在超时内完成")
                        except Exception as e:
                            logger.error(f"等待线程异常 - {e}")

                    # 检查屏障状态
                    self._check_barrier_status()

                def demonstrate_timeout_exception(self):
                    """演示超时异常"""
                    logger.info("\n=== 超时异常演示 ===")

                    threads = []
                    timeout_time = 2.0

                    # 创建会超时的线程
                    timeout_thread = threading.Thread(
                        target=self.timeout_worker,
                        args=(100, timeout_time)  # 使用特殊ID标识超时线程
                    )
                    threads.append(timeout_thread)

                    # 创建正常线程
                    for i in range(self.num_threads - 1):
                        thread = threading.Thread(
                            target=self.normal_worker,
                            args=(i + 1,)
                        )
                        threads.append(thread)
                        thread.start()
                        time.sleep(0.1)

                    # 启动超时线程
                    timeout_thread.start()

                    # 等待线程完成
                    for thread in threads:
                        thread.join(timeout=5)

                def demonstrate_recovery(self):
                    """演示异常恢复机制"""
                    logger.info("\n=== 异常恢复演示 ===")

                    # 创建新的屏障用于恢复
                    recovery_barrier = threading.Barrier(parties=3)

                    threads = []
                    for i in range(3):
                        thread = threading.Thread(
                            target=self.recovery_worker,
                            args=(i + 1, recovery_barrier)
                        )
                        threads.append(thread)
                        thread.start()
                        time.sleep(0.2)

                    # 等待恢复线程完成
                    for thread in threads:
                        thread.join()

                    logger.info("恢复演示完成")

                def _check_barrier_status(self):
                    """Check the barrier state without waiting on it"""
                    # Read the `broken` attribute instead of calling wait():
                    # a wait() that times out would itself break a healthy barrier.
                    if self.barrier.broken:
                        logger.error("Barrier is broken")
                    else:
                        logger.info("Barrier state is normal")

                def run_exception_demo(self):
                    """运行所有异常演示"""
                    logger.info("=== Barrier异常处理机制演示 ===")

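                    # Demonstrate a broken barrier
                    self.demonstrate_broken_barrier()
                    time.sleep(1)

                    # Return the barrier to its initial state so the next demo
                    # does not start on a barrier that is still broken
                    self.barrier.reset()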

                    # 演示超时异常
                    self.demonstrate_timeout_exception()
                    time.sleep(1)

                    # 演示恢复机制
                    self.demonstrate_recovery()

            # 使用示例
            if __name__ == "__main__":
                demo = BarrierExceptionDemo(num_threads=4)
                demo.run_exception_demo()
            ---
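
            The demo above recovers by creating a new barrier. As a complement, here is a minimal sketch, independent of the BarrierExceptionDemo class, of recovering the same barrier object: a timed-out wait() leaves it broken, the `broken` attribute reports that, and reset() makes it usable again. The function names `lonely_waiter` and `partner` are illustrative only.

```python
import threading

barrier = threading.Barrier(parties=2)
events = []

def lonely_waiter():
    """Waits alone, so the timeout fires and the barrier ends up broken."""
    try:
        barrier.wait(timeout=0.2)
    except threading.BrokenBarrierError:
        events.append("timeout broke the barrier")

t = threading.Thread(target=lonely_waiter)
t.start()
t.join()

broken_after_timeout = barrier.broken   # True: a timed-out wait() breaks the barrier
barrier.reset()                         # safe here because no thread is still waiting
broken_after_reset = barrier.broken     # False: the barrier can be used again

results = []

def partner():
    # wait() returns a distinct index in range(parties) for each thread
    results.append(barrier.wait(timeout=2.0))

threads = [threading.Thread(target=partner) for _ in range(2)]
for th in threads:
    th.start()
for th in threads:
    th.join()

print(broken_after_timeout, broken_after_reset, sorted(results))
```

            Calling reset() while other threads are still waiting makes those threads raise BrokenBarrierError, so in real code it is usually simpler to reset only once all workers have stopped, or to create a fresh barrier as the demo does.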

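05.Barrier performance characteristics and optimization
    a.Overhead analysis
        a.Memory footprint
            A Barrier object itself is small; most of the cost lies in managing the threads that wait on it.
        b.CPU usage
            Threads blocked in wait() sleep rather than spin, so a barrier costs very little while nothing is contending for it. Under heavy contention the cost shows up as thread context switches.
        c.Time complexity
            The bookkeeping inside a single wait() call is O(1), but the last thread to arrive has to wake every waiting thread, so releasing the barrier scales with the number of parties. In practice thread scheduling dominates.
    b.Optimization strategies
        a.Reduce waiting time
            Distribute the work evenly so that threads reach the barrier at nearly the same time; every thread waits for the slowest one.
        b.Exception handling
            Detect and handle failures quickly (for example with a wait() timeout or abort()) so that the other threads do not wait for a thread that will never arrive.
        c.Callback function
            Keep the action callback short: it runs in one thread while all the other threads stay blocked in wait().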
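
        The "reduce waiting time" point can be made concrete with a small measurement. The sketch below is an illustration, not a benchmark: the helper `measure_waits` and the sleep durations are assumptions chosen for the example. It records how long each thread stays blocked in wait() when the work is uneven and when it is balanced.

```python
import threading
import time

def measure_waits(work_seconds):
    """Return the time each thread spent blocked in wait(), indexed like work_seconds."""
    barrier = threading.Barrier(parties=len(work_seconds))  # shared by all workers
    waits = [0.0] * len(work_seconds)

    def worker(index, work):
        time.sleep(work)                      # simulated work before the sync point
        arrived = time.perf_counter()
        barrier.wait(timeout=5.0)             # the timeout bounds the wait if a peer never arrives
        waits[index] = time.perf_counter() - arrived

    threads = [
        threading.Thread(target=worker, args=(i, w))
        for i, w in enumerate(work_seconds)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return waits

skewed = measure_waits([0.02, 0.10, 0.20])      # uneven work: early threads idle at the barrier
balanced = measure_waits([0.10, 0.10, 0.10])    # even work: almost no idle time

print("skewed:  ", [f"{w:.3f}" for w in skewed])
print("balanced:", [f"{w:.3f}" for w in balanced])
```

        With uneven work the fastest thread idles for roughly the gap between itself and the slowest thread, which is why balancing the workload matters more than tuning the barrier itself.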
    c.Code example
            ---
            # Barrier性能优化示例
            import threading
            import time
            import logging
            import random
            from concurrent.futures import ThreadPoolExecutor
            from typing import List, Dict, Any
            import statistics

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class BarrierPerformanceDemo:
                def __init__(self, num_threads: int = 8):
                    self.num_threads = num_threads
                    self.performance_metrics: List[Dict[str, Any]] = []
                    # One barrier shared by all workers. A barrier created inside
                    # a worker would be private to that thread: no other thread
                    # could ever arrive at it, so every wait() would time out.
                    self.phase_barrier = threading.Barrier(parties=num_threads)

                def optimized_worker(self, thread_id: int, work_load: float = 1.0):
                    """Optimized worker thread"""
                    start_time = time.time()

                    # Phase 1: preprocessing (load balancing)
                    preprocess_start = time.time()
                    adjusted_work = work_load * (0.8 + 0.4 * random.random())  # add some randomness
                    time.sleep(adjusted_work * 0.3)
                    preprocess_time = time.time() - preprocess_start

                    # Phase 2: barrier synchronization (the main optimization target)
                    sync_start = time.time()

                    try:
                        # Use a fairly short timeout to detect failures quickly
                        wait_result = self.phase_barrier.wait(timeout=max(2.0, adjusted_work))
                        sync_time = time.time() - sync_start

                    except threading.BrokenBarrierError:
                        logger.warning(f"Thread {thread_id}: barrier timed out or was broken")
                        return

                    # Phase 3: postprocessing
                    postprocess_start = time.time()
                    time.sleep(adjusted_work * 0.2)
                    postprocess_time = time.time() - postprocess_start

                    total_time = time.time() - start_time

                    # Record the performance metrics
                    metrics = {
                        'thread_id': thread_id,
                        'work_load': adjusted_work,
                        'preprocess_time': preprocess_time,
                        'sync_time': sync_time,
                        'postprocess_time': postprocess_time,
                        'total_time': total_time,
                        'wait_result': wait_result
                    }
                    self.performance_metrics.append(metrics)

                    logger.info(f"Thread {thread_id}: done "
                              f"(preprocess: {preprocess_time:.3f}s, "
                              f"sync: {sync_time:.3f}s, "
                              f"postprocess: {postprocess_time:.3f}s, "
                              f"total: {total_time:.3f}s)")

                def unoptimized_worker(self, thread_id: int, work_load: float = 1.0):
                    """Unoptimized worker thread, for comparison"""
                    start_time = time.time()

                    # Phase 1: preprocessing (no load balancing)
                    time.sleep(work_load * 0.3)

                    # Phase 2: barrier synchronization (long timeout)
                    sync_start = time.time()

                    try:
                        # Use a very long timeout
                        wait_result = self.phase_barrier.wait(timeout=30.0)
                        sync_time = time.time() - sync_start

                    except threading.BrokenBarrierError:
                        logger.warning(f"Unoptimized thread {thread_id}: barrier timed out or was broken")
                        return

                    # 阶段3: 后处理(固定时间)
                    time.sleep(work_load * 0.2)

                    total_time = time.time() - start_time

                    logger.info(f"未优化线程 {thread_id}: 完成,总时间: {total_time:.3f}s")

                def run_performance_comparison(self, rounds: int = 3):
                    """运行性能对比测试"""
                    logger.info(f"=== Barrier性能对比测试 ===")
                    logger.info(f"线程数: {self.num_threads}, 测试轮数: {rounds}")

                    for round_num in range(1, rounds + 1):
                        logger.info(f"\n--- 第 {round_num} 轮测试 ---")

                        # 测试优化版本
                        logger.info("优化版本测试:")
                        self.performance_metrics.clear()

                        with ThreadPoolExecutor(max_workers=self.num_threads) as executor:
                            # 提交优化的工作任务
                            futures = [
                                executor.submit(self.optimized_worker, i + 1, 1.0 + i * 0.1)
                                for i in range(self.num_threads)
                            ]

                            # 等待所有任务完成
                            for future in futures:
                                try:
                                    future.result(timeout=10)
                                except Exception as e:
                                    logger.error(f"任务执行异常: {e}")

                        # 分析优化版本的性能
                        if self.performance_metrics:
                            self._analyze_performance_metrics(f"优化版本第{round_num}轮")

                        time.sleep(1)  # 轮次间隔

                def _analyze_performance_metrics(self, test_name: str):
                    """分析性能指标"""
                    if not self.performance_metrics:
                        return

                    # Extract the individual timing metrics
                    preprocess_times = [m['preprocess_time'] for m in self.performance_metrics if m['preprocess_time'] != float('inf')]
                    sync_times = [m['sync_time'] for m in self.performance_metrics if m['sync_time'] != float('inf')]
                    postprocess_times = [m['postprocess_time'] for m in self.performance_metrics]
                    total_times = [m['total_time'] for m in self.performance_metrics]

                    # Compute summary statistics
                    def calc_stats(times):
                        if not times:
                            # Include 'std' so the log lines below never hit a KeyError
                            return {'mean': 0, 'median': 0, 'min': 0, 'max': 0, 'std': 0}
                        return {
                            'mean': statistics.mean(times),
                            'median': statistics.median(times),
                            'min': min(times),
                            'max': max(times),
                            'std': statistics.stdev(times) if len(times) > 1 else 0
                        }

                    preprocess_stats = calc_stats(preprocess_times)
                    sync_stats = calc_stats(sync_times)
                    postprocess_stats = calc_stats(postprocess_times)
                    total_stats = calc_stats(total_times)

                    logger.info(f"\n{test_name} performance statistics:")
                    logger.info(f"  Preprocess time: mean={preprocess_stats['mean']:.3f}s, "
                              f"std={preprocess_stats['std']:.3f}s")
                    logger.info(f"  Sync time: mean={sync_stats['mean']:.3f}s, "
                              f"std={sync_stats['std']:.3f}s")
                    logger.info(f"  Postprocess time: mean={postprocess_stats['mean']:.3f}s")
                    logger.info(f"  Total time: mean={total_stats['mean']:.3f}s, "
                              f"fastest={total_stats['min']:.3f}s, "
                              f"slowest={total_stats['max']:.3f}s")

                    # Synchronization efficiency
                    if len(sync_times) > 1:
                        sync_variance = statistics.variance(sync_times)
                        logger.info(f"  Sync time variance: {sync_variance:.6f} (lower is better)")

                def run_stress_test(self, duration: int = 10):
                    """运行压力测试"""
                    logger.info(f"\n=== Barrier压力测试 ===")
                    logger.info(f"测试持续时间: {duration}秒")

                    start_time = time.time()
                    round_count = 0

                    with ThreadPoolExecutor(max_workers=self.num_threads) as executor:
                        while time.time() - start_time < duration:
                            round_count += 1
                            logger.info(f"压力测试第 {round_count} 轮")

                            # 清理之前的数据
                            self.performance_metrics.clear()

                            # 提交工作任务
                            futures = [
                                executor.submit(self.optimized_worker, i + 1, random.uniform(0.5, 2.0))
                                for i in range(self.num_threads)
                            ]

                            # 等待完成
                            for future in futures:
                                try:
                                    future.result(timeout=5)
                                except Exception as e:
                                    logger.error(f"压力测试异常: {e}")

                            # 短暂休息
                            time.sleep(0.5)

                    logger.info(f"压力测试完成,共执行 {round_count} 轮")

                def run_optimization_suggestions(self):
                    """提供优化建议"""
                    logger.info("\n=== Barrier优化建议 ===")

                    suggestions = [
                        "1. 负载均衡: 根据线程性能差异,动态分配工作量",
                        "2. 超时设置: 设置合理的超时时间,快速检测异常",
                        "3. 异常处理: 及时处理异常,避免长时间等待",
                        "4. 回调优化: 保持回调函数简短,避免阻塞其他线程",
                        "5. 资源管理: 合理管理线程池和系统资源",
                        "6. 监控指标: 监控同步时间、完成率等关键指标",
                        "7. 重试机制: 在异常情况下实现重试逻辑"
                    ]

                    for suggestion in suggestions:
                        logger.info(f"  {suggestion}")

            # 使用示例
            if __name__ == "__main__":
                demo = BarrierPerformanceDemo(num_threads=6)

                # 运行性能对比测试
                demo.run_performance_comparison(rounds=3)

                # 运行压力测试
                demo.run_stress_test(duration=8)

                # 提供优化建议
                demo.run_optimization_suggestions()
            ---

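06.Practical Barrier use cases
    a.Multi-stage data processing
        a.ETL pipelines
            Use a barrier to synchronize between the Extract, Transform, and Load stages.
        b.Batch jobs
            Make sure every data shard has been processed before the next stage begins.
    b.Distributed computing
        a.MapReduce pattern
            Use a barrier to synchronize between the Map and Reduce phases. Note that threading.Barrier only coordinates threads inside one process; across machines the same pattern needs an external coordinator.
        b.Parallel algorithms
            Use barriers for phase synchronization in parallel sorting, searching, and similar algorithms.
    c.System initialization
        a.Service startup coordination
            Wait until every service component has finished initializing before starting the main workload.
        b.Configuration loading
            Make sure every configuration module has loaded before the service starts.
    d.Complete application example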
        ---
        # Barrier实际应用场景示例 - 并行数据处理系统
        import threading
        import time
        import logging
        import random
        from typing import List, Dict, Any, Tuple
        from dataclasses import dataclass
        from enum import Enum
        import queue

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class ProcessingPhase(Enum):
            """处理阶段枚举"""
            EXTRACT = "数据提取"
            TRANSFORM = "数据转换"
            LOAD = "数据加载"
            VALIDATE = "数据验证"

        @dataclass
        class DataRecord:
            """数据记录"""
            id: str
            source: str
            raw_data: Dict[str, Any]
            transformed_data: Dict[str, Any] = None
            validation_result: bool = False
            load_status: str = "pending"

        class ParallelDataProcessor:
            """并行数据处理器 - 展示Barrier的实际应用"""
            def __init__(self, num_workers: int = 4):
                self.num_workers = num_workers

                # 为不同阶段创建屏障
                self.extract_barrier = threading.Barrier(
                    parties=num_workers,
                    action=self._on_extraction_complete
                )
                self.transform_barrier = threading.Barrier(
                    parties=num_workers,
                    action=self._on_transform_complete
                )
                self.load_barrier = threading.Barrier(
                    parties=num_workers,
                    action=self._on_load_complete
                )
                self.validate_barrier = threading.Barrier(
                    parties=num_workers,
                    action=self._on_validation_complete
                )

                # 数据队列和结果存储
                self.raw_data_queue = queue.Queue()
                self.processed_data: List[DataRecord] = []
                self.batch_results: Dict[str, Any] = {}

                # 工作线程
                self.workers = []
                self.is_running = False

            def _on_extraction_complete(self):
                """数据提取完成回调"""
                logger.info("=== 所有工作线程完成数据提取 ===")
                self.batch_results['extract_complete_time'] = time.time()
                self._update_phase_stats(ProcessingPhase.EXTRACT)

            def _on_transform_complete(self):
                """数据转换完成回调"""
                logger.info("=== 所有工作线程完成数据转换 ===")
                self.batch_results['transform_complete_time'] = time.time()
                self._update_phase_stats(ProcessingPhase.TRANSFORM)

            def _on_load_complete(self):
                """数据加载完成回调"""
                logger.info("=== 所有工作线程完成数据加载 ===")
                self.batch_results['load_complete_time'] = time.time()
                self._update_phase_stats(ProcessingPhase.LOAD)

            def _on_validation_complete(self):
                """数据验证完成回调"""
                logger.info("=== 所有工作线程完成数据验证 ===")
                self.batch_results['validate_complete_time'] = time.time()
                self._update_phase_stats(ProcessingPhase.VALIDATE)

            def _update_phase_stats(self, phase: ProcessingPhase):
                """更新阶段统计信息"""
                phase_key = f"{phase.value}_stats"
                if phase_key not in self.batch_results:
                    self.batch_results[phase_key] = {
                        'start_time': time.time(),
                        'records_processed': 0
                    }

            def generate_sample_data(self, record_count: int = 20):
                """生成示例数据"""
                logger.info(f"生成 {record_count} 条示例数据")
                sources = ['database', 'api', 'file', 'stream']

                for i in range(record_count):
                    record = DataRecord(
                        id=f"record_{i:04d}",
                        source=random.choice(sources),
                        raw_data={
                            'timestamp': time.time() - random.uniform(0, 3600),
                            'value': random.uniform(100, 1000),
                            'category': random.choice(['A', 'B', 'C']),
                            'metadata': {
                                'region': random.choice(['north', 'south', 'east', 'west']),
                                'priority': random.randint(1, 5)
                            }
                        }
                    )
                    self.raw_data_queue.put(record)

            def data_worker(self, worker_id: int):
                """数据处理工作线程"""
                logger.info(f"工作线程 {worker_id}: 启动")

                processed_count = 0
                worker_start_time = time.time()

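                try:
                    while self.is_running:
                        # Phase 1: data extraction. A worker that finds the queue
                        # empty still goes through every barrier with an empty
                        # batch; breaking out early would leave the other workers
                        # blocked in wait() forever.
                        records = self._extract_data(worker_id)

                        # Wait for all threads to finish extraction
                        logger.info(f"Worker {worker_id}: extraction done, waiting for the other threads")
                        self.extract_barrier.wait()

                        # No worker touches the queue between the extract barrier
                        # and the validate barrier, so every worker reads the same
                        # value here and they all leave the loop in the same round.
                        queue_drained = self.raw_data_queue.empty()

                        # Phase 2: data transformation
                        transformed_records = self._transform_data(worker_id, records)

                        # Wait for all threads to finish transformation
                        logger.info(f"Worker {worker_id}: transformation done, waiting for the other threads")
                        self.transform_barrier.wait()

                        # Phase 3: data loading
                        loaded_records = self._load_data(worker_id, transformed_records)

                        # Wait for all threads to finish loading
                        logger.info(f"Worker {worker_id}: loading done, waiting for the other threads")
                        self.load_barrier.wait()

                        # Phase 4: data validation
                        validated_records = self._validate_data(worker_id, loaded_records)

                        # Wait for all threads to finish validation
                        logger.info(f"Worker {worker_id}: validation done, waiting for the other threads")
                        self.validate_barrier.wait()

                        processed_count += len(validated_records)

                        if queue_drained:
                            break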

                except threading.BrokenBarrierError:
                    logger.error(f"工作线程 {worker_id}: 屏障损坏,停止工作")
                except Exception as e:
                    logger.error(f"工作线程 {worker_id}: 处理异常 - {e}")

                worker_end_time = time.time()
                logger.info(f"工作线程 {worker_id}: 结束,处理 {processed_count} 条记录,"
                          f"总耗时: {worker_end_time - worker_start_time:.2f}秒")

            def _extract_data(self, worker_id: int, max_records: int = 5) -> List[DataRecord]:
                """数据提取阶段"""
                logger.info(f"工作线程 {worker_id}: 开始数据提取")
                records = []

                for _ in range(max_records):
                    try:
                        # 从队列获取数据,设置超时
                        record = self.raw_data_queue.get(timeout=0.1)
                        records.append(record)
                        self.batch_results.setdefault('extracted_records', 0)
                        self.batch_results['extracted_records'] += 1

                    except queue.Empty:
                        break

                # 模拟提取时间
                time.sleep(random.uniform(0.1, 0.5))

                logger.info(f"工作线程 {worker_id}: 提取 {len(records)} 条记录")
                return records

            def _transform_data(self, worker_id: int, records: List[DataRecord]) -> List[DataRecord]:
                """数据转换阶段"""
                logger.info(f"工作线程 {worker_id}: 开始数据转换")

                for record in records:
                    # 数据转换逻辑
                    transformed = {
                        'normalized_value': record.raw_data['value'] / 100,
                        'category_code': {'A': 1, 'B': 2, 'C': 3}[record.raw_data['category']],
                        'priority_score': record.raw_data['metadata']['priority'] * 2,
                        'processed_timestamp': time.time(),
                        'worker_id': worker_id
                    }

                    record.transformed_data = transformed

                    # 模拟转换时间
                    time.sleep(random.uniform(0.05, 0.2))

                self.batch_results.setdefault('transformed_records', 0)
                self.batch_results['transformed_records'] += len(records)

                logger.info(f"工作线程 {worker_id}: 转换 {len(records)} 条记录")
                return records

            def _load_data(self, worker_id: int, records: List[DataRecord]) -> List[DataRecord]:
                """数据加载阶段"""
                logger.info(f"工作线程 {worker_id}: 开始数据加载")

                for record in records:
                    if record.transformed_data:
                        # 模拟数据库写入或API调用
                        time.sleep(random.uniform(0.02, 0.1))

                        # 设置加载状态
                        record.load_status = f"loaded_by_worker_{worker_id}"
                        record.load_timestamp = time.time()

                        # 添加到处理结果
                        self.processed_data.append(record)

                    self.batch_results.setdefault('loaded_records', 0)
                    self.batch_results['loaded_records'] += 1

                logger.info(f"工作线程 {worker_id}: 加载 {len(records)} 条记录")
                return records

            def _validate_data(self, worker_id: int, records: List[DataRecord]) -> List[DataRecord]:
                """数据验证阶段"""
                logger.info(f"工作线程 {worker_id}: 开始数据验证")

                validated_records = []
                for record in records:
                    if record.transformed_data and record.load_status:
                        # 验证逻辑
                        is_valid = (
                            0 <= record.transformed_data['normalized_value'] <= 10 and
                            1 <= record.transformed_data['category_code'] <= 3 and
                            record.transformed_data['priority_score'] > 0
                        )

                        record.validation_result = is_valid
                        record.validated_by = worker_id
                        record.validation_timestamp = time.time()

                        if is_valid:
                            validated_records.append(record)

                        # 模拟验证时间
                        time.sleep(random.uniform(0.01, 0.05))

                self.batch_results.setdefault('validated_records', 0)
                self.batch_results['validated_records'] += len(validated_records)

                logger.info(f"工作线程 {worker_id}: 验证 {len(validated_records)}/{len(records)} 条记录")
                return validated_records

            def run_parallel_processing(self, data_count: int = 50):
                """运行并行处理"""
                logger.info(f"=== 开始并行数据处理 ===")
                logger.info(f"工作线程数: {self.num_workers}, 数据量: {data_count}")

                processing_start_time = time.time()

                # 生成测试数据
                self.generate_sample_data(data_count)

                # 启动工作线程
                self.is_running = True
                self.workers = []

                for i in range(self.num_workers):
                    worker = threading.Thread(
                        target=self.data_worker,
                        args=(i + 1,)
                    )
                    self.workers.append(worker)
                    worker.start()
                    time.sleep(0.1)  # 错开启动时间

                # 等待所有工作线程完成
                for worker in self.workers:
                    worker.join(timeout=30)
                    if worker.is_alive():
                        logger.warning(f"工作线程 {worker.name} 未能在超时内完成")

                processing_end_time = time.time()
                total_processing_time = processing_end_time - processing_start_time

                # 停止处理
                self.is_running = False

                # 打印处理结果
                self._print_processing_results(total_processing_time)

            def _print_processing_results(self, total_time: float):
                """打印处理结果"""
                logger.info("\n=== 数据处理结果 ===")

                # 统计结果
                total_extracted = self.batch_results.get('extracted_records', 0)
                total_transformed = self.batch_results.get('transformed_records', 0)
                total_loaded = self.batch_results.get('loaded_records', 0)
                total_validated = self.batch_results.get('validated_records', 0)

                logger.info(f"总处理时间: {total_time:.2f}秒")
                logger.info(f"数据提取: {total_extracted} 条")
                logger.info(f"数据转换: {total_transformed} 条")
                logger.info(f"数据加载: {total_loaded} 条")
                logger.info(f"数据验证: {total_validated} 条")
                logger.info(f"处理效率: {total_validated/total_time:.2f} 条/秒")

                # 验证结果统计
                valid_records = [r for r in self.processed_data if r.validation_result]
                invalid_records = [r for r in self.processed_data if not r.validation_result]

                logger.info(f"验证通过: {len(valid_records)} 条")
                logger.info(f"验证失败: {len(invalid_records)} 条")
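                # Guard against division by zero when nothing was processed
                total_checked = len(valid_records) + len(invalid_records)
                if total_checked:
                    logger.info(f"Validation pass rate: {len(valid_records)/total_checked*100:.1f}%")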

                # 性能统计
                if 'extract_complete_time' in self.batch_results:
                    extract_time = self.batch_results['extract_complete_time']
                    logger.info(f"提取阶段完成时间: {time.strftime('%H:%M:%S', time.localtime(extract_time))}")

                # 按来源统计
                source_stats = {}
                for record in valid_records:
                    source = record.source
                    source_stats[source] = source_stats.get(source, 0) + 1

                logger.info("\n按数据来源统计:")
                for source, count in source_stats.items():
                    logger.info(f"  {source}: {count} 条")

            def run_performance_comparison(self):
                """运行性能对比 - Barrier vs 无屏障"""
                logger.info("\n=== 性能对比测试 ===")

                # Barrier版本已在上面运行,这里简单说明无屏障版本的问题
                logger.info("无屏障版本的问题:")
                logger.info("  1. 无法确保所有工作线程完成当前阶段再进入下一阶段")
                logger.info("  2. 可能导致数据不一致或处理遗漏")
                logger.info("  3. 难以协调复杂的处理流程")
                logger.info("  4. 资源利用率不均衡")

                logger.info("\nBarrier版本的优势:")
                logger.info("  1. 确保所有阶段同步完成")
                logger.info("  2. 提供清晰的执行流程控制")
                logger.info("  3. 支持阶段完成后的回调处理")
                logger.info("  4. 便于错误检测和恢复")

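        # Usage example (module level; at the methods' indentation it would sit
        # inside the class body and fail with a NameError)
        if __name__ == "__main__":
            processor = ParallelDataProcessor(num_workers=4)

            # Run the parallel data processing
            processor.run_parallel_processing(data_count=40)

            # Run the performance comparison
            processor.run_performance_comparison()
        ---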
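
        The MapReduce pattern listed among the use cases can also be shown in a few lines. The following is a minimal single-process sketch, assuming one thread per input chunk; the function `word_count` and its inputs are made up for illustration. The barrier's action callback acts as the reduce step, because it runs exactly once after every map task has arrived.

```python
import threading
from collections import Counter

def word_count(chunks):
    """Count words using one thread per chunk, with a barrier between map and reduce."""
    partials = [None] * len(chunks)   # one slot per worker, so the map phase needs no lock
    totals = Counter()

    def reduce_step():
        # Barrier action: runs in exactly one thread, after every worker has
        # arrived and before any of them is released from wait().
        for partial in partials:
            totals.update(partial)

    barrier = threading.Barrier(len(chunks), action=reduce_step)

    def worker(index, text):
        partials[index] = Counter(text.split())   # map phase
        barrier.wait(timeout=5.0)                 # boundary between map and reduce
        # From here on, every worker could safely read the finished totals.

    threads = [
        threading.Thread(target=worker, args=(i, chunk))
        for i, chunk in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals

result = word_count(["a b a", "b c", "a c c"])
print(sorted(result.items()))   # [('a', 3), ('b', 2), ('c', 3)]
```

        Doing the reduce inside the action keeps the example short, but all workers stay blocked while it runs. For an expensive reduce it is usually better to let one thread do it after the barrier instead.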

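6.2 Multi-thread synchronization points

01.Synchronization point basics
    a.Definition and purpose
        A synchronization point is a specific position in the execution of several concurrent threads at which they must rendezvous.
        At such a point each thread waits until all the other threads have arrived, and only then continues with its next step.
    b.Characteristics of a synchronization point
        a.Coordination
            The execution order and timing of multiple threads are coordinated.
        b.Consistency
            Work done before the synchronization point is complete by the time any thread moves past it, so all threads continue from the same state.
        c.Controllability
            The synchronization mechanism controls how each thread's execution proceeds.
    c.Advantages of Barrier as a synchronization point
        a.Precise control
            Execution continues only after the specified number of threads have all arrived.
        b.Atomic release
            All waiting threads are released together once the last one arrives, so no thread can run ahead into the next phase. Shared data touched within a phase still needs its own protection.
        c.Reusability
            The same barrier can be used for multiple rounds of synchronization.
    d.Application scenarios
        a.Data processing stages
            A synchronization point after each stage of a multi-stage data pipeline.
        b.Parallel algorithms
            Step-by-step synchronization in algorithms such as parallel sorting and matrix operations.
        c.Resource access control
            Synchronization when several threads need to coordinate their access to a shared resource.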
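
    The precise-control and reusability properties described above can be checked directly. The sketch below is a minimal illustration with assumed constants (`PARTIES`, `ROUNDS`): three threads pass through the same barrier three times, the action callback counts completed rounds, and each wait() call returns a distinct index in range(parties).

```python
import threading

PARTIES = 3
ROUNDS = 3
completed_rounds = []                              # filled by the barrier action
indices_per_round = [[] for _ in range(ROUNDS)]    # wait() return values, per round

def on_round_complete():
    # Runs once per round, in one of the arriving threads, before any release
    completed_rounds.append(len(completed_rounds) + 1)

barrier = threading.Barrier(PARTIES, action=on_round_complete)

def worker():
    for round_no in range(ROUNDS):
        index = barrier.wait(timeout=5.0)          # the same barrier object every round
        indices_per_round[round_no].append(index)

threads = [threading.Thread(target=worker) for _ in range(PARTIES)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(completed_rounds)                            # [1, 2, 3]
print([sorted(r) for r in indices_per_round])      # [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
```

    The return value of wait() is handy for electing a single thread to do per-round housekeeping, for example `if index == 0:`.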

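02.Basic synchronization point implementation
    a.Simple two-point synchronization
        a.Synchronization pattern
            Two or more threads wait at a single synchronization point until they have all arrived.
        b.Implementation
            Basic synchronization is implemented with Barrier's wait() method.
        c.Code example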
            ---
            # 基础两点同步示例
            import threading
            import time
            import logging
            import random
            from typing import List, Dict, Any

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class BasicSyncPoint:
                """基础同步点演示"""
                def __init__(self, num_threads: int = 3):
                    self.num_threads = num_threads
                    # 创建基础屏障
                    self.sync_barrier = threading.Barrier(parties=num_threads)
                    self.results: List[Dict[str, Any]] = []

                def worker_with_basic_sync(self, worker_id: int):
                    """带基础同步的工作线程"""
                    start_time = time.time()
                    logger.info(f"工作线程 {worker_id}: 启动,执行第一阶段的任务")

                    # 第一阶段:独立任务
                    phase1_duration = random.uniform(1.0, 3.0)
                    time.sleep(phase1_duration)

                    phase1_result = f"工作线程{worker_id}-阶段1完成"
                    logger.info(f"工作线程 {worker_id}: 阶段1完成 ({phase1_duration:.2f}s)")

                    # 同步点1:等待所有线程完成阶段1
                    sync_start_time = time.time()
                    logger.info(f"工作线程 {worker_id}: 到达同步点,等待其他线程")
                    try:
                        sync_result = self.sync_barrier.wait()
                        sync_duration = time.time() - sync_start_time
                        logger.info(f"工作线程 {worker_id}: 通过同步点,等待结果: {sync_result} "
                                f"(等待时间: {sync_duration:.2f}s)")

                    except threading.BrokenBarrierError:
                        logger.error(f"工作线程 {worker_id}: 同步点损坏")
                        return

                    # 第二阶段:协同任务
                    logger.info(f"工作线程 {worker_id}: 开始第二阶段协同任务")
                    phase2_duration = random.uniform(0.5, 2.0)
                    time.sleep(phase2_duration)

                    phase2_result = f"工作线程{worker_id}-阶段2完成"
                    total_time = time.time() - start_time

                    # 记录执行结果
                    result = {
                        'worker_id': worker_id,
                        'phase1_duration': phase1_duration,
                        'phase2_duration': phase2_duration,
                        'total_time': total_time,
                        'sync_result': sync_result,
                        'phase1_result': phase1_result,
                        'phase2_result': phase2_result
                    }
                    self.results.append(result)

                    logger.info(f"工作线程 {worker_id}: 全部任务完成,总耗时: {total_time:.2f}s")

                def run_basic_sync_demo(self):
                    """运行基础同步演示"""
                    logger.info(f"=== 基础两点同步演示 ===")
                    logger.info(f"参与线程数: {self.num_threads}")

                    threads = []
                    start_time = time.time()

                    # 创建并启动工作线程
                    for i in range(self.num_threads):
                        thread = threading.Thread(
                            target=self.worker_with_basic_sync,
                            args=(i + 1,)
                        )
                        threads.append(thread)
                        thread.start()
                        time.sleep(0.1)  # 稍微错开启动时间

                    # 等待所有线程完成
                    for thread in threads:
                        thread.join()

                    total_demo_time = time.time() - start_time
                    self._analyze_results(total_demo_time)

                def _analyze_results(self, total_demo_time: float):
                    """分析执行结果"""
                    logger.info("\n=== 基础同步结果分析 ===")
                    logger.info(f"演示总耗时: {total_demo_time:.2f}秒")
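                    logger.info(f"Threads that completed: {len(self.results)}")

                    if not self.results:
                        # No worker recorded a result (e.g. the barrier broke),
                        # so the averages below would divide by zero
                        return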

                    # 计算时间统计
                    phase1_times = [r['phase1_duration'] for r in self.results]
                    phase2_times = [r['phase2_duration'] for r in self.results]
                    total_times = [r['total_time'] for r in self.results]

                    avg_phase1 = sum(phase1_times) / len(phase1_times)
                    avg_phase2 = sum(phase2_times) / len(phase2_times)
                    avg_total = sum(total_times) / len(total_times)

                    max_phase1 = max(phase1_times)
                    min_phase1 = min(phase1_times)
                    time_span = max_phase1 - min_phase1

                    logger.info(f"阶段1平均耗时: {avg_phase1:.2f}秒 (最长: {max_phase1:.2f}s, 最短: {min_phase1:.2f}s)")
                    logger.info(f"阶段1时间跨度: {time_span:.2f}秒 (影响同步等待时间)")
                    logger.info(f"阶段2平均耗时: {avg_phase2:.2f}秒")
                    logger.info(f"总平均耗时: {avg_total:.2f}秒")

                    # 分析同步效果
                    sync_results = [r['sync_result'] for r in self.results]
                    logger.info(f"同步结果分布: {sorted(sync_results)}")

            # 使用示例
            if __name__ == "__main__":
                sync_point = BasicSyncPoint(num_threads=4)
                sync_point.run_basic_sync_demo()
            ---
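    b.Multi-point synchronization pattern
        a.Consecutive sync points
            A single workflow contains several sync points in sequence, one at the end of each stage.
        b.Hierarchical synchronization
            Threads in different groups synchronize at different levels.
        c.Code example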
            ---
            # 多点同步模式示例
            import threading
            import time
            import logging
            import random
            from typing import List, Dict, Any
            from dataclasses import dataclass

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            @dataclass
            class SyncMetrics:
                """同步指标"""
                phase: int
                worker_id: int
                start_time: float
                sync_time: float
                duration: float

            class MultiPointSync:
                """多点同步演示"""
                def __init__(self, num_threads: int = 4):
                    self.num_threads = num_threads

                    # 为不同阶段创建多个同步点
                    self.init_barrier = threading.Barrier(parties=num_threads)
                    self.phase1_barrier = threading.Barrier(parties=num_threads)
                    self.phase2_barrier = threading.Barrier(parties=num_threads)
                    self.final_barrier = threading.Barrier(parties=num_threads)

                    self.sync_metrics: List[SyncMetrics] = []

                def multi_phase_worker(self, worker_id: int):
                    """多阶段工作线程"""
                    logger.info(f"工作线程 {worker_id}: 启动多阶段任务")

                    # 初始化阶段
                    init_start = time.time()
                    time.sleep(random.uniform(0.5, 1.5))
                    self._wait_at_sync_point(worker_id, "初始化", self.init_barrier, init_start)

                    # 阶段1: 数据收集
                    phase1_start = time.time()
                    logger.info(f"工作线程 {worker_id}: 开始数据收集")
                    time.sleep(random.uniform(2.0, 4.0))
                    self._wait_at_sync_point(worker_id, "阶段1", self.phase1_barrier, phase1_start)

                    # 阶段2: 数据处理
                    phase2_start = time.time()
                    logger.info(f"工作线程 {worker_id}: 开始数据处理")
                    time.sleep(random.uniform(1.5, 3.0))
                    self._wait_at_sync_point(worker_id, "阶段2", self.phase2_barrier, phase2_start)

                    # 最终阶段
                    final_start = time.time()
                    logger.info(f"工作线程 {worker_id}: 开始最终处理")
                    time.sleep(random.uniform(1.0, 2.0))
                    self._wait_at_sync_point(worker_id, "最终", self.final_barrier, final_start)

                    logger.info(f"工作线程 {worker_id}: 所有阶段完成")

                def _wait_at_sync_point(self, worker_id: int, phase_name: str,
                                    barrier: threading.Barrier, start_time: float):
                    """在指定同步点等待"""
                    logger.info(f"工作线程 {worker_id}: 到达{phase_name}同步点")
                    sync_start = time.time()

                    try:
                        wait_result = barrier.wait()
                        sync_time = time.time() - sync_start
                        total_time = time.time() - start_time

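                        # Record sync metrics. The phase index is derived from the phase name so
                        # that _analyze_multi_sync_results can group records by phase;
                        # len(self.sync_metrics) would give every record a different value.
                        phase_order = ["初始化", "阶段1", "阶段2", "最终"]
                        metrics = SyncMetrics(
                            phase=phase_order.index(phase_name),
                            worker_id=worker_id,
                            start_time=start_time,
                            sync_time=sync_time,
                            duration=total_time
                        )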
                        self.sync_metrics.append(metrics)

                        logger.info(f"工作线程 {worker_id}: 通过{phase_name}同步点,"
                                f"等待时间: {sync_time:.2f}s, 总时间: {total_time:.2f}s")

                    except threading.BrokenBarrierError:
                        logger.error(f"工作线程 {worker_id}: {phase_name}同步点损坏")

                def run_multi_sync_demo(self):
                    """运行多点同步演示"""
                    logger.info(f"=== 多点同步演示 ===")
                    logger.info(f"参与线程数: {self.num_threads}")

                    threads = []
                    demo_start = time.time()

                    # 创建并启动工作线程
                    for i in range(self.num_threads):
                        thread = threading.Thread(
                            target=self.multi_phase_worker,
                            args=(i + 1,)
                        )
                        threads.append(thread)
                        thread.start()
                        time.sleep(0.1)

                    # 等待所有线程完成
                    for thread in threads:
                        thread.join()

                    demo_time = time.time() - demo_start
                    self._analyze_multi_sync_results(demo_time)

                def _analyze_multi_sync_results(self, total_demo_time: float):
                    """分析多点同步结果"""
                    logger.info("\n=== 多点同步结果分析 ===")
                    logger.info(f"演示总耗时: {total_demo_time:.2f}秒")

                    # 按阶段分析
                    phase_names = ["初始化", "阶段1", "阶段2", "最终"]

                    for i, phase_name in enumerate(phase_names):
                        phase_metrics = [m for m in self.sync_metrics if m.phase == i]
                        if not phase_metrics:
                            continue

                        sync_times = [m.sync_time for m in phase_metrics]
                        durations = [m.duration for m in phase_metrics]

                        avg_sync = sum(sync_times) / len(sync_times)
                        avg_duration = sum(durations) / len(durations)
                        max_sync = max(sync_times)
                        min_sync = min(sync_times)

                        logger.info(f"{phase_name}阶段:")
                        logger.info(f"  平均同步等待时间: {avg_sync:.2f}s")
                        logger.info(f"  同步等待时间范围: {min_sync:.2f}s - {max_sync:.2f}s")
                        logger.info(f"  平均阶段总耗时: {avg_duration:.2f}s")

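                    # Analyze synchronization efficiency
                    logger.info("\nSynchronization efficiency:")
                    if not self.sync_metrics:
                        logger.info("No sync metrics were recorded")
                        return
                    total_sync_time = sum(m.sync_time for m in self.sync_metrics)
                    avg_sync_per_point = total_sync_time / len(self.sync_metrics)
                    # Wait times are summed over all threads, so the ratio is taken against
                    # total thread-time (wall time x thread count), not wall time alone.
                    total_thread_time = total_demo_time * self.num_threads
                    logger.info(f"Average wait per sync record: {avg_sync_per_point:.2f}s")
                    logger.info(f"Total sync overhead (all threads): {total_sync_time:.2f}s")
                    logger.info(f"Sync overhead share: {total_sync_time/total_thread_time*100:.1f}%")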

            # 使用示例
            if __name__ == "__main__":
                multi_sync = MultiPointSync(num_threads=5)
                multi_sync.run_multi_sync_demo()
            ---
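        d.Supplementary sketch: reusing one Barrier for consecutive sync points
            The example above creates a separate Barrier for each stage. Because threading.Barrier resets itself once all parties have been released, consecutive sync points can also share one barrier. The following is a minimal sketch under that assumption; run_phases, on_all_arrived and the thread/phase counts are hypothetical names and values chosen only for illustration.

```python
import threading
from typing import List, Tuple


def run_phases(num_workers: int = 3, num_phases: int = 4) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Run num_workers threads through num_phases stages using one reusable Barrier."""
    completed_phases: List[int] = []      # written only by the barrier action
    arrivals: List[Tuple[int, int]] = []  # (phase, worker_id), guarded by arrivals_lock
    arrivals_lock = threading.Lock()

    def on_all_arrived() -> None:
        # Called by exactly one thread each time all parties have arrived.
        completed_phases.append(len(completed_phases))

    barrier = threading.Barrier(num_workers, action=on_all_arrived)

    def worker(worker_id: int) -> None:
        for phase in range(num_phases):
            with arrivals_lock:
                arrivals.append((phase, worker_id))
            # The same barrier serves every phase: it resets after each release.
            barrier.wait(timeout=5)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed_phases, arrivals


completed, arrivals = run_phases()
print(completed)  # [0, 1, 2, 3]
```

            Separate barriers, as in the original example, remain useful when stages involve different numbers of threads; a shared barrier only fits when the party count is the same at every sync point.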

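03.Conditional sync points
    a.Condition-wait mechanism
        a.Basic principle
            A sync point waits not only for threads to arrive but also for a specific condition to hold.
        b.Implementation
            Combine a Barrier with a condition variable to express more complex synchronization logic.
        c.Code example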
            ---
            # 条件同步点示例
            import threading
            import time
            import logging
            import random
            from typing import List, Dict, Any, Callable
            from dataclasses import dataclass
            from enum import Enum

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class SyncCondition(Enum):
                """同步条件类型"""
                ALL_READY = "全部就绪"
                MIN_PROGRESS = "最小进度"
                DATA_AVAILABLE = "数据可用"
                ERROR_CHECK = "错误检查"

            @dataclass
            class WorkerState:
                """工作线程状态"""
                worker_id: int
                ready: bool = False
                progress: int = 0
                has_data: bool = False
                has_error: bool = False
                data_size: int = 0

            class ConditionalSyncPoint:
                """条件同步点演示"""
                def __init__(self, num_threads: int = 4):
                    self.num_threads = num_threads
                    self.workers_state: Dict[int, WorkerState] = {}
                    self.condition_lock = threading.Lock()
                    self.condition = threading.Condition(self.condition_lock)

                    # 基础屏障
                    self.sync_barrier = threading.Barrier(parties=num_threads)

                def initialize_workers(self):
                    """初始化工作线程状态"""
                    for i in range(1, self.num_threads + 1):
                        self.workers_state[i] = WorkerState(worker_id=i)

                def conditional_worker(self, worker_id: int):
                    """条件同步工作线程"""
                    logger.info(f"工作线程 {worker_id}: 启动条件同步任务")
                    state = self.workers_state[worker_id]

                    # 准备阶段
                    prep_time = random.uniform(0.5, 2.0)
                    time.sleep(prep_time)
                    state.ready = True
                    state.progress = random.randint(20, 60)

                    logger.info(f"工作线程 {worker_id}: 准备完成,进度: {state.progress}%")

                    # 条件同步点1:等待所有线程就绪
                    self._wait_for_condition(worker_id, SyncCondition.ALL_READY,
                                        lambda: all(w.ready for w in self.workers_state.values()))

                    # 数据生成阶段
                    data_gen_time = random.uniform(1.0, 3.0)
                    time.sleep(data_gen_time)
                    state.has_data = True
                    state.data_size = random.randint(100, 1000)
                    state.progress += random.randint(10, 40)

                    logger.info(f"工作线程 {worker_id}: 数据生成完成,大小: {state.data_size},进度: {state.progress}%")

                    # 条件同步点2:等待所有线程都有数据
                    self._wait_for_condition(worker_id, SyncCondition.DATA_AVAILABLE,
                                        lambda: all(w.has_data for w in self.workers_state.values()))

                    # 进度检查阶段
                    progress_inc = random.randint(5, 20)
                    time.sleep(random.uniform(0.5, 1.5))
                    state.progress += progress_inc

                    # 检查是否有错误
                    if random.random() < 0.2:  # 20%概率出现错误
                        state.has_error = True
                        logger.warning(f"工作线程 {worker_id}: 检测到错误")

                    logger.info(f"工作线程 {worker_id}: 进度检查完成,总进度: {state.progress}%")

                    # 条件同步点3:等待最小进度达成
                    self._wait_for_condition(worker_id, SyncCondition.MIN_PROGRESS,
                                        lambda: all(w.progress >= 70 for w in self.workers_state.values()))

                    # 错误检查阶段
                    error_time = random.uniform(0.3, 1.0)
                    time.sleep(error_time)

                    # 条件同步点4:错误检查同步
                    self._wait_for_condition(worker_id, SyncCondition.ERROR_CHECK,
                                        lambda: any(w.has_error for w in self.workers_state.values()) or
                                                all(w.progress >= 85 for w in self.workers_state.values()))

                    # 最终处理
                    final_time = random.uniform(0.5, 1.0)
                    time.sleep(final_time)
                    state.progress = 100

                    logger.info(f"工作线程 {worker_id}: 所有任务完成,最终进度: {state.progress}%")

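                def _wait_for_condition(self, worker_id: int, condition_type: SyncCondition,
                                    condition_func: Callable[[], bool],
                                    max_wait: float = 10.0):
                    """Wait until condition_func() is true or max_wait seconds have passed.

                    max_wait is an assumed safety limit: the random progress values may never
                    reach the required thresholds, and without a limit every worker would
                    wait forever.
                    """
                    with self.condition:
                        logger.info(f"Worker {worker_id}: waiting for condition [{condition_type.value}]")
                        wait_start = time.time()
                        deadline = time.monotonic() + max_wait

                        satisfied = condition_func()
                        while not satisfied:
                            remaining = deadline - time.monotonic()
                            if remaining <= 0:
                                break
                            # No thread calls notify() on this condition, so poll: wait()
                            # releases the lock for up to 0.1s, then the predicate is re-checked.
                            self.condition.wait(timeout=min(0.1, remaining))
                            satisfied = condition_func()

                        wait_time = time.time() - wait_start
                        if satisfied:
                            logger.info(f"Worker {worker_id}: condition [{condition_type.value}] met, "
                                    f"wait time: {wait_time:.2f}s")
                        else:
                            logger.warning(f"Worker {worker_id}: condition [{condition_type.value}] "
                                    f"not met after {wait_time:.2f}s, continuing")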

                def monitor_worker(self, monitor_id: int):
                    """监控线程 - 监控工作线程状态"""
                    logger.info(f"监控线程 {monitor_id}: 开始监控")

                    while True:
                        with self.condition:
                            ready_count = sum(1 for w in self.workers_state.values() if w.ready)
                            data_count = sum(1 for w in self.workers_state.values() if w.has_data)
                            error_count = sum(1 for w in self.workers_state.values() if w.has_error)
                            avg_progress = sum(w.progress for w in self.workers_state.values()) / len(self.workers_state)

                            logger.debug(f"监控: 就绪={ready_count}, 有数据={data_count}, 错误={error_count}, "
                                    f"平均进度={avg_progress:.1f}%")

                            # 检查是否所有工作都完成
                            if all(w.progress == 100 for w in self.workers_state.values()):
                                break

                        time.sleep(0.5)

                    logger.info(f"监控线程 {monitor_id}: 监控结束")

                def run_conditional_sync_demo(self):
                    """运行条件同步演示"""
                    logger.info(f"=== 条件同步点演示 ===")
                    logger.info(f"参与线程数: {self.num_threads}")

                    # 初始化工作线程状态
                    self.initialize_workers()

                    threads = []
                    demo_start = time.time()

                    # 创建工作线程
                    for i in range(1, self.num_threads + 1):
                        thread = threading.Thread(
                            target=self.conditional_worker,
                            args=(i,)
                        )
                        threads.append(thread)
                        thread.start()
                        time.sleep(0.1)

                    # 创建监控线程
                    monitor_thread = threading.Thread(target=self.monitor_worker, args=(1,))
                    monitor_thread.start()

                    # 等待所有线程完成
                    for thread in threads:
                        thread.join()

                    monitor_thread.join()

                    demo_time = time.time() - demo_start
                    self._analyze_conditional_results(demo_time)

                def _analyze_conditional_results(self, total_demo_time: float):
                    """分析条件同步结果"""
                    logger.info("\n=== 条件同步结果分析 ===")
                    logger.info(f"演示总耗时: {total_demo_time:.2f}秒")

                    # 统计各线程状态
                    ready_count = sum(1 for w in self.workers_state.values() if w.ready)
                    data_count = sum(1 for w in self.workers_state.values() if w.has_data)
                    error_count = sum(1 for w in self.workers_state.values() if w.has_error)
                    total_data_size = sum(w.data_size for w in self.workers_state.values())

                    logger.info(f"准备完成的线程数: {ready_count}/{self.num_threads}")
                    logger.info(f"生成数据的线程数: {data_count}/{self.num_threads}")
                    logger.info(f"出现错误的线程数: {error_count}/{self.num_threads}")
                    logger.info(f"总数据大小: {total_data_size} 字节")

                    # 详细状态
                    logger.info("\n各线程详细状态:")
                    for worker_id, state in self.workers_state.items():
                        logger.info(f"  线程{worker_id}: 就绪={state.ready}, 有数据={state.has_data}, "
                                f"错误={state.has_error}, 数据大小={state.data_size}, 最终进度={state.progress}%")

            # 使用示例
            if __name__ == "__main__":
                conditional_sync = ConditionalSyncPoint(num_threads=4)
                conditional_sync.run_conditional_sync_demo()
            ---
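        d.Supplementary sketch: Condition.wait_for with notify_all
            The example above polls the predicate every 0.1 seconds because no thread ever notifies the condition. A common alternative is to change the shared state while holding the condition's lock, call notify_all(), and let waiters block in Condition.wait_for(). The following is a minimal sketch of that approach; ReadyGate, mark_ready, wait_all_ready and run_gate_demo are hypothetical names and do not come from the example above.

```python
import threading
from typing import Dict, Set


class ReadyGate:
    """Blocks callers until every expected worker has reported ready."""

    def __init__(self, num_workers: int) -> None:
        self._cond = threading.Condition()
        self._ready: Set[int] = set()
        self._num_workers = num_workers

    def mark_ready(self, worker_id: int) -> None:
        # State changes happen under the condition's lock, followed by a
        # notification so that waiters re-check their predicate.
        with self._cond:
            self._ready.add(worker_id)
            self._cond.notify_all()

    def wait_all_ready(self, timeout: float) -> bool:
        # wait_for returns the last value of the predicate:
        # True if everyone is ready, False if the timeout expired first.
        with self._cond:
            return self._cond.wait_for(
                lambda: len(self._ready) >= self._num_workers,
                timeout=timeout,
            )


def run_gate_demo(num_workers: int = 4, timeout: float = 5.0) -> Dict[int, bool]:
    gate = ReadyGate(num_workers)
    results: Dict[int, bool] = {}

    def worker(worker_id: int) -> None:
        gate.mark_ready(worker_id)
        results[worker_id] = gate.wait_all_ready(timeout)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_gate_demo())
```

            The timeout keeps a waiter from blocking forever if the condition can never become true, which is the same risk the polling version has to guard against.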

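04.Hierarchical sync point architecture
    a.Hierarchical synchronization concept
        a.Definition
            A large concurrent system is organized into a hierarchy, and threads at different levels synchronize at sync points of the corresponding level.
        b.Advantages
            Fewer threads meet at each sync point, which can reduce synchronization overhead and improve performance and scalability.
        c.Typical use cases
            Large-scale parallel computing, distributed systems, multi-level task processing.
    b.Implementation strategy
        a.Thread grouping
            Threads are grouped by function or task, and each group has its own sync point.
        b.Level-based synchronization
            Intra-group and inter-group synchronization together form the hierarchy.
        c.Code example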
            ---
            # 分层同步点架构示例
            import threading
            import time
            import logging
            import random
            from typing import List, Dict, Any, Optional
            from dataclasses import dataclass
            from enum import Enum

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class ThreadGroup(Enum):
                """线程组类型"""
                DATA_COLLECTORS = "数据收集组"
                DATA_PROCESSORS = "数据处理组"
                DATA_VALIDATORS = "数据验证组"
                RESULT_AGGREGATORS = "结果汇总组"

            @dataclass
            class WorkerInfo:
                """工作线程信息"""
                worker_id: int
                group: ThreadGroup
                layer: int
                task_count: int
                completed_tasks: int = 0
                group_sync_result: Optional[int] = None
                layer_sync_result: Optional[int] = None

            class HierarchicalSyncPoint:
                """分层同步点架构演示"""
                def __init__(self, total_threads: int = 12):
                    self.total_threads = total_threads

                    # 按组分配线程数
                    self.group_sizes = {
                        ThreadGroup.DATA_COLLECTORS: total_threads // 4,
                        ThreadGroup.DATA_PROCESSORS: total_threads // 2,
                        ThreadGroup.DATA_VALIDATORS: total_threads // 6,
                        ThreadGroup.RESULT_AGGREGATORS: total_threads // 12
                    }

                    # 层级同步点
                    self.layer1_barrier = threading.Barrier(parties=sum(self.group_sizes.values()))

                    # 组内同步点
                    self.group_barriers = {}
                    for group, size in self.group_sizes.items():
                        if size > 0:
                            self.group_barriers[group] = threading.Barrier(parties=size)

                    # 线程信息
                    self.workers: List[WorkerInfo] = []
                    self._initialize_workers()

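                    # Result storage. Every key is created up front so worker threads
                    # only append to existing lists and never race to create one.
                    self.layer_results: Dict[int, List[Any]] = {layer: [] for layer in range(3)}
                    self.group_results: Dict[ThreadGroup, List[Any]] = {group: [] for group in ThreadGroup}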

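                def _initialize_workers(self):
                    """Initialize worker info so each group's size matches its barrier."""
                    worker_id = 1
                    # Create exactly group_sizes[group] workers per group. The group
                    # barriers are built with the same counts, so a round-robin
                    # assignment would leave some barriers waiting forever.
                    for group, size in self.group_sizes.items():
                        for index in range(size):
                            worker = WorkerInfo(
                                worker_id=worker_id,
                                group=group,
                                layer=index % 3,  # spread group members over the 3 layers
                                task_count=random.randint(3, 8)
                            )
                            self.workers.append(worker)
                            worker_id += 1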

                def hierarchical_worker(self, worker_info: WorkerInfo):
                    """分层工作线程"""
                    logger.info(f"工作线程{worker_info.worker_id} ({worker_info.group.value}): "
                            f"启动层级{worker_info.layer}任务")

                    # 层级1: 组内任务执行
                    self._execute_group_tasks(worker_info)

                    # 层级1: 组内同步点
                    if worker_info.group in self.group_barriers:
                        try:
                            sync_result = self.group_barriers[worker_info.group].wait()
                            worker_info.group_sync_result = sync_result
                            logger.info(f"工作线程{worker_info.worker_id}: 通过组内同步点 "
                                    f"({worker_info.group.value}),结果: {sync_result}")
                        except threading.BrokenBarrierError:
                            logger.error(f"工作线程{worker_info.worker_id}: 组内同步点损坏")
                            return

                    # 层级2: 跨组协作任务
                    self._execute_cross_group_tasks(worker_info)

                    # 层级2: 全局同步点
                    try:
                        global_sync_result = self.layer1_barrier.wait()
                        worker_info.layer_sync_result = global_sync_result
                        logger.info(f"工作线程{worker_info.worker_id}: 通过全局同步点,结果: {global_sync_result}")
                    except threading.BrokenBarrierError:
                        logger.error(f"工作线程{worker_info.worker_id}: 全局同步点损坏")
                        return

                    # 层级3: 汇总任务
                    self._execute_aggregation_tasks(worker_info)

                    logger.info(f"工作线程{worker_info.worker_id}: 所有层级任务完成")

                def _execute_group_tasks(self, worker_info: WorkerInfo):
                    """执行组内任务"""
                    logger.debug(f"工作线程{worker_info.worker_id}: 开始组内任务")

                    for task_id in range(worker_info.task_count):
                        # 模拟任务执行
                        task_duration = random.uniform(0.2, 1.0)
                        time.sleep(task_duration)
                        worker_info.completed_tasks += 1

                        # 根据组别执行不同类型的任务
                        if worker_info.group == ThreadGroup.DATA_COLLECTORS:
                            # 数据收集任务
                            data_size = random.randint(100, 500)
                            self._add_group_result(worker_info.group, f"数据{task_id}", data_size)

                        elif worker_info.group == ThreadGroup.DATA_PROCESSORS:
                            # 数据处理任务
                            processed_count = random.randint(50, 200)
                            self._add_group_result(worker_info.group, f"处理{task_id}", processed_count)

                        elif worker_info.group == ThreadGroup.DATA_VALIDATORS:
                            # 数据验证任务
                            validated_count = random.randint(30, 150)
                            self._add_group_result(worker_info.group, f"验证{task_id}", validated_count)

                        elif worker_info.group == ThreadGroup.RESULT_AGGREGATORS:
                            # 结果汇总任务
                            aggregated_count = random.randint(10, 100)
                            self._add_group_result(worker_info.group, f"汇总{task_id}", aggregated_count)

                def _execute_cross_group_tasks(self, worker_info: WorkerInfo):
                    """执行跨组协作任务"""
                    logger.debug(f"工作线程{worker_info.worker_id}: 开始跨组协作任务")

                    # 根据组别执行不同的协作任务
                    if worker_info.group == ThreadGroup.DATA_COLLECTORS:
                        # 数据收集者与其他组协调
                        coord_time = random.uniform(0.5, 1.5)
                        time.sleep(coord_time)

                    elif worker_info.group == ThreadGroup.DATA_PROCESSORS:
                        # 数据处理器依赖收集者的结果
                        coord_time = random.uniform(1.0, 2.0)
                        time.sleep(coord_time)

                    elif worker_info.group == ThreadGroup.DATA_VALIDATORS:
                        # 数据验证者依赖处理器的结果
                        coord_time = random.uniform(0.8, 1.8)
                        time.sleep(coord_time)

                    elif worker_info.group == ThreadGroup.RESULT_AGGREGATORS:
                        # 结果汇总者依赖所有组的结果
                        coord_time = random.uniform(0.3, 1.2)
                        time.sleep(coord_time)

                def _execute_aggregation_tasks(self, worker_info: WorkerInfo):
                    """执行汇总任务"""
                    logger.debug(f"工作线程{worker_info.worker_id}: 开始汇总任务")

                    # 最后的汇总处理
                    aggregation_time = random.uniform(0.5, 1.0)
                    time.sleep(aggregation_time)

                    # 记录层级结果
                    layer_result = {
                        'worker_id': worker_info.worker_id,
                        'group': worker_info.group.value,
                        'completed_tasks': worker_info.completed_tasks,
                        'group_sync': worker_info.group_sync_result,
                        'layer_sync': worker_info.layer_sync_result
                    }

                    self._add_layer_result(worker_info.layer, layer_result)

                def _add_group_result(self, group: ThreadGroup, result_type: str, value: int):
                    """添加组内结果"""
                    if group not in self.group_results:
                        self.group_results[group] = []
                    self.group_results[group].append({
                        'type': result_type,
                        'value': value,
                        'timestamp': time.time()
                    })

                def _add_layer_result(self, layer: int, result: Dict[str, Any]):
                    """添加层级结果"""
                    if layer not in self.layer_results:
                        self.layer_results[layer] = []
                    self.layer_results[layer].append(result)

                def coordination_monitor(self):
                    """协调监控线程"""
                    logger.info("协调监控线程启动")

                    start_time = time.time()
                    while True:
                        # 检查所有工作线程是否完成
                        completed_workers = sum(1 for w in self.workers
                                            if w.completed_tasks >= w.task_count and
                                            w.layer_sync_result is not None)

                        if completed_workers >= len(self.workers):
                            break

                        # 监控进度
                        progress = completed_workers / len(self.workers) * 100
                        logger.info(f"监控: 总进度 {progress:.1f}% ({completed_workers}/{len(self.workers)})")

                        time.sleep(1)

                    total_time = time.time() - start_time
                    logger.info(f"协调监控线程结束,总耗时: {total_time:.2f}秒")

                def run_hierarchical_demo(self):
                    """运行分层同步演示"""
                    logger.info(f"=== 分层同步点架构演示 ===")
                    logger.info(f"总线程数: {self.total_threads}")

                    # 打印分组信息
                    logger.info("线程分组:")
                    for group, size in self.group_sizes.items():
                        logger.info(f"  {group.value}: {size} 个线程")

                    threads = []
                    demo_start = time.time()

                    # 创建工作线程
                    for worker_info in self.workers:
                        thread = threading.Thread(
                            target=self.hierarchical_worker,
                            args=(worker_info,)
                        )
                        threads.append(thread)
                        thread.start()
                        time.sleep(0.05)  # 错开启动时间

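                    # Create the monitor thread. daemon=True is a safeguard so that a
                    # monitor that never sees completion cannot keep the process alive.
                    monitor_thread = threading.Thread(target=self.coordination_monitor, daemon=True)
                    monitor_thread.start()

                    # Wait for all worker threads to finish
                    stuck = False
                    for thread in threads:
                        thread.join(timeout=30)
                        if thread.is_alive():
                            stuck = True
                            logger.warning(f"Thread {thread.name} did not finish within the timeout")

                    if stuck:
                        # Abort every barrier so blocked workers get BrokenBarrierError and return
                        for barrier in [self.layer1_barrier, *self.group_barriers.values()]:
                            barrier.abort()
                        for thread in threads:
                            thread.join()

                    monitor_thread.join(timeout=5)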

                    demo_time = time.time() - demo_start
                    self._analyze_hierarchical_results(demo_time)

                def _analyze_hierarchical_results(self, total_demo_time: float):
                    """分析分层同步结果"""
                    logger.info("\n=== 分层同步结果分析 ===")
                    logger.info(f"演示总耗时: {total_demo_time:.2f}秒")

                    # 按组分析结果
                    logger.info("\n按组结果统计:")
                    for group, results in self.group_results.items():
                        if results:
                            total_value = sum(r['value'] for r in results)
                            avg_value = total_value / len(results)
                            logger.info(f"  {group.value}: {len(results)} 个任务, "
                                    f"总处理量: {total_value}, 平均: {avg_value:.1f}")

                    # 按层级分析结果
                    logger.info("\n按层级结果统计:")
                    for layer, results in self.layer_results.items():
                        if results:
                            logger.info(f"  层级{layer}: {len(results)} 个工作线程")
                            sync_results = [r['layer_sync'] for r in results if r['layer_sync'] is not None]
                            if sync_results:
                                logger.info(f"    全局同步结果分布: {sorted(sync_results)}")

                    # 完成情况分析
                    completed_workers = [w for w in self.workers if w.completed_tasks >= w.task_count]
                    logger.info(f"\n完成情况:")
                    logger.info(f"  完成的工作线程: {len(completed_workers)}/{len(self.workers)}")
                    logger.info(f"  完成率: {len(completed_workers)/len(self.workers)*100:.1f}%")

                    if completed_workers:
                        total_tasks = sum(w.completed_tasks for w in completed_workers)
                        avg_tasks = total_tasks / len(completed_workers)
                        logger.info(f"  平均完成任务数: {avg_tasks:.1f}")

            # 使用示例
            if __name__ == "__main__":
                hierarchical_sync = HierarchicalSyncPoint(total_threads=12)
                hierarchical_sync.run_hierarchical_demo()
            ---
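        d.Supplementary sketch: group leaders at a top-level barrier
            In the example above every thread waits at the global barrier. One way to make the hierarchy explicit is to let a single representative per group take part in the top-level sync point while the other members wait inside their group. The sketch below assumes this design; run_hierarchy and the group names and sizes are hypothetical and chosen only for illustration. It relies on Barrier.wait() returning a distinct index from 0 to parties - 1 for each thread, and treats the thread that receives 0 as the group leader.

```python
import threading
from typing import Dict, List


def run_hierarchy(group_sizes: Dict[str, int], timeout: float = 5.0) -> List[str]:
    """Two-level sync: group barriers below, one top-level barrier for group leaders."""
    events: List[str] = []
    events_lock = threading.Lock()
    group_barriers = {name: threading.Barrier(size) for name, size in group_sizes.items()}
    top_barrier = threading.Barrier(len(group_sizes))

    def record(event: str) -> None:
        with events_lock:
            events.append(event)

    def worker(group: str) -> None:
        record("work")  # stands in for the group's own tasks
        # Level 1: wait until every member of this group has finished its work.
        index = group_barriers[group].wait(timeout=timeout)
        if index == 0:
            # Level 2: exactly one thread per group meets the other group leaders.
            top_barrier.wait(timeout=timeout)
        # Level 1 again: members stay here until their leader is back,
        # which means every group has reached the top-level sync point.
        group_barriers[group].wait(timeout=timeout)
        record("released")

    threads = [
        threading.Thread(target=worker, args=(name,))
        for name, size in group_sizes.items()
        for _ in range(size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


events = run_hierarchy({"collectors": 3, "processors": 2})
print(events.count("work"), events.count("released"))
```

            With this layout the top-level barrier has as many parties as there are groups rather than threads. Whether that reduces overhead in practice depends on the workload and is not measured here.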

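05.Dynamic sync point management
    a.Why dynamic synchronization is needed
        a.Changing thread count
            The number of threads taking part in synchronization is adjusted at runtime.
        b.Changing conditions
            Synchronization behavior is modified according to runtime conditions.
        c.Resource optimization
            The synchronization strategy is adjusted according to system load.
    b.Implementation techniques
        a.Reconfigurable barrier
            A barrier implementation that can be reconfigured at runtime.
        b.Dynamic thread management
            Threads are added to or removed from the synchronization set dynamically.
        c.Code example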
            ---
            # 动态同步点管理示例
            import threading
            import time
            import logging
            import random
            from typing import List, Dict, Any, Optional, Set
            from dataclasses import dataclass
            from queue import Queue
            from concurrent.futures import ThreadPoolExecutor

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            @dataclass
            class DynamicWorker:
                """动态工作线程信息"""
                worker_id: int
                is_active: bool = True
                join_time: float = 0
                completed_tasks: int = 0
                last_sync_time: Optional[float] = None

            class DynamicSyncPoint:
                """动态同步点管理演示"""
                def __init__(self, initial_threads: int = 3):
                    self.initial_threads = initial_threads
                    self.current_workers: Dict[int, DynamicWorker] = {}
                    self.worker_queue = Queue()
                    self.sync_lock = threading.Lock()
                    self.condition = threading.Condition(self.sync_lock)

                    # 动态屏障管理
                    self.current_barrier: Optional[threading.Barrier] = None
                    self.required_parties = initial_threads
                    self.sync_counter = 0

                    # 管理统计
                    self.join_events: List[Dict[str, Any]] = []
                    self.leave_events: List[Dict[str, Any]] = []
                    self.sync_events: List[Dict[str, Any]] = []

                def initialize_system(self):
                    """初始化动态同步系统"""
                    logger.info(f"初始化动态同步系统,初始线程数: {self.initial_threads}")

                    # 创建初始屏障
                    self._recreate_barrier()

                    # 创建初始工作线程
                    for i in range(1, self.initial_threads + 1):
                        worker = DynamicWorker(worker_id=i, join_time=time.time())
                        self.current_workers[i] = worker

                    logger.info(f"系统初始化完成,当前活跃线程数: {len(self.current_workers)}")

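                def _recreate_barrier(self):
                    """Replace the current barrier with one sized for required_parties."""
                    with self.sync_lock:
                        old_barrier = self.current_barrier
                        # Swap the barrier while holding the lock so that workers
                        # always read a consistent barrier reference.
                        self.current_barrier = threading.Barrier(parties=self.required_parties)
                        self.sync_counter += 1
                        logger.info(f"Created barrier #{self.sync_counter}, parties: {self.required_parties}")

                    if old_barrier is not None:
                        # threading.Barrier cannot change its party count, so the old one is
                        # aborted: threads still waiting on it get BrokenBarrierError right
                        # away instead of blocking until their timeout expires.
                        logger.debug(f"Aborting old barrier (parties: {old_barrier.parties})")
                        old_barrier.abort()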

                def dynamic_worker(self, worker_id: int):
                    """动态工作线程"""
                    worker = self.current_workers.get(worker_id)
                    if not worker:
                        logger.error(f"工作线程 {worker_id}: 工作线程信息不存在")
                        return

                    logger.info(f"动态工作线程 {worker_id}: 启动")

                    while worker.is_active:
                        try:
                            # 执行工作任务
                            self._execute_work_task(worker)

                            # 检查是否需要同步
                            if self._should_sync(worker):
                                self._perform_dynamic_sync(worker)

                            # 随机休息时间
                            rest_time = random.uniform(0.5, 2.0)
                            time.sleep(rest_time)

                        except Exception as e:
                            logger.error(f"动态工作线程 {worker_id}: 执行异常 - {e}")
                            break

                    logger.info(f"动态工作线程 {worker_id}: 退出")

                def _execute_work_task(self, worker: DynamicWorker):
                    """执行工作任务"""
                    task_duration = random.uniform(0.3, 1.5)
                    time.sleep(task_duration)
                    worker.completed_tasks += 1
                    logger.debug(f"工作线程 {worker.worker_id}: 完成第 {worker.completed_tasks} 个任务")

                def _should_sync(self, worker: DynamicWorker) -> bool:
                    """判断是否需要同步"""
                    # 每完成3个任务同步一次
                    return worker.completed_tasks % 3 == 0

                def _perform_dynamic_sync(self, worker: DynamicWorker):
                    """执行动态同步"""
                    sync_start = time.time()

                    # Snapshot the barrier and its id together under the lock so the event
                    # recorded below describes the barrier actually waited on, even if the
                    # manager replaces it in the meantime.
                    with self.sync_lock:
                        current_barrier = self.current_barrier
                        sync_id = self.sync_counter
                    logger.info(f"Worker {worker.worker_id}: starting sync #{sync_id}")

                    try:
                        # Wait at the sync point
                        wait_result = current_barrier.wait(timeout=5.0)
                        sync_duration = time.time() - sync_start
                        worker.last_sync_time = sync_start + sync_duration

                        # Record the sync event
                        sync_event = {
                            'sync_id': sync_id,
                            'worker_id': worker.worker_id,
                            'wait_result': wait_result,
                            'duration': sync_duration,
                            'participants': current_barrier.parties,
                            'timestamp': time.time()
                        }
                        self.sync_events.append(sync_event)

                        logger.info(f"工作线程 {worker.worker_id}: 同步完成,结果: {wait_result} "
                                f"(耗时: {sync_duration:.2f}s)")

                    except threading.BrokenBarrierError:
                        logger.warning(f"工作线程 {worker.worker_id}: 同步点损坏")
                    except Exception as e:
                        logger.error(f"工作线程 {worker.worker_id}: 同步异常 - {e}")

                def thread_manager(self):
                    """线程管理器 - 动态添加和移除线程"""
                    logger.info("线程管理器启动")

                    manager_cycles = 0
                    max_cycles = 10

                    while manager_cycles < max_cycles:
                        manager_cycles += 1

                        try:
                            # 随机决定是否调整线程数量
                            if random.random() < 0.6:  # 60%概率调整
                                action = random.choice(['add', 'remove'])

                                if action == 'add':
                                    self._add_random_worker()
                                else:
                                    self._remove_random_worker()

                            # 等待一段时间再进行下一次调整
                            time.sleep(random.uniform(3.0, 8.0))

                        except Exception as e:
                            logger.error(f"线程管理器异常 - {e}")

                    logger.info("线程管理器完成")

                def _add_random_worker(self):
                    """随机添加工作线程"""
                    current_count = len([w for w in self.current_workers.values() if w.is_active])
                    max_workers = 8

                    if current_count >= max_workers:
                        logger.debug("已达到最大线程数,跳过添加")
                        return

                    # 找到新的线程ID
                    new_id = max(self.current_workers.keys(), default=0) + 1
                    if new_id > 100:  # 防止ID过大
                        new_id = 1
                        while new_id in self.current_workers:
                            new_id += 1

                    # 添加新工作线程
                    new_worker = DynamicWorker(worker_id=new_id, join_time=time.time())
                    self.current_workers[new_id] = new_worker

                    # 增加所需参与数
                    self.required_parties += 1
                    self._recreate_barrier()

                    # 启动新线程
                    new_thread = threading.Thread(
                        target=self.dynamic_worker,
                        args=(new_id,),
                        name=f"DynamicWorker-{new_id}",
                        daemon=True
                    )
                    new_thread.start()

                    # 记录加入事件
                    join_event = {
                        'worker_id': new_id,
                        'action': 'join',
                        'timestamp': time.time(),
                        'total_workers': self.required_parties
                    }
                    self.join_events.append(join_event)

                    logger.info(f"动态添加工作线程 {new_id},当前总线程数: {self.required_parties}")

                def _remove_random_worker(self):
                    """随机移除工作线程"""
                    active_workers = [w for w in self.current_workers.values() if w.is_active]
                    min_workers = 2

                    if len(active_workers) <= min_workers:
                        logger.debug("已达到最小线程数,跳过移除")
                        return

                    # 随机选择一个要移除的线程
                    worker_to_remove = random.choice(active_workers)
                    worker_to_remove.is_active = False

                    # 减少所需参与数
                    self.required_parties -= 1
                    self._recreate_barrier()

                    # 记录离开事件
                    leave_event = {
                        'worker_id': worker_to_remove.worker_id,
                        'action': 'leave',
                        'timestamp': time.time(),
                        'total_workers': self.required_parties,
                        'completed_tasks': worker_to_remove.completed_tasks
                    }
                    self.leave_events.append(leave_event)

                    logger.info(f"动态移除工作线程 {worker_to_remove.worker_id},当前总线程数: {self.required_parties}")

                def performance_monitor(self):
                    """性能监控线程"""
                    logger.info("性能监控线程启动")

                    start_time = time.time()
                    while True:
                        # 收集当前性能指标
                        with self.sync_lock:
                            current_stats = self._collect_performance_stats()

                        logger.info(f"性能监控: {current_stats}")

                        # 检查是否应该停止监控
                        elapsed = time.time() - start_time
                        if elapsed > 30:  # 监控30秒
                            break

                        time.sleep(2)

                    logger.info("性能监控线程结束")

                def _collect_performance_stats(self) -> Dict[str, Any]:
                    """收集性能统计信息"""
                    active_workers = [w for w in self.current_workers.values() if w.is_active]
                    completed_tasks = sum(w.completed_tasks for w in self.current_workers.values())

                    recent_syncs = [s for s in self.sync_events
                                if time.time() - s['timestamp'] < 10]
                    avg_sync_time = sum(s['duration'] for s in recent_syncs) / len(recent_syncs) if recent_syncs else 0

                    return {
                        'active_workers': len(active_workers),
                        'total_workers': len(self.current_workers),
                        'completed_tasks': completed_tasks,
                        'sync_count': len(self.sync_events),
                        'avg_sync_time': avg_sync_time,
                        'required_parties': self.required_parties
                    }

                def run_dynamic_sync_demo(self):
                    """运行动态同步演示"""
                    logger.info("=== 动态同步点管理演示 ===")

                    # 初始化系统
                    self.initialize_system()

                    threads = []
                    demo_start = time.time()

                    try:
                        # 启动初始工作线程
                        with ThreadPoolExecutor(max_workers=self.initial_threads) as executor:
                            # 提交初始工作任务
                            futures = [
                                executor.submit(self.dynamic_worker, worker_id)
                                for worker_id in range(1, self.initial_threads + 1)
                            ]

                            # 启动线程管理器
                            manager_thread = threading.Thread(target=self.thread_manager, daemon=True)
                            manager_thread.start()

                            # 启动性能监控
                            monitor_thread = threading.Thread(target=self.performance_monitor, daemon=True)
                            monitor_thread.start()

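                            # Let the workers run for a fixed period, then ask them to stop.
                            # Without this, leaving the `with` block could block indefinitely:
                            # executor shutdown waits for worker loops that only end once
                            # is_active is False.
                            time.sleep(20)
                            for worker in list(self.current_workers.values()):
                                worker.is_active = False

                            # Collect results; each worker finishes its current iteration first
                            for future in futures:
                                try:
                                    future.result(timeout=15)
                                except Exception as e:
                                    logger.error(f"Worker failed: {e!r}")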

                        # 等待管理线程完成
                        time.sleep(2)

                    except KeyboardInterrupt:
                        logger.info("收到中断信号,正在清理...")
                    finally:
                        # 清理资源
                        self._cleanup()

                    demo_time = time.time() - demo_start
                    self._analyze_dynamic_results(demo_time)

                def _cleanup(self):
                    """清理资源"""
                    logger.info("清理系统资源...")

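                    # Stop all workers. Iterate over a copy: the manager thread may still
                    # add entries, and changing a dict during iteration raises RuntimeError.
                    for worker in list(self.current_workers.values()):
                        worker.is_active = False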

                    # 等待线程退出
                    time.sleep(1)

                def _analyze_dynamic_results(self, total_demo_time: float):
                    """分析动态同步结果"""
                    logger.info("\n=== 动态同步结果分析 ===")
                    logger.info(f"演示总耗时: {total_demo_time:.2f}秒")

                    # 线程变化统计
                    logger.info(f"\n线程动态变化:")
                    logger.info(f"  初始线程数: {self.initial_threads}")
                    logger.info(f"  最终线程数: {self.required_parties}")
                    logger.info(f"  加入事件: {len(self.join_events)} 次")
                    logger.info(f"  离开事件: {len(self.leave_events)} 次")

                    # 任务完成统计
                    total_tasks = sum(w.completed_tasks for w in self.current_workers.values())
                    logger.info(f"\n任务完成情况:")
                    logger.info(f"  总完成任务数: {total_tasks}")
                    logger.info(f"  平均每线程完成: {total_tasks/len(self.current_workers):.1f} 个")

                    # 同步性能统计
                    if self.sync_events:
                        sync_times = [s['duration'] for s in self.sync_events]
                        avg_sync_time = sum(sync_times) / len(sync_times)
                        max_sync_time = max(sync_times)
                        min_sync_time = min(sync_times)

                        logger.info(f"\n同步性能:")
                        logger.info(f"  总同步次数: {len(self.sync_events)}")
                        logger.info(f"  平均同步时间: {avg_sync_time:.3f}秒")
                        logger.info(f"  最长同步时间: {max_sync_time:.3f}秒")
                        logger.info(f"  最短同步时间: {min_sync_time:.3f}秒")

                    # 效率分析
                    if total_demo_time > 0:
                        tasks_per_second = total_tasks / total_demo_time
                        syncs_per_second = len(self.sync_events) / total_demo_time
                        logger.info(f"\n效率指标:")
                        logger.info(f"  任务处理速度: {tasks_per_second:.2f} 个/秒")
                        logger.info(f"  同步频率: {syncs_per_second:.2f} 次/秒")

            # 使用示例
            if __name__ == "__main__":
                dynamic_sync = DynamicSyncPoint(initial_threads=3)
                dynamic_sync.run_dynamic_sync_demo()
            ---
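
The example above changes the party count by replacing the threading.Barrier object. As an alternative, a barrier whose party count can change in place can be built on threading.Condition. The following is a minimal sketch, not part of the standard library: the class name ResizableBarrier, its set_parties method and the demo helper are all hypothetical names introduced for illustration.

```python
import threading


class ResizableBarrier:
    """Cyclic barrier whose party count can be changed at runtime (illustrative)."""

    def __init__(self, parties: int):
        if parties < 1:
            raise ValueError("parties must be >= 1")
        self._cond = threading.Condition()
        self._parties = parties
        self._waiting = 0      # threads blocked in the current generation
        self._generation = 0   # incremented every time the barrier releases

    def _release(self) -> None:
        # Caller must hold self._cond.
        self._waiting = 0
        self._generation += 1
        self._cond.notify_all()

    def wait(self, timeout=None) -> bool:
        """Block until the barrier releases. Returns False on timeout."""
        with self._cond:
            generation = self._generation
            self._waiting += 1
            if self._waiting >= self._parties:
                self._release()
                return True
            released = self._cond.wait_for(
                lambda: self._generation != generation, timeout
            )
            if not released:
                # Withdraw from the count so later arrivals are counted correctly.
                self._waiting -= 1
            return released

    def set_parties(self, parties: int) -> None:
        """Change the party count; release waiters if enough have already arrived."""
        if parties < 1:
            raise ValueError("parties must be >= 1")
        with self._cond:
            self._parties = parties
            if self._waiting >= self._parties:
                self._release()


def demo():
    """Two workers wait on a barrier sized for three, then the barrier shrinks."""
    barrier = ResizableBarrier(parties=3)
    released = []

    def worker(index: int) -> None:
        if barrier.wait(timeout=5.0):
            released.append(index)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    # Only two workers exist, so shrink the barrier to two parties.
    barrier.set_parties(2)
    for t in threads:
        t.join()
    return sorted(released)


if __name__ == "__main__":
    print(demo())
```

Unlike swapping barrier objects, this design never breaks the barrier for threads that are already waiting. The trade-off is that timeouts are reported through a return value rather than BrokenBarrierError, so callers have to check the result of wait().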

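06. Sync point performance optimization
    a. Key points of performance analysis
        a. Synchronization overhead
            Measure how much time threads spend blocked at sync points compared with time spent on useful work.
        b. Wait time optimization
            Reduce thread wait time through load balancing, so that threads reach the sync point at about the same moment.
        c. Resource utilization
            Maximize the effective use of system resources.
    b. Optimization strategies
        a. Algorithm optimization
            Use more efficient synchronization algorithms and data structures.
        b. Parameter tuning
            Adjust synchronization parameters, such as timeouts, to the characteristics of the system.
        c. Code example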
            ---
            # 同步点性能优化示例
            import threading
            import time
            import logging
            import random
            from typing import List, Dict, Any, Optional, Callable
            from dataclasses import dataclass
            from concurrent.futures import ThreadPoolExecutor, as_completed
            import statistics
            import queue

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            @dataclass
            class PerformanceMetrics:
                """性能指标"""
                thread_id: int
                sync_point: int
                arrival_time: float
                wait_time: float
                total_time: float
                work_time: float
                efficiency: float

            class SyncPointOptimizer:
                """同步点性能优化演示"""
                def __init__(self, num_threads: int = 6):
                    self.num_threads = num_threads
                    self.performance_metrics: List[PerformanceMetrics] = []
                    self.optimization_strategies = {
                        'baseline': self._baseline_sync,
                        'load_balanced': self._load_balanced_sync,
                        'adaptive_timeout': self._adaptive_timeout_sync,
                        'batch_processing': self._batch_processing_sync
                    }

                def _baseline_sync(self, worker_id: int, work_load: float) -> PerformanceMetrics:
                    """Baseline strategy: no optimization."""
                    start_time = time.time()
                    # Every worker in a run must wait on the same Barrier object. A barrier
                    # created inside each call would have a single waiter and always time out.
                    barrier = self._run_barrier

                    # Do the work
                    work_start = time.time()
                    time.sleep(work_load)
                    work_time = time.time() - work_start

                    # Wait at the sync point
                    sync_start = time.time()
                    barrier.wait(timeout=10.0)
                    sync_duration = time.time() - sync_start

                    total_time = time.time() - start_time
                    efficiency = work_time / total_time if total_time > 0 else 0

                    return PerformanceMetrics(
                        thread_id=worker_id,
                        sync_point=1,
                        arrival_time=sync_start,
                        wait_time=sync_duration,
                        total_time=total_time,
                        work_time=work_time,
                        efficiency=efficiency
                    )

                def _load_balanced_sync(self, worker_id: int, work_load: float) -> PerformanceMetrics:
                    """Load-balanced strategy."""
                    start_time = time.time()

                    # Scale the workload by worker ID
                    adjusted_work = work_load * (0.7 + 0.6 * (worker_id / self.num_threads))
                    barrier = self._run_barrier  # shared by all workers of this run

                    # Do the adjusted work
                    work_start = time.time()
                    time.sleep(adjusted_work)
                    work_time = time.time() - work_start

                    # Delay arrival at the sync point to shorten the measured wait
                    delay = max(0, (self.num_threads - worker_id) * 0.1)
                    time.sleep(delay)

                    # Wait at the sync point
                    sync_start = time.time()
                    barrier.wait(timeout=8.0)
                    sync_duration = time.time() - sync_start

                    total_time = time.time() - start_time
                    efficiency = work_time / total_time if total_time > 0 else 0

                    return PerformanceMetrics(
                        thread_id=worker_id,
                        sync_point=2,
                        arrival_time=sync_start,
                        wait_time=sync_duration,
                        total_time=total_time,
                        work_time=work_time,
                        efficiency=efficiency
                    )

                def _adaptive_timeout_sync(self, worker_id: int, work_load: float) -> PerformanceMetrics:
                    """Adaptive-timeout strategy."""
                    start_time = time.time()
                    barrier = self._run_barrier  # shared by all workers of this run

                    # Do the work
                    work_start = time.time()
                    time.sleep(work_load)
                    work_time = time.time() - work_start

                    # Derive the timeout from the workload
                    adaptive_timeout = max(2.0, work_load * 2)

                    # Wait at the sync point
                    sync_start = time.time()
                    try:
                        barrier.wait(timeout=adaptive_timeout)
                    except threading.BrokenBarrierError:
                        # Raised on our own timeout, or immediately if another thread's
                        # timeout has already broken the barrier
                        pass
                    sync_duration = time.time() - sync_start  # time actually spent waiting

                    total_time = time.time() - start_time
                    efficiency = work_time / total_time if total_time > 0 else 0

                    return PerformanceMetrics(
                        thread_id=worker_id,
                        sync_point=3,
                        arrival_time=sync_start,
                        wait_time=sync_duration,
                        total_time=total_time,
                        work_time=work_time,
                        efficiency=efficiency
                    )

                def _batch_processing_sync(self, worker_id: int, work_load: float) -> PerformanceMetrics:
                    """Batch-processing strategy."""
                    start_time = time.time()

                    # Split the work into small batches
                    batch_size = 0.2
                    num_batches = int(work_load / batch_size)
                    # Subtract instead of using %: 1.0 % 0.2 evaluates to about 0.2 in
                    # floating point, which would add a spurious extra batch of sleep
                    remaining_work = max(0.0, work_load - num_batches * batch_size)

                    work_time = 0.0
                    for i in range(num_batches):
                        batch_start = time.time()
                        time.sleep(batch_size)
                        work_time += time.time() - batch_start

                        # Best-effort lightweight sync after every second batch. Workers with
                        # fewer batches stop arriving, so the wait may time out and break the
                        # barrier; later waits then fail fast and are ignored.
                        if i % 2 == 0 and i > 0:
                            try:
                                self._light_barrier.wait(timeout=1.0)
                            except threading.BrokenBarrierError:
                                pass

                    if remaining_work > 0:
                        rest_start = time.time()
                        time.sleep(remaining_work)
                        work_time += time.time() - rest_start

                    # Final sync point
                    sync_start = time.time()
                    try:
                        self._run_barrier.wait(timeout=5.0)
                    except threading.BrokenBarrierError:
                        pass
                    sync_duration = time.time() - sync_start  # time actually spent waiting

                    total_time = time.time() - start_time
                    efficiency = work_time / total_time if total_time > 0 else 0

                    return PerformanceMetrics(
                        thread_id=worker_id,
                        sync_point=4,
                        arrival_time=sync_start,
                        wait_time=sync_duration,
                        total_time=total_time,
                        work_time=work_time,
                        efficiency=efficiency
                    )

                def benchmark_strategy(self, strategy_name: str,
                                    strategy_func: Callable[[int, float], PerformanceMetrics],
                                    work_loads: List[float]) -> Dict[str, Any]:
                    """Benchmark one strategy."""
                    logger.info(f"Benchmarking strategy: {strategy_name}")
                    strategy_start = time.time()

                    # Fresh barriers for each run, shared by all workers of that run and
                    # sized to the number of tasks actually submitted
                    num_workers = min(len(work_loads), self.num_threads)
                    self._run_barrier = threading.Barrier(parties=num_workers)
                    self._light_barrier = threading.Barrier(parties=num_workers)

                    metrics_list = []
                    with ThreadPoolExecutor(max_workers=self.num_threads) as executor:
                        # Submit the work
                        futures = [
                            executor.submit(strategy_func, i + 1, work_loads[i])
                            for i in range(num_workers)
                        ]

                        # 收集结果
                        for future in as_completed(futures):
                            try:
                                metrics = future.result(timeout=30)
                                metrics_list.append(metrics)
                            except Exception as e:
                                logger.error(f"策略 {strategy_name} 执行异常: {e}")

                    strategy_time = time.time() - strategy_start
                    self.performance_metrics.extend(metrics_list)

                    # 分析性能
                    return self._analyze_strategy_performance(strategy_name, metrics_list, strategy_time)

                def _analyze_strategy_performance(self, strategy_name: str,
                                                metrics_list: List[PerformanceMetrics],
                                                total_time: float) -> Dict[str, Any]:
                    """分析策略性能"""
                    if not metrics_list:
                        return {'error': 'No valid metrics'}

                    wait_times = [m.wait_time for m in metrics_list]
                    total_times = [m.total_time for m in metrics_list]
                    work_times = [m.work_time for m in metrics_list]
                    efficiencies = [m.efficiency for m in metrics_list]

                    # 检查同步时间一致性
                    arrival_times = [m.arrival_time for m in metrics_list]
                    max_arrival = max(arrival_times)
                    min_arrival = min(arrival_times)
                    arrival_span = max_arrival - min_arrival

                    analysis = {
                        'strategy': strategy_name,
                        'threads_completed': len(metrics_list),
                        'total_time': total_time,
                        'wait_time_stats': {
                            'mean': statistics.mean(wait_times),
                            'median': statistics.median(wait_times),
                            'stdev': statistics.stdev(wait_times) if len(wait_times) > 1 else 0,
                            'min': min(wait_times),
                            'max': max(wait_times)
                        },
                        'efficiency_stats': {
                            'mean': statistics.mean(efficiencies),
                            'median': statistics.median(efficiencies),
                            'stdev': statistics.stdev(efficiencies) if len(efficiencies) > 1 else 0,
                            'min': min(efficiencies),
                            'max': max(efficiencies)
                        },
                        'synchronization_quality': {
                            'arrival_span': arrival_span,
                            'wait_consistency': statistics.stdev(wait_times) if len(wait_times) > 1 else 0,
                            'synchronization_overhead': sum(wait_times) / len(wait_times)
                        }
                    }

                    logger.info(f"{strategy_name} 性能:")
                    logger.info(f"  平均等待时间: {analysis['wait_time_stats']['mean']:.3f}s")
                    logger.info(f"  平均效率: {analysis['efficiency_stats']['mean']:.3f}")
                    logger.info(f"  到达时间跨度: {analysis['synchronization_quality']['arrival_span']:.3f}s")

                    return analysis

                def run_optimization_benchmark(self):
                    """运行优化基准测试"""
                    logger.info(f"=== 同步点性能优化基准测试 ===")
                    logger.info(f"线程数: {self.num_threads}")

                    # 定义不同的工作负载模式
                    work_loads = [
                        [2.0, 2.5, 1.8, 3.0, 2.2, 2.8],  # 不均匀负载
                        [1.5, 1.5, 1.5, 1.5, 1.5, 1.5],  # 均匀负载
                        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # 递增负载
                        [3.0, 1.0, 4.0, 2.0, 5.0, 1.5]   # 随机负载
                    ]

                    results = {}
                    test_count = 0

                    for i, loads in enumerate(work_loads, 1):
                        logger.info(f"\n--- 测试场景 {i}: {self._describe_load_pattern(loads)} ---")

                        scenario_results = {}

                        for strategy_name, strategy_func in self.optimization_strategies.items():
                            # 清理之前的指标
                            self.performance_metrics = []

                            # 执行基准测试
                            result = self.benchmark_strategy(strategy_name, strategy_func, loads)
                            scenario_results[strategy_name] = result

                            test_count += 1

                            # 测试间休息
                            time.sleep(1)

                        results[f"scenario_{i}"] = {
                            'description': self._describe_load_pattern(loads),
                            'work_loads': loads,
                            'results': scenario_results
                        }

                    # 分析整体结果
                    self._analyze_benchmark_results(results)

                    return results

                def _describe_load_pattern(self, loads: List[float]) -> str:
                    """描述负载模式"""
                    if len(set(round(l, 1) for l in loads)) == 1:
                        return "均匀负载"
                    elif all(loads[i] <= loads[i+1] for i in range(len(loads)-1)):
                        return "递增负载"
                    elif max(loads) / min(loads) < 1.5:
                        return "轻度不均匀负载"
                    else:
                        return "高度不均匀负载"

                def _analyze_benchmark_results(self, results: Dict[str, Any]):
                    """分析基准测试结果"""
                    logger.info("\n=== 优化基准测试结果分析 ===")

                    # 按策略汇总结果
                    strategy_summary = {}
                    for strategy_name in self.optimization_strategies.keys():
                        efficiencies = []
                        wait_times = []
                        sync_overheads = []

                        for scenario in results.values():
                            if strategy_name in scenario['results']:
                                result = scenario['results'][strategy_name]
                                if 'efficiency_stats' in result:
                                    efficiencies.append(result['efficiency_stats']['mean'])
                                if 'wait_time_stats' in result:
                                    wait_times.append(result['wait_time_stats']['mean'])
                                if 'synchronization_quality' in result:
                                    sync_overheads.append(result['synchronization_quality']['synchronization_overhead'])

                        if efficiencies:
                            strategy_summary[strategy_name] = {
                                'avg_efficiency': statistics.mean(efficiencies),
                                'avg_wait_time': statistics.mean(wait_times) if wait_times else 0,
                                'avg_sync_overhead': statistics.mean(sync_overheads) if sync_overheads else 0,
                                'scenarios_tested': len(efficiencies)
                            }

                    # 排序和输出结果
                    logger.info("\n策略性能排名 (按平均效率):")
                    sorted_strategies = sorted(strategy_summary.items(),
                                            key=lambda x: x[1]['avg_efficiency'],
                                            reverse=True)

                    for i, (strategy, stats) in enumerate(sorted_strategies, 1):
                        logger.info(f"  {i}. {strategy}:")
                        logger.info(f"     平均效率: {stats['avg_efficiency']:.3f}")
                        logger.info(f"     平均等待时间: {stats['avg_wait_time']:.3f}s")
                        logger.info(f"     同步开销: {stats['avg_sync_overhead']:.3f}s")
                        logger.info(f"     测试场景数: {stats['scenarios_tested']}")

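                    # Optimization suggestions
                    best_strategy = sorted_strategies[0][0] if sorted_strategies else None
                    if best_strategy is not None:
                        logger.info(f"\nRecommended strategy: {best_strategy}")
                    else:
                        logger.info("\nRecommended strategy: none (no efficiency data was collected)")
                    logger.info("\nOptimization suggestions:")
                    logger.info("  1. Balance the load to reduce the spread in thread arrival times")
                    logger.info("  2. Adjust timeout parameters dynamically to match the workload")
                    logger.info("  3. Consider batching to reduce how often threads synchronize")
                    logger.info("  4. Monitor the standard deviation of synchronization wait times")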

            # 使用示例
            if __name__ == "__main__":
                optimizer = SyncPointOptimizer(num_threads=6)
                optimizer.run_optimization_benchmark()
            ---

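6.3 Phased Execution

01. Phased execution pattern
    a. Basic concept
        Phased execution splits a complex task into several distinct phases. At the end of each phase, every thread waits at a synchronization point until all threads have finished that phase, and only then does the next phase begin. This keeps execution ordered and guarantees that each phase sees the complete output of the previous one.
    b. Core characteristics
        a. Phase partitioning
            The processing flow is divided into separate execution phases along logical or temporal boundaries.
        b. Synchronization-point control
            A synchronization point at the end of each phase ensures that no thread enters the next phase until all threads have completed the current one.
        c. State passing
            The output of one phase is the input of the next, so data moves through the pipeline in a fixed order.
    c. Code example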
        ---
        # 分阶段执行基础示例
        import threading
        import time
        import logging
        import random
        from typing import List, Dict, Any

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class PhasedExecution:
            """分阶段执行演示"""
            def __init__(self, num_threads: int = 4, num_phases: int = 3):
                self.num_threads = num_threads
                self.num_phases = num_phases
                # 为每个阶段创建独立的屏障
                self.barriers = [threading.Barrier(parties=num_threads)
                               for _ in range(num_phases)]
                self.results: Dict[int, List[Dict[str, Any]]] = {}
                for phase in range(num_phases):
                    self.results[phase] = []

            def phased_worker(self, worker_id: int):
                """分阶段工作线程"""
                logger.info(f"工作线程 {worker_id}: 启动分阶段执行")

                for phase in range(self.num_phases):
                    phase_start = time.time()
                    logger.info(f"工作线程 {worker_id}: 开始阶段 {phase + 1}")

                    # 执行阶段任务
                    if phase == 0:
                        # 阶段1:数据收集
                        task_result = self._execute_data_collection(worker_id)
                    elif phase == 1:
                        # 阶段2:数据处理
                        task_result = self._execute_data_processing(worker_id)
                    else:
                        # 阶段3:结果汇总
                        task_result = self._execute_result_aggregation(worker_id)

                    phase_duration = time.time() - phase_start

                    # 记录阶段结果
                    result = {
                        'worker_id': worker_id,
                        'phase': phase + 1,
                        'duration': phase_duration,
                        'task_result': task_result
                    }
                    self.results[phase].append(result)

                    logger.info(f"工作线程 {worker_id}: 阶段 {phase + 1} 完成,"
                              f"耗时: {phase_duration:.2f}s")

                    # 同步点:等待所有线程完成当前阶段
                    sync_start = time.time()
                    try:
                        sync_result = self.barriers[phase].wait(timeout=10.0)
                        sync_duration = time.time() - sync_start

                        logger.info(f"工作线程 {worker_id}: 通过阶段 {phase + 1} 同步点,"
                                  f"同步结果: {sync_result}, 等待时间: {sync_duration:.2f}s")

                    except threading.BrokenBarrierError:
                        logger.error(f"工作线程 {worker_id}: 阶段 {phase + 1} 同步点损坏")
                        return
                    except Exception as e:
                        logger.error(f"工作线程 {worker_id}: 阶段 {phase + 1} 同步异常: {e}")
                        return

                logger.info(f"工作线程 {worker_id}: 所有阶段完成")

            def _execute_data_collection(self, worker_id: int) -> Dict[str, Any]:
                """执行数据收集阶段"""
                # 模拟数据收集工作
                collection_time = random.uniform(1.0, 3.0)
                time.sleep(collection_time)

                # 生成模拟数据
                data_items = random.randint(50, 200)
                data_size = random.randint(1000, 5000)

                logger.debug(f"工作线程 {worker_id}: 收集了 {data_items} 项数据,"
                           f"总大小 {data_size} 字节")

                return {
                    'items_collected': data_items,
                    'data_size': data_size,
                    'collection_time': collection_time
                }

            def _execute_data_processing(self, worker_id: int) -> Dict[str, Any]:
                """执行数据处理阶段"""
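                # Look up this worker's own phase-1 record. Records are appended in
                # completion order, so indexing with worker_id - 1 could return
                # another worker's data.
                prev_result = next(
                    (r for r in self.results[0] if r['worker_id'] == worker_id), {})
                data_items = prev_result.get('task_result', {}).get('items_collected', 100)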

                # 模拟数据处理工作
                processing_time = random.uniform(0.5, 2.0)
                time.sleep(processing_time)

                # 处理结果
                processed_items = int(data_items * random.uniform(0.8, 0.95))
                error_count = random.randint(0, 5)

                logger.debug(f"工作线程 {worker_id}: 处理了 {processed_items} 项数据,"
                           f"错误数: {error_count}")

                return {
                    'items_processed': processed_items,
                    'error_count': error_count,
                    'processing_time': processing_time
                }

            def _execute_result_aggregation(self, worker_id: int) -> Dict[str, Any]:
                """执行结果汇总阶段"""
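                # Fetch this worker's own records from the first two phases, matched by
                # worker_id because the result lists are in completion order
                phase1_result = next(
                    (r for r in self.results[0] if r['worker_id'] == worker_id), {})
                phase2_result = next(
                    (r for r in self.results[1] if r['worker_id'] == worker_id), {})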

                items_collected = phase1_result.get('task_result', {}).get('items_collected', 100)
                items_processed = phase2_result.get('task_result', {}).get('items_processed', 90)

                # 模拟结果汇总工作
                aggregation_time = random.uniform(0.3, 1.0)
                time.sleep(aggregation_time)

                # 汇总结果
                success_rate = (items_processed / items_collected) * 100 if items_collected > 0 else 0
                final_score = success_rate * random.uniform(0.9, 1.1)

                logger.debug(f"工作线程 {worker_id}: 成功率: {success_rate:.1f}%, "
                           f"最终得分: {final_score:.1f}")

                return {
                    'success_rate': success_rate,
                    'final_score': final_score,
                    'aggregation_time': aggregation_time
                }

            def run_phased_execution_demo(self):
                """运行分阶段执行演示"""
                logger.info("=== 分阶段执行模式演示 ===")
                logger.info(f"工作线程数: {self.num_threads}")
                logger.info(f"执行阶段数: {self.num_phases}")

                threads = []
                demo_start = time.time()

                # 创建并启动工作线程
                for i in range(self.num_threads):
                    thread = threading.Thread(
                        target=self.phased_worker,
                        args=(i + 1,),
                        name=f"PhaseWorker-{i + 1}"
                    )
                    threads.append(thread)
                    thread.start()
                    time.sleep(0.1)  # 稍微错开启动时间

                # 等待所有线程完成
                for thread in threads:
                    thread.join(timeout=30)
                    if thread.is_alive():
                        logger.warning(f"线程 {thread.name} 未能在超时内完成")

                demo_time = time.time() - demo_start
                self._analyze_phased_results(demo_time)

            def _analyze_phased_results(self, total_demo_time: float):
                """分析分阶段执行结果"""
                logger.info("\n=== 分阶段执行结果分析 ===")
                logger.info(f"演示总耗时: {total_demo_time:.2f}秒")

                # 分析各阶段结果
                for phase in range(self.num_phases):
                    phase_results = self.results[phase]
                    if not phase_results:
                        continue

                    phase_durations = [r['duration'] for r in phase_results]
                    avg_duration = sum(phase_durations) / len(phase_durations)

                    logger.info(f"\n阶段 {phase + 1}:")
                    logger.info(f"  完成线程数: {len(phase_results)}/{self.num_threads}")
                    logger.info(f"  平均执行时间: {avg_duration:.2f}秒")

                    # 分析具体阶段结果
                    if phase == 0:
                        # 数据收集阶段分析
                        total_items = sum(r['task_result'].get('items_collected', 0)
                                        for r in phase_results)
                        total_size = sum(r['task_result'].get('data_size', 0)
                                       for r in phase_results)
                        logger.info(f"  总收集数据项: {total_items}")
                        logger.info(f"  总数据大小: {total_size} 字节")

                    elif phase == 1:
                        # 数据处理阶段分析
                        total_processed = sum(r['task_result'].get('items_processed', 0)
                                            for r in phase_results)
                        total_errors = sum(r['task_result'].get('error_count', 0)
                                         for r in phase_results)
                        logger.info(f"  总处理数据项: {total_processed}")
                        logger.info(f"  总错误数: {total_errors}")

                    else:
                        # 结果汇总阶段分析
                        avg_success_rate = sum(r['task_result'].get('success_rate', 0)
                                             for r in phase_results) / len(phase_results)
                        avg_final_score = sum(r['task_result'].get('final_score', 0)
                                            for r in phase_results) / len(phase_results)
                        logger.info(f"  平均成功率: {avg_success_rate:.1f}%")
                        logger.info(f"  平均最终得分: {avg_final_score:.1f}")

        # 使用示例
        if __name__ == "__main__":
            phased_executor = PhasedExecution(num_threads=4, num_phases=3)
            phased_executor.run_phased_execution_demo()
        ---
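
        A threading.Barrier can be reused for any number of rounds with the same set of threads, so the pattern does not strictly need one barrier per phase. The following is a minimal sketch of that variant, not a replacement for the example above. It also uses the barrier's `action` callback to run a per-phase summary step while every worker is still parked at the sync point. The names `run_phases` and `on_phase_complete`, and the arithmetic standing in for real work, are illustrative assumptions.

```python
import threading


def run_phases(num_workers=3, num_phases=3):
    """Run num_phases rounds of work, synchronizing all workers after each round."""
    partials = [0] * num_workers  # one slot per worker, so no lock is needed
    phase_totals = []             # appended to once per phase by the barrier action

    def on_phase_complete():
        # Called by exactly one thread after all workers have arrived and
        # before any of them is released, so partials is stable here.
        phase_totals.append(sum(partials))

    # A single barrier serves every phase: it resets itself after each release.
    barrier = threading.Barrier(num_workers, action=on_phase_complete)

    def worker(index):
        for phase in range(num_phases):
            # Stand-in for the real work of this phase (hypothetical arithmetic)
            partials[index] += (index + 1) * (phase + 1)
            try:
                barrier.wait(timeout=5.0)
            except threading.BrokenBarrierError:
                return  # another worker timed out or aborted the barrier

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return phase_totals


if __name__ == "__main__":
    print(run_phases())  # [6, 18, 36]
```

        Because the action runs before any waiting thread is released, it is a convenient place for phase-level aggregation that must see every worker's output. Keep it short, since all workers are blocked while it runs.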

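02. Complex phased patterns
    a. Handling dependencies
        a. Dependencies between phases
            A phase may depend on specific results of earlier phases, so those dependencies have to be declared and verified before the phase runs.
        b. Conditional branching
            The results of earlier phases determine which execution path or parameter settings later phases use.
        c. Error propagation
            When a phase fails, the program has to decide whether to abort the remaining phases or continue.
    b. Resource management
        a. Per-phase resource allocation
            Each phase acquires and releases the resources it needs, such as memory, file handles, and network connections.
        b. State persistence
            Intermediate state is saved after key phases complete, which supports failure recovery and resuming from a checkpoint.
        c. Load balancing
            Work is redistributed between phases to improve overall efficiency.
    c. Code example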
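
            The dependency and error-propagation rules above can be checked before any thread starts. The following is a minimal sketch, assuming Python 3.9+ for the standard `graphlib` module. The dependency table mirrors the six-phase configuration used by the ComplexPhasedExecution example in this section, and the helper names `execution_order` and `blocked_phases` are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Phase id -> ids of the phases it depends on
PHASE_DEPENDENCIES = {1: [], 2: [1], 3: [2], 4: [2, 3], 5: [4], 6: [4, 5]}


def execution_order(dependencies):
    """Return a valid phase order; raises CycleError on circular dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def blocked_phases(dependencies, failed):
    """Return the phases that cannot run because a prerequisite failed
    or was itself blocked (error propagation along the dependency graph)."""
    unavailable = set(failed)
    for phase in execution_order(dependencies):
        if any(dep in unavailable for dep in dependencies[phase]):
            unavailable.add(phase)
    return sorted(unavailable - set(failed))


if __name__ == "__main__":
    print(execution_order(PHASE_DEPENDENCIES))      # [1, 2, 3, 4, 5, 6]
    print(blocked_phases(PHASE_DEPENDENCIES, {3}))  # [4, 5, 6]
    try:
        execution_order({1: [2], 2: [1]})
    except CycleError:
        print("circular dependency detected")
```

            Note what the second call shows: with this table, a failure in phase 3 blocks phases 4 to 6, because phase 4 lists phase 3 as a prerequisite. Marking a phase as non-critical only helps if no later phase depends on it.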
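
            State persistence can be as simple as writing a small checkpoint file after each phase. The following is a sketch under stated assumptions: the state is JSON-serializable, phases are plain functions that take and return a state dict, and the names `save_checkpoint`, `load_checkpoint`, and `run_with_checkpoints` are hypothetical. The write goes to a temporary file and is then moved into place with `os.replace`, so an interrupted write does not leave a half-written checkpoint behind.

```python
import json
import os
import tempfile


def save_checkpoint(path, completed_phase, state):
    """Write the checkpoint to a temp file, then move it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump({"completed_phase": completed_phase, "state": state}, f)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return (last completed phase, state); (0, {}) when no checkpoint exists."""
    try:
        with open(path, encoding="utf-8") as f:
            payload = json.load(f)
    except FileNotFoundError:
        return 0, {}
    return payload["completed_phase"], payload["state"]


def run_with_checkpoints(path, phases):
    """Run phases in order, skipping those already recorded in the checkpoint."""
    done, state = load_checkpoint(path)
    for phase_id, phase_func in enumerate(phases, start=1):
        if phase_id <= done:
            continue  # finished in an earlier run
        state = phase_func(state)
        save_checkpoint(path, phase_id, state)
    return state
```

            In a multi-threaded run, the natural place to call `save_checkpoint` is right after a sync point, from a single thread, when every worker's output for the phase is known to be complete.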
        ---
        # 复杂分阶段执行示例
        import threading
        import time
        import logging
        import random
        import json
        from typing import List, Dict, Any, Optional, Callable
        from dataclasses import dataclass, asdict
        from enum import Enum
        from queue import Queue, PriorityQueue

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class PhaseStatus(Enum):
            """阶段执行状态"""
            PENDING = "待执行"
            RUNNING = "执行中"
            COMPLETED = "已完成"
            FAILED = "失败"
            SKIPPED = "跳过"

        @dataclass
        class PhaseResult:
            """阶段执行结果"""
            phase_id: int
            worker_id: int
            status: PhaseStatus
            start_time: float
            end_time: float
            result_data: Dict[str, Any]
            error_message: Optional[str] = None
            dependencies: Optional[List[int]] = None

        class ComplexPhasedExecution:
            """复杂分阶段执行演示"""
            def __init__(self, num_workers: int = 6):
                self.num_workers = num_workers
                self.phase_results: List[PhaseResult] = []
                self.result_lock = threading.Lock()
                # Keys are strings such as 'config_1' or 'loaded_data_1'
                self.shared_data: Dict[str, Any] = {}
                self.shared_lock = threading.Lock()

                # 阶段配置
                self.phases = [
                    {'id': 1, 'name': '初始化', 'dependencies': [], 'critical': True},
                    {'id': 2, 'name': '数据加载', 'dependencies': [1], 'critical': True},
                    {'id': 3, 'name': '预处理', 'dependencies': [2], 'critical': False},
                    {'id': 4, 'name': '核心处理', 'dependencies': [2, 3], 'critical': True},
                    {'id': 5, 'name': '后处理', 'dependencies': [4], 'critical': False},
                    {'id': 6, 'name': '结果输出', 'dependencies': [4, 5], 'critical': True}
                ]

                # 为每个阶段创建屏障
                self.barriers = {}
                for phase in self.phases:
                    self.barriers[phase['id']] = threading.Barrier(parties=num_workers)

                # 错误处理队列
                self.error_queue = Queue()

            def complex_phase_worker(self, worker_id: int):
                """复杂分阶段工作线程"""
                logger.info(f"工作线程 {worker_id}: 启动复杂分阶段执行")

                for phase in self.phases:
                    phase_id = phase['id']
                    phase_name = phase['name']

                    try:
                        # Check dependencies
                        if not self._check_dependencies(phase['dependencies'], worker_id):
                            logger.warning(f"Worker {worker_id}: phase {phase_id} ({phase_name}) "
                                         "skipped, dependencies not satisfied")
                            self._record_phase_result(
                                phase_id, worker_id, PhaseStatus.SKIPPED,
                                time.time(), time.time(), {}, "dependencies not satisfied"
                            )
                        else:
                            phase_start = time.time()
                            logger.info(f"Worker {worker_id}: starting phase {phase_id} ({phase_name})")

                            # Run the phase task
                            result_data = self._execute_complex_phase(phase_id, worker_id)

                            phase_end = time.time()

                            # Record the successful result
                            self._record_phase_result(
                                phase_id, worker_id, PhaseStatus.COMPLETED,
                                phase_start, phase_end, result_data
                            )

                            logger.info(f"Worker {worker_id}: phase {phase_id} ({phase_name}) completed")

                    except Exception as e:
                        # Record the failure
                        error_msg = f"Worker {worker_id}: phase {phase_id} failed: {e}"
                        logger.error(error_msg)

                        self._record_phase_result(
                            phase_id, worker_id, PhaseStatus.FAILED,
                            time.time(), time.time(), {}, str(e)
                        )

                        if phase.get('critical', False):
                            # Critical phase failed: stop this worker and break the barrier so
                            # the other workers are released at once instead of timing out
                            logger.error(f"Critical phase {phase_id} failed, worker {worker_id} stops")
                            self.error_queue.put(('CRITICAL_PHASE_FAILED', phase_id, worker_id, error_msg))
                            self.barriers[phase_id].abort()
                            return

                        logger.warning(f"Non-critical phase {phase_id} failed, continuing with later phases")

                    # Sync point. Every worker that is still running must arrive here, including
                    # workers that skipped the phase or failed a non-critical one. Otherwise the
                    # barrier never collects all parties and the remaining workers time out.
                    sync_start = time.time()
                    try:
                        self.barriers[phase_id].wait(timeout=15.0)
                        sync_duration = time.time() - sync_start

                        logger.debug(f"Worker {worker_id}: phase {phase_id} synchronized, "
                                   f"wait time: {sync_duration:.2f}s")

                    except threading.BrokenBarrierError:
                        # Raised when the wait times out or another worker called abort()
                        error_msg = f"Worker {worker_id}: barrier for phase {phase_id} is broken"
                        logger.error(error_msg)
                        self.error_queue.put(('BARRIER_BROKEN', phase_id, worker_id, error_msg))
                        return
                    except Exception as e:
                        error_msg = f"Worker {worker_id}: phase {phase_id} synchronization error: {e}"
                        logger.error(error_msg)
                        self.error_queue.put(('SYNC_ERROR', phase_id, worker_id, error_msg))
                        return

                logger.info(f"工作线程 {worker_id}: 所有阶段执行完成")

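            def _check_dependencies(self, dependencies: List[int], worker_id: int) -> bool:
                """Check whether this worker has completed every prerequisite phase"""
                # Take a snapshot under the lock, because other workers append concurrently
                with self.result_lock:
                    recorded = list(self.phase_results)

                for dep_id in dependencies:
                    # A dependency counts only if this same worker completed it
                    dep_completed = any(
                        result.phase_id == dep_id and
                        result.status == PhaseStatus.COMPLETED and
                        result.worker_id == worker_id
                        for result in recorded
                    )

                    if not dep_completed:
                        logger.debug(f"Worker {worker_id}: prerequisite phase {dep_id} not completed")
                        return False

                return True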

            def _execute_complex_phase(self, phase_id: int, worker_id: int) -> Dict[str, Any]:
                """执行复杂阶段任务"""
                if phase_id == 1:
                    # 阶段1:初始化
                    return self._execute_initialization(worker_id)
                elif phase_id == 2:
                    # 阶段2:数据加载
                    return self._execute_data_loading(worker_id)
                elif phase_id == 3:
                    # 阶段3:预处理
                    return self._execute_preprocessing(worker_id)
                elif phase_id == 4:
                    # 阶段4:核心处理
                    return self._execute_core_processing(worker_id)
                elif phase_id == 5:
                    # 阶段5:后处理
                    return self._execute_postprocessing(worker_id)
                elif phase_id == 6:
                    # 阶段6:结果输出
                    return self._execute_result_output(worker_id)
                else:
                    raise ValueError(f"未知的阶段ID: {phase_id}")

            def _execute_initialization(self, worker_id: int) -> Dict[str, Any]:
                """执行初始化阶段"""
                init_time = random.uniform(0.1, 0.5)
                time.sleep(init_time)

                # 初始化配置
                config = {
                    'worker_id': worker_id,
                    'buffer_size': random.randint(1024, 4096),
                    'timeout': random.uniform(5.0, 30.0),
                    'retry_count': random.randint(1, 5)
                }

                # 存储配置到共享数据
                with self.shared_lock:
                    self.shared_data[f'config_{worker_id}'] = config

                logger.debug(f"工作线程 {worker_id}: 初始化完成,配置: {config}")

                return {
                    'config': config,
                    'init_time': init_time,
                    'status': 'initialized'
                }

            def _execute_data_loading(self, worker_id: int) -> Dict[str, Any]:
                """执行数据加载阶段"""
                load_time = random.uniform(1.0, 3.0)
                time.sleep(load_time)

                # 模拟数据加载
                data_batches = random.randint(5, 15)
                total_records = random.randint(1000, 5000)
                data_size = random.randint(10000, 100000)

                # 模拟加载数据
                loaded_data = {
                    'worker_id': worker_id,
                    'batches': data_batches,
                    'records': total_records,
                    'size_bytes': data_size,
                    'load_time': load_time,
                    'data_quality': random.uniform(0.85, 0.99)
                }

                # 存储到共享数据
                with self.shared_lock:
                    self.shared_data[f'loaded_data_{worker_id}'] = loaded_data

                logger.debug(f"工作线程 {worker_id}: 数据加载完成,"
                           f"记录数: {total_records}, 大小: {data_size}字节")

                return loaded_data

            def _execute_preprocessing(self, worker_id: int) -> Dict[str, Any]:
                """执行预处理阶段"""
                proc_time = random.uniform(0.5, 2.0)
                time.sleep(proc_time)

                # 获取加载的数据
                with self.shared_lock:
                    loaded_data = self.shared_data.get(f'loaded_data_{worker_id}', {})

                # 模拟预处理
                original_records = loaded_data.get('records', 1000)
                cleaned_records = int(original_records * random.uniform(0.95, 0.99))
                filtered_records = int(cleaned_records * random.uniform(0.80, 0.95))

                preprocessing_result = {
                    'worker_id': worker_id,
                    'original_records': original_records,
                    'cleaned_records': cleaned_records,
                    'filtered_records': filtered_records,
                    'preprocessing_time': proc_time,
                    'data_loss_rate': (original_records - filtered_records) / original_records * 100
                }

                logger.debug(f"工作线程 {worker_id}: 预处理完成,"
                           f"过滤后记录数: {filtered_records}")

                return preprocessing_result

            def _execute_core_processing(self, worker_id: int) -> Dict[str, Any]:
                """执行核心处理阶段"""
                proc_time = random.uniform(2.0, 5.0)
                time.sleep(proc_time)

                # 获取预处理结果
                with self.shared_lock:
                    preprocessing = self.shared_data.get(f'preprocessing_{worker_id}', {})
                    loaded_data = self.shared_data.get(f'loaded_data_{worker_id}', {})

                # 模拟核心处理
                input_records = preprocessing.get('filtered_records', 800)
                processed_records = int(input_records * random.uniform(0.90, 0.98))
                success_count = int(processed_records * random.uniform(0.95, 0.99))

                core_result = {
                    'worker_id': worker_id,
                    'input_records': input_records,
                    'processed_records': processed_records,
                    'success_count': success_count,
                    'processing_time': proc_time,
                    'success_rate': (success_count / processed_records * 100) if processed_records > 0 else 0,
                    'throughput': processed_records / proc_time if proc_time > 0 else 0
                }

                logger.debug(f"工作线程 {worker_id}: 核心处理完成,"
                           f"成功率: {core_result['success_rate']:.1f}%")

                return core_result

            def _execute_postprocessing(self, worker_id: int) -> Dict[str, Any]:
                """执行后处理阶段"""
                post_time = random.uniform(0.3, 1.5)
                time.sleep(post_time)

                # 获取核心处理结果
                with self.shared_lock:
                    core_result = self.shared_data.get(f'core_processing_{worker_id}', {})

                # 模拟后处理
                success_count = core_result.get('success_count', 750)
                validated_count = int(success_count * random.uniform(0.98, 1.0))
                formatted_count = validated_count

                post_result = {
                    'worker_id': worker_id,
                    'input_count': success_count,
                    'validated_count': validated_count,
                    'formatted_count': formatted_count,
                    'postprocessing_time': post_time,
                    'validation_rate': (validated_count / success_count * 100) if success_count > 0 else 0
                }

                logger.debug(f"工作线程 {worker_id}: 后处理完成,"
                           f"验证率: {post_result['validation_rate']:.1f}%")

                return post_result

            def _execute_result_output(self, worker_id: int) -> Dict[str, Any]:
                """执行结果输出阶段"""
                output_time = random.uniform(0.2, 1.0)
                time.sleep(output_time)

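                # Collect this worker's completed results from all phases
                # (snapshot under the lock, since other workers may still be appending)
                with self.result_lock:
                    recorded = list(self.phase_results)
                worker_results = {}
                for result in recorded:
                    if result.worker_id == worker_id and result.status == PhaseStatus.COMPLETED:
                        worker_results[result.phase_id] = result.result_data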

                # 模拟结果输出
                total_success = sum(
                    result.get('success_count', 0)
                    for result in worker_results.values()
                    if 'success_count' in result
                )

                output_result = {
                    'worker_id': worker_id,
                    'total_phases': len(worker_results),
                    'total_success_count': total_success,
                    'output_time': output_time,
                    'output_format': 'json',
                    'output_size': random.randint(1024, 10240)
                }

                logger.debug(f"工作线程 {worker_id}: 结果输出完成,"
                           f"总成功数: {total_success}")

                return output_result

            def _record_phase_result(self, phase_id: int, worker_id: int,
                                  status: PhaseStatus, start_time: float,
                                  end_time: float, result_data: Dict[str, Any],
                                  error_message: Optional[str] = None):
                """记录阶段执行结果"""
                result = PhaseResult(
                    phase_id=phase_id,
                    worker_id=worker_id,
                    status=status,
                    start_time=start_time,
                    end_time=end_time,
                    result_data=result_data,
                    error_message=error_message
                )

                with self.result_lock:
                    self.phase_results.append(result)

                # 存储到共享数据
                with self.shared_lock:
                    phase_name = self._get_phase_name(phase_id)
                    self.shared_data[f'{phase_name}_{worker_id}'] = result_data

            def _get_phase_name(self, phase_id: int) -> str:
                """获取阶段名称"""
                phase_map = {
                    1: 'initialization',
                    2: 'data_loading',
                    3: 'preprocessing',
                    4: 'core_processing',
                    5: 'postprocessing',
                    6: 'result_output'
                }
                return phase_map.get(phase_id, f'phase_{phase_id}')

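            def error_monitor(self):
                """Error-monitoring thread"""
                logger.info("Error monitor started")

                error_count = 0
                critical_errors = []
                # Every error put on the queue is followed by that worker returning early,
                # so these workers will never record a result for the final phase
                stopped_workers = set()
                terminal_statuses = (PhaseStatus.COMPLETED, PhaseStatus.FAILED, PhaseStatus.SKIPPED)

                while True:
                    # Drain queued errors. get() cannot block here because this thread
                    # is the only consumer and the queue was just seen to be non-empty.
                    while not self.error_queue.empty():
                        error_type, phase_id, worker_id, error_msg = self.error_queue.get()
                        error_count += 1
                        stopped_workers.add(worker_id)

                        logger.warning(f"Error detected [type: {error_type}, "
                                    f"phase: {phase_id}, worker: {worker_id}]: {error_msg}")

                        if error_type == 'CRITICAL_PHASE_FAILED':
                            critical_errors.append((phase_id, worker_id, error_msg))

                    # A worker is done once it has a final-phase record or has stopped early
                    with self.result_lock:
                        finished_workers = {
                            result.worker_id for result in self.phase_results
                            if result.phase_id == len(self.phases) and
                            result.status in terminal_statuses
                        }

                    if len(finished_workers | stopped_workers) >= self.num_workers:
                        break

                    time.sleep(0.1)

                logger.info(f"Error monitor finished, total errors: {error_count}, "
                          f"critical errors: {len(critical_errors)}")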

                if critical_errors:
                    logger.error("关键错误详情:")
                    for phase_id, worker_id, error_msg in critical_errors:
                        logger.error(f"  阶段{phase_id}, 工作线程{worker_id}: {error_msg}")

            def run_complex_phased_demo(self):
                """运行复杂分阶段执行演示"""
                logger.info("=== 复杂分阶段执行模式演示 ===")
                logger.info(f"工作线程数: {self.num_workers}")
                logger.info(f"阶段配置: {len(self.phases)} 个阶段")

                for phase in self.phases:
                    deps_str = ', '.join(map(str, phase['dependencies'])) if phase['dependencies'] else '无'
                    critical_str = '关键' if phase['critical'] else '非关键'
                    logger.info(f"  阶段{phase['id']}: {phase['name']} (依赖: {deps_str}, {critical_str})")

                threads = []
                demo_start = time.time()

                # 启动错误监控线程
                monitor_thread = threading.Thread(target=self.error_monitor, daemon=True)
                monitor_thread.start()

                # 创建并启动工作线程
                for i in range(self.num_workers):
                    thread = threading.Thread(
                        target=self.complex_phase_worker,
                        args=(i + 1,),
                        name=f"ComplexWorker-{i + 1}"
                    )
                    threads.append(thread)
                    thread.start()
                    time.sleep(0.05)

                # 等待所有线程完成
                for thread in threads:
                    thread.join(timeout=60)
                    if thread.is_alive():
                        logger.warning(f"线程 {thread.name} 未能在超时内完成")

                monitor_thread.join(timeout=5)

                demo_time = time.time() - demo_start
                self._analyze_complex_results(demo_time)

            def _analyze_complex_results(self, total_demo_time: float):
                """分析复杂分阶段执行结果"""
                logger.info("\n=== 复杂分阶段执行结果分析 ===")
                logger.info(f"演示总耗时: {total_demo_time:.2f}秒")

                # 按阶段分析结果
                for phase in self.phases:
                    phase_id = phase['id']
                    phase_name = phase['name']

                    phase_results = [
                        result for result in self.phase_results
                        if result.phase_id == phase_id
                    ]

                    if not phase_results:
                        logger.info(f"\n阶段 {phase_id}({phase_name}): 无执行结果")
                        continue

                    # 统计各种状态
                    status_counts = {}
                    total_duration = 0

                    for result in phase_results:
                        status = result.status.value
                        status_counts[status] = status_counts.get(status, 0) + 1
                        total_duration += (result.end_time - result.start_time)

                    avg_duration = total_duration / len(phase_results) if phase_results else 0

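                    # Look up the count through the enum value instead of a hard-coded string
                    completed_count = status_counts.get(PhaseStatus.COMPLETED.value, 0)

                    logger.info(f"\nPhase {phase_id} ({phase_name}):")
                    logger.info(f"  Status counts: {dict(status_counts)}")
                    logger.info(f"  Average execution time: {avg_duration:.2f}s")
                    logger.info(f"  Completion rate: {completed_count}/{len(phase_results)} "
                              f"({completed_count / len(phase_results) * 100:.1f}%)")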

                    # 分析具体的阶段结果数据
                    if phase_id == 2:  # 数据加载阶段
                        total_records = sum(
                            result.result_data.get('records', 0)
                            for result in phase_results
                            if result.status == PhaseStatus.COMPLETED
                        )
                        total_size = sum(
                            result.result_data.get('size_bytes', 0)
                            for result in phase_results
                            if result.status == PhaseStatus.COMPLETED
                        )
                        logger.info(f"  总加载数据: {total_records} 条记录, {total_size} 字节")

                    elif phase_id == 4:  # 核心处理阶段
                        total_processed = sum(
                            result.result_data.get('processed_records', 0)
                            for result in phase_results
                            if result.status == PhaseStatus.COMPLETED
                        )
                        total_success = sum(
                            result.result_data.get('success_count', 0)
                            for result in phase_results
                            if result.status == PhaseStatus.COMPLETED
                        )
                        # Average only over completed results; fall back to 0 when none completed
                        completed_results = [
                            r for r in phase_results
                            if r.status == PhaseStatus.COMPLETED
                        ]
                        avg_success_rate = (
                            sum(r.result_data.get('success_rate', 0) for r in completed_results)
                            / len(completed_results)
                        ) if completed_results else 0

                        logger.info(f"  总处理: {total_processed} 条记录")
                        logger.info(f"  总成功: {total_success} 条记录")
                        logger.info(f"  平均成功率: {avg_success_rate:.1f}%")

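                # Overall check: did every worker complete the final phase?
                final_results = [
                    result for result in self.phase_results
                    if result.phase_id == len(self.phases)
                ]
                # all() is True for an empty sequence, so require at least one final-phase result
                all_completed = bool(final_results) and all(
                    result.status == PhaseStatus.COMPLETED
                    for result in final_results
                )

                if all_completed:
                    logger.info("\n✅ The final phase completed successfully on every worker")
                else:
                    failed_workers = [
                        result.worker_id
                        for result in final_results
                        if result.status != PhaseStatus.COMPLETED
                    ]
                    logger.info(f"\n⚠️ Workers that did not complete the final phase: {failed_workers}")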

        # 使用示例
        if __name__ == "__main__":
            complex_executor = ComplexPhasedExecution(num_workers=6)
            complex_executor.run_complex_phased_demo()
        ---

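03.Performance optimization strategies
    a.Phase timing analysis
        a.Bottleneck identification
            Record the execution time and the waiting time of every phase for every worker. Because all workers wait at the barrier, a phase lasts as long as its slowest worker, so that worker is the first place to look for a bottleneck.
        b.Load balancing
            Distribute work across threads according to the cost of each phase, so that fast workers do not sit idle at the barrier.
        c.Parallelism tuning
            Improve parallel execution efficiency only as far as correctness allows.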
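        The timing analysis above can be sketched as follows. This is a minimal illustration, not part of the demo class: `find_bottleneck` and the sample tuples are hypothetical names and made-up numbers, and the "bottleneck" is taken to be the phase whose slowest worker ran longest, which is what bounds a barrier-synchronized phase.

```python
from collections import defaultdict
from statistics import mean


def find_bottleneck(samples):
    """Summarize timings per phase.

    samples: iterable of (phase_id, execution_seconds, waiting_seconds).
    Returns (bottleneck_phase_id, summary_dict).
    """
    by_phase = defaultdict(list)
    for phase_id, exec_s, wait_s in samples:
        by_phase[phase_id].append((exec_s, wait_s))

    summary = {}
    for phase_id, rows in by_phase.items():
        exec_times = [row[0] for row in rows]
        wait_times = [row[1] for row in rows]
        summary[phase_id] = {
            "avg_exec": mean(exec_times),
            "max_exec": max(exec_times),
            "avg_wait": mean(wait_times),
            # Gap between the slowest and the average worker: a large gap
            # suggests uneven load rather than a uniformly slow phase.
            "imbalance": max(exec_times) - mean(exec_times),
        }

    # With a barrier, a phase takes as long as its slowest worker.
    bottleneck = max(summary, key=lambda phase: summary[phase]["max_exec"])
    return bottleneck, summary


# Made-up sample data: two workers, three phases
samples = [
    (1, 0.10, 0.02), (1, 0.12, 0.00),
    (2, 0.80, 0.40), (2, 1.20, 0.00),
    (3, 0.30, 0.05), (3, 0.35, 0.00),
]
bottleneck, summary = find_bottleneck(samples)
print(bottleneck, summary[bottleneck])
```

        A high `avg_wait` combined with a large `imbalance` points at load balancing; a high `avg_exec` with little imbalance points at the phase's work itself.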
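    b.Resource optimization
        a.Memory management
            Release per-phase data once it is no longer needed so that memory does not leak across phases, and keep data transfer between phases cheap.
        b.I/O optimization
            Merge small I/O operations into batches and reduce the overhead of switching between phases.
        c.Caching strategy
            Cache intermediate results that later phases reuse, to avoid recomputing them.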
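        One possible shape for the caching strategy is a small lock-protected cache shared between phases. The class name `PhaseResultCache` and its `get_or_compute` method are hypothetical, introduced only for this sketch. The computation runs outside the lock so that slow work does not block other keys; the trade-off is that two threads racing on the same missing key may both compute it, and the first stored value wins.

```python
import threading


class PhaseResultCache:
    """Thread-safe cache for intermediate results shared between phases."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get_or_compute(self, key, compute):
        # Fast path: return the cached value if it is already there
        with self._lock:
            if key in self._data:
                return self._data[key]

        # Compute without holding the lock so other keys are not blocked
        value = compute()

        with self._lock:
            # setdefault keeps the first stored value if another thread
            # finished the same computation in the meantime
            return self._data.setdefault(key, value)


cache = PhaseResultCache()
calls = []


def expensive():
    calls.append(1)          # count how often the computation really runs
    return sum(range(1000))


first = cache.get_or_compute(("phase1", 7), expensive)
second = cache.get_or_compute(("phase1", 7), expensive)
print(first, second, len(calls))
```

        If duplicate computation is unacceptable, holding the lock during `compute()` guarantees a single run at the cost of serializing all cache misses.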
    c.Code example
        ---
        # Performance-optimization example for phased execution
        # Requires the third-party package psutil (pip install psutil)
        import threading
        import time
        import logging
        import random
        import gc
        import psutil
        import os
        from typing import List, Dict, Any, Optional
        from dataclasses import dataclass
        from contextlib import contextmanager

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        @dataclass
        class PhaseMetrics:
            """阶段性能指标"""
            phase_id: int
            worker_id: int
            start_time: float
            end_time: float
            cpu_usage: float
            memory_usage: int
            execution_time: float
            waiting_time: float
            throughput: float
            efficiency: float

        class OptimizedPhasedExecution:
            """优化的分阶段执行演示"""
            def __init__(self, num_workers: int = 8, phases: int = 4):
                self.num_workers = num_workers
                self.num_phases = phases
                self.process_id = os.getpid()

                # 性能指标存储
                self.phase_metrics: List[PhaseMetrics] = []
                self.metrics_lock = threading.Lock()

                # 优化配置
                self.optimization_config = {
                    'enable_memory_pool': True,
                    'enable_io_batching': True,
                    'enable_result_caching': True,
                    'gc_threshold': 100,  # 每100次操作触发一次GC
                    'batch_size': 32,
                    'memory_limit_mb': 512
                }

                # 资源池
                self.memory_pool = {}
                self.pool_lock = threading.Lock()
                self.result_cache = {}
                self.cache_lock = threading.Lock()
                self.io_batch_queue = []
                self.io_lock = threading.Lock()

                # 同步点
                self.barriers = [threading.Barrier(parties=num_workers)
                               for _ in range(phases)]

                # 统计计数器
                self.operation_counter = 0

            def optimized_worker(self, worker_id: int):
                """优化的分阶段工作线程"""
                logger.info(f"优化工作线程 {worker_id}: 启动")

                # 预分配资源
                self._preallocate_resources(worker_id)

                try:
                    for phase in range(self.num_phases):
                        phase_metrics = self._execute_optimized_phase(worker_id, phase)

                        if phase_metrics:
                            with self.metrics_lock:
                                self.phase_metrics.append(phase_metrics)

                        # 同步点等待(带超时和重试)
                        self._optimized_barrier_wait(phase, worker_id)

                        # 定期垃圾回收
                        if self.operation_counter % self.optimization_config['gc_threshold'] == 0:
                            self._optimized_garbage_collection()

                finally:
                    # 清理资源
                    self._cleanup_resources(worker_id)

                logger.info(f"优化工作线程 {worker_id}: 完成")

            def _execute_optimized_phase(self, worker_id: int, phase: int) -> Optional[PhaseMetrics]:
                """执行优化的阶段任务"""
                phase_start = time.time()

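                # Record the resource state at phase start. These figures are
                # process-wide and shared by all worker threads, not per-thread values.
                process = psutil.Process(self.process_id)
                start_memory = process.memory_info().rss
                # The first cpu_percent() call on a Process object returns 0.0;
                # it only sets the reference point for the next call.
                start_cpu = process.cpu_percent()

                try:
                    # Run the phase task
                    if phase == 0:
                        result = self._optimized_phase1_initialization(worker_id)
                    elif phase == 1:
                        result = self._optimized_phase2_data_processing(worker_id)
                    elif phase == 2:
                        result = self._optimized_phase3_computation(worker_id)
                    else:
                        result = self._optimized_phase4_aggregation(worker_id)

                    # Update the shared counter under a lock: += is not atomic across threads
                    with self.metrics_lock:
                        self.operation_counter += 1

                    phase_end = time.time()

                    # Record the resource state at phase end
                    end_memory = process.memory_info().rss
                    end_cpu = process.cpu_percent()

                    execution_time = phase_end - phase_start
                    # Process-wide RSS delta; other threads influence it and it may be negative
                    memory_usage = end_memory - start_memory
                    # end_cpu already covers the interval since the start_cpu call
                    avg_cpu = end_cpu

                    # Compute performance metrics, guarding against division by zero
                    processed_items = result.get('processed_items', 0)
                    throughput = processed_items / execution_time if execution_time > 0 else 0
                    efficiency = (result.get('success_items', 0) / processed_items * 100
                                  if processed_items else 0)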

                    metrics = PhaseMetrics(
                        phase_id=phase + 1,
                        worker_id=worker_id,
                        start_time=phase_start,
                        end_time=phase_end,
                        cpu_usage=avg_cpu,
                        memory_usage=memory_usage,
                        execution_time=execution_time,
                        waiting_time=0,  # 将在同步点计算
                        throughput=throughput,
                        efficiency=efficiency
                    )

                    logger.debug(f"工作线程 {worker_id}: 阶段 {phase + 1} 完成,"
                               f"执行时间: {execution_time:.3f}s, 吞吐量: {throughput:.1f}/s")

                    return metrics

                except Exception as e:
                    logger.error(f"工作线程 {worker_id}: 阶段 {phase + 1} 执行失败: {e}")
                    return None

            def _optimized_barrier_wait(self, phase: int, worker_id: int):
                """Wait at the phase barrier and record the waiting time.

                A broken threading.Barrier stays broken until reset() is called,
                so calling wait() again on it fails immediately. The error is
                therefore logged and re-raised instead of being retried.
                """
                wait_start = time.time()
                timeout = 5.0

                try:
                    self.barriers[phase].wait(timeout=timeout)
                except threading.BrokenBarrierError:
                    logger.error(f"Worker {worker_id}: phase {phase + 1} barrier is broken "
                                 f"(a peer timed out or failed)")
                    raise
                except Exception as e:
                    logger.error(f"Worker {worker_id}: phase {phase + 1} synchronization failed: {e}")
                    raise

                wait_time = time.time() - wait_start

                # Store the waiting time in this worker's metrics for the phase
                with self.metrics_lock:
                    for metric in self.phase_metrics:
                        if metric.phase_id == phase + 1 and metric.worker_id == worker_id:
                            metric.waiting_time = wait_time
                            break

                logger.debug(f"Worker {worker_id}: phase {phase + 1} synchronized, "
                             f"waited {wait_time:.3f}s")

            def _preallocate_resources(self, worker_id: int):
                """预分配资源"""
                if self.optimization_config['enable_memory_pool']:
                    with self.pool_lock:
                        # 为工作线程预分配内存池
                        self.memory_pool[worker_id] = {
                            'buffers': [bytearray(1024) for _ in range(10)],
                            'temp_storage': {},
                            'last_used': time.time()
                        }

            def _cleanup_resources(self, worker_id: int):
                """清理资源"""
                if self.optimization_config['enable_memory_pool']:
                    with self.pool_lock:
                        if worker_id in self.memory_pool:
                            del self.memory_pool[worker_id]

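            @contextmanager
            def _get_memory_buffer(self, worker_id: int, size: int = 1024):
                """Yield a buffer of at least `size` bytes (context manager).

                A pooled buffer is reused only if it is large enough; otherwise a
                new one is allocated. The yielded buffer may be larger than requested.
                """
                if not self.optimization_config['enable_memory_pool']:
                    yield bytearray(size)
                    return

                buffer = None
                with self.pool_lock:
                    pool = self.memory_pool.get(worker_id)
                    if pool is not None:
                        buffers = pool['buffers']
                        for idx, candidate in enumerate(buffers):
                            if len(candidate) >= size:
                                buffer = buffers.pop(idx)
                                break
                        pool['last_used'] = time.time()

                if buffer is None:
                    buffer = bytearray(size)

                try:
                    yield buffer
                finally:
                    with self.pool_lock:
                        pool = self.memory_pool.get(worker_id)
                        # Return the buffer for reuse while keeping the pool bounded
                        if pool is not None and len(pool['buffers']) < 10:
                            pool['buffers'].append(buffer)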

            def _optimized_phase1_initialization(self, worker_id: int) -> Dict[str, Any]:
                """优化的初始化阶段"""
                start_time = time.time()

                # 使用内存池进行初始化
                with self._get_memory_buffer(worker_id, 2048) as buffer:
                    # 模拟初始化工作
                    config_size = random.randint(100, 500)
                    buffer[:config_size] = b'\x01' * config_size

                    init_data = {
                        'worker_id': worker_id,
                        'config_size': config_size,
                        'buffer_size': len(buffer),
                        'init_mode': 'optimized'
                    }

                # 缓存初始化结果
                if self.optimization_config['enable_result_caching']:
                    with self.cache_lock:
                        cache_key = f'init_{worker_id}'
                        self.result_cache[cache_key] = init_data

                execution_time = time.time() - start_time
                logger.debug(f"工作线程 {worker_id}: 初始化完成,耗时: {execution_time:.3f}s")

                return {
                    'processed_items': config_size,
                    'success_items': config_size,
                    'execution_time': execution_time,
                    'init_data': init_data
                }

            def _optimized_phase2_data_processing(self, worker_id: int) -> Dict[str, Any]:
                """优化的数据处理阶段"""
                start_time = time.time()

                # 获取缓存中的初始化数据
                init_data = None
                if self.optimization_config['enable_result_caching']:
                    with self.cache_lock:
                        cache_key = f'init_{worker_id}'
                        init_data = self.result_cache.get(cache_key)

                config_size = init_data.get('config_size', 300) if init_data else 300

                # 批量I/O处理
                batch_size = self.optimization_config['batch_size']
                total_items = config_size * 10
                processed_items = 0
                success_items = 0

                batches = (total_items + batch_size - 1) // batch_size

                for batch_idx in range(batches):
                    batch_start = batch_idx * batch_size
                    batch_end = min(batch_start + batch_size, total_items)
                    batch_items = batch_end - batch_start

                    # 模拟批量数据处理
                    with self._get_memory_buffer(worker_id, batch_items) as buffer:
                        # 填充数据
                        for i in range(batch_items):
                            buffer[i] = random.randint(0, 255)

                        # 模拟处理时间
                        processing_time = random.uniform(0.01, 0.05)
                        time.sleep(processing_time)

                        # 模拟处理结果(成功率95%)
                        batch_success = int(batch_items * 0.95)
                        success_items += batch_success
                        processed_items += batch_items

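                    # Queue this batch's simulated write and flush periodically, so that
                    # several small I/O operations are merged into one
                    if self.optimization_config['enable_io_batching']:
                        with self.io_lock:
                            self.io_batch_queue.append((worker_id, batch_idx, batch_items))
                        if batch_idx % 5 == 0:
                            self._flush_io_batch()

                # Flush any operations still queued after the last batch
                if self.optimization_config['enable_io_batching']:
                    self._flush_io_batch()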

                execution_time = time.time() - start_time

                logger.debug(f"工作线程 {worker_id}: 数据处理完成,"
                           f"处理: {processed_items}, 成功: {success_items}, 耗时: {execution_time:.3f}s")

                return {
                    'processed_items': processed_items,
                    'success_items': success_items,
                    'execution_time': execution_time,
                    'batch_count': batches,
                    'avg_batch_size': processed_items / batches
                }

            def _optimized_phase3_computation(self, worker_id: int) -> Dict[str, Any]:
                """优化的计算阶段"""
                start_time = time.time()

                # CPU密集型计算优化
                computation_units = random.randint(1000, 5000)
                processed_units = 0
                success_units = 0

                # 分块计算以减少内存压力
                chunk_size = 500
                chunks = (computation_units + chunk_size - 1) // chunk_size

                for chunk_idx in range(chunks):
                    chunk_start = chunk_idx * chunk_size
                    chunk_end = min(chunk_start + chunk_size, computation_units)
                    chunk_units = chunk_end - chunk_start

                    # 执行计算任务
                    chunk_success = 0
                    for i in range(chunk_units):
                        # 模拟计算操作
                        result = (i * i + random.randint(1, 100)) % 1000
                        if result < 950:  # 95%成功率
                            chunk_success += 1

                    processed_units += chunk_units
                    success_units += chunk_success

                    # 每个chunk后短暂休息,避免CPU过载
                    time.sleep(0.001)

                execution_time = time.time() - start_time
                throughput = processed_units / execution_time if execution_time > 0 else 0

                logger.debug(f"工作线程 {worker_id}: 计算完成,"
                           f"计算: {processed_units}, 成功: {success_units}, "
                           f"吞吐量: {throughput:.1f} ops/s")

                return {
                    'processed_items': processed_units,
                    'success_items': success_units,
                    'execution_time': execution_time,
                    'throughput': throughput,
                    'chunk_count': chunks
                }

            def _optimized_phase4_aggregation(self, worker_id: int) -> Dict[str, Any]:
                """优化的聚合阶段"""
                start_time = time.time()

                # 获取之前阶段的结果进行聚合
                aggregation_data = {
                    'worker_id': worker_id,
                    'phase_results': {},
                    'aggregated_metrics': {}
                }

                # 收集各阶段指标
                with self.metrics_lock:
                    worker_metrics = [
                        metric for metric in self.phase_metrics
                        if metric.worker_id == worker_id
                    ]

                total_execution_time = sum(metric.execution_time for metric in worker_metrics)
                total_waiting_time = sum(metric.waiting_time for metric in worker_metrics)
                avg_throughput = sum(metric.throughput for metric in worker_metrics) / len(worker_metrics) if worker_metrics else 0
                avg_efficiency = sum(metric.efficiency for metric in worker_metrics) / len(worker_metrics) if worker_metrics else 0

                # 计算聚合指标
                total_time = total_execution_time + total_waiting_time
                efficiency_ratio = total_execution_time / total_time if total_time > 0 else 0
                parallel_efficiency = avg_efficiency * efficiency_ratio

                aggregation_data['aggregated_metrics'] = {
                    'total_execution_time': total_execution_time,
                    'total_waiting_time': total_waiting_time,
                    'total_time': total_time,
                    'avg_throughput': avg_throughput,
                    'avg_efficiency': avg_efficiency,
                    'efficiency_ratio': efficiency_ratio,
                    'parallel_efficiency': parallel_efficiency
                }

                # 执行最终的聚合操作
                time.sleep(random.uniform(0.1, 0.5))
                aggregation_time = time.time() - start_time

                logger.debug(f"工作线程 {worker_id}: 聚合完成,"
                           f"并行效率: {parallel_efficiency:.1f}%, "
                           f"聚合时间: {aggregation_time:.3f}s")

                return {
                    'processed_items': 1,  # 聚合操作计为1个处理项
                    'success_items': 1,
                    'execution_time': aggregation_time,
                    'parallel_efficiency': parallel_efficiency,
                    'aggregation_data': aggregation_data
                }

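            def _flush_io_batch(self):
                """Flush the I/O batch queue."""
                # Swap the queue out under the lock, then do the slow I/O without
                # holding it, so other workers can keep queueing operations
                with self.io_lock:
                    pending = self.io_batch_queue
                    self.io_batch_queue = []

                if pending:
                    time.sleep(0.01)  # simulated I/O latency
                    logger.debug(f"Flushed I/O batch of {len(pending)} operations")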

            def _optimized_garbage_collection(self):
                """优化的垃圾回收"""
                before_memory = psutil.Process(self.process_id).memory_info().rss

                # 执行垃圾回收
                collected = gc.collect()

                after_memory = psutil.Process(self.process_id).memory_info().rss
                memory_freed = before_memory - after_memory

                logger.debug(f"垃圾回收完成: 回收对象 {collected} 个, "
                           f"释放内存 {memory_freed / 1024 / 1024:.1f} MB")

            def performance_monitor(self):
                """性能监控线程"""
                logger.info("性能监控线程启动")

                process = psutil.Process(self.process_id)
                start_time = time.time()

                monitoring_data = {
                    'cpu_usage': [],
                    'memory_usage': [],
                    'timestamps': []
                }

                while True:
                    try:
                        # 采集系统指标
                        cpu_percent = process.cpu_percent()
                        memory_info = process.memory_info()
                        memory_mb = memory_info.rss / 1024 / 1024
                        current_time = time.time() - start_time

                        monitoring_data['cpu_usage'].append(cpu_percent)
                        monitoring_data['memory_usage'].append(memory_mb)
                        monitoring_data['timestamps'].append(current_time)

                        # 检查内存限制
                        if memory_mb > self.optimization_config['memory_limit_mb']:
                            logger.warning(f"内存使用超限: {memory_mb:.1f}MB > "
                                         f"{self.optimization_config['memory_limit_mb']}MB")
                            self._optimized_garbage_collection()

                        # 检查是否完成
                        with self.metrics_lock:
                            if len(self.phase_metrics) >= self.num_workers * self.num_phases:
                                break

                        time.sleep(0.5)

                    except Exception as e:
                        logger.error(f"性能监控异常: {e}")
                        break

                # 输出监控总结
                self._analyze_monitoring_data(monitoring_data)
                logger.info("性能监控线程结束")

            def _analyze_monitoring_data(self, monitoring_data: Dict[str, List]):
                """分析监控数据"""
                if not monitoring_data['cpu_usage']:
                    return

                cpu_avg = sum(monitoring_data['cpu_usage']) / len(monitoring_data['cpu_usage'])
                cpu_max = max(monitoring_data['cpu_usage'])
                memory_avg = sum(monitoring_data['memory_usage']) / len(monitoring_data['memory_usage'])
                memory_max = max(monitoring_data['memory_usage'])
                memory_min = min(monitoring_data['memory_usage'])

                logger.info("\n=== 性能监控总结 ===")
                logger.info(f"CPU使用率: 平均 {cpu_avg:.1f}%, 最高 {cpu_max:.1f}%")
                logger.info(f"内存使用: 平均 {memory_avg:.1f}MB, "
                          f"最高 {memory_max:.1f}MB, 最低 {memory_min:.1f}MB")
                logger.info(f"监控时长: {monitoring_data['timestamps'][-1]:.2f}秒")

            def run_optimized_demo(self):
                """运行优化演示"""
                logger.info("=== 优化分阶段执行演示 ===")
                logger.info(f"工作线程数: {self.num_workers}")
                logger.info(f"阶段数: {self.num_phases}")
                logger.info(f"优化配置: {self.optimization_config}")

                threads = []
                demo_start = time.time()

                # 启动性能监控线程
                monitor_thread = threading.Thread(target=self.performance_monitor, daemon=True)
                monitor_thread.start()

                # 创建并启动优化的工作线程
                for i in range(self.num_workers):
                    thread = threading.Thread(
                        target=self.optimized_worker,
                        args=(i + 1,),
                        name=f"OptimizedWorker-{i + 1}"
                    )
                    threads.append(thread)
                    thread.start()
                    time.sleep(0.02)

                # 等待所有线程完成
                for thread in threads:
                    thread.join(timeout=45)
                    if thread.is_alive():
                        logger.warning(f"线程 {thread.name} 未能在超时内完成")

                monitor_thread.join(timeout=5)

                demo_time = time.time() - demo_start
                self._analyze_optimization_results(demo_time)

            def _analyze_optimization_results(self, total_demo_time: float):
                """分析优化结果"""
                logger.info("\n=== 优化结果分析 ===")
                logger.info(f"演示总耗时: {total_demo_time:.2f}秒")
                logger.info(f"总操作数: {self.operation_counter}")

                # 按阶段分析性能
                for phase in range(1, self.num_phases + 1):
                    phase_metrics = [
                        metric for metric in self.phase_metrics
                        if metric.phase_id == phase
                    ]

                    if not phase_metrics:
                        continue

                    execution_times = [metric.execution_time for metric in phase_metrics]
                    waiting_times = [metric.waiting_time for metric in phase_metrics]
                    throughputs = [metric.throughput for metric in phase_metrics]
                    efficiencies = [metric.efficiency for metric in phase_metrics]

                    avg_exec = sum(execution_times) / len(execution_times)
                    avg_wait = sum(waiting_times) / len(waiting_times)
                    avg_throughput = sum(throughputs) / len(throughputs)
                    avg_efficiency = sum(efficiencies) / len(efficiencies)

                    total_time = avg_exec + avg_wait
                    time_efficiency = avg_exec / total_time if total_time > 0 else 0

                    logger.info(f"\n阶段 {phase}:")
                    logger.info(f"  平均执行时间: {avg_exec:.3f}s")
                    logger.info(f"  平均等待时间: {avg_wait:.3f}s")
                    logger.info(f"  时间效率: {time_efficiency * 100:.1f}%")
                    logger.info(f"  平均吞吐量: {avg_throughput:.1f} ops/s")
                    logger.info(f"  平均效率: {avg_efficiency:.1f}%")

                # Overall performance evaluation
                if self.phase_metrics:
                    total_processed = sum(metric.throughput * metric.execution_time
                                        for metric in self.phase_metrics)
                    avg_cpu = sum(metric.cpu_usage for metric in self.phase_metrics) / len(self.phase_metrics)
                    # Each memory_usage value is a process-wide RSS delta, so the sum
                    # is only a rough indicator, not an exact total
                    total_memory = sum(metric.memory_usage for metric in self.phase_metrics)
                    overall_throughput = total_processed / total_demo_time if total_demo_time > 0 else 0

                    logger.info("\n=== Overall performance ===")
                    logger.info(f"Items processed: {total_processed:.0f}")
                    logger.info(f"Average CPU usage: {avg_cpu:.1f}%")
                    logger.info(f"Summed RSS change: {total_memory / 1024 / 1024:.1f}MB")
                    logger.info(f"Average throughput: {overall_throughput:.1f} ops/s")

        # 使用示例
        if __name__ == "__main__":
            optimized_executor = OptimizedPhasedExecution(num_workers=8, phases=4)
            optimized_executor.run_optimized_demo()
        ---
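        A monitoring thread that decides on its own when the work is finished can run forever if a phase fails and its metrics never arrive. A common alternative is to let the main thread signal completion through a `threading.Event`. The sketch below is a simplified stand-in, not the demo class above: `monitor`, `worker`, and the timings are hypothetical.

```python
import threading
import time


def monitor(stop_event, samples, interval=0.05):
    """Collect a timestamp per tick until the main thread signals stop."""
    while not stop_event.is_set():
        samples.append(time.perf_counter())
        # wait() doubles as an interruptible sleep: it returns early once set
        stop_event.wait(interval)


def worker(barrier, n_phases):
    """Simulate n_phases of work, synchronizing with peers after each one."""
    for _ in range(n_phases):
        time.sleep(0.01)
        barrier.wait(timeout=5)


stop_event = threading.Event()
samples = []
barrier = threading.Barrier(3)

mon = threading.Thread(target=monitor, args=(stop_event, samples), daemon=True)
workers = [threading.Thread(target=worker, args=(barrier, 2)) for _ in range(3)]

mon.start()
for t in workers:
    t.start()
for t in workers:
    t.join()

# The main thread knows when the workers are done, whether or not they succeeded
stop_event.set()
mon.join(timeout=1)
print(mon.is_alive(), len(samples))
```

        Because the stop signal does not depend on how many metrics were recorded, the monitor also ends when a worker fails partway through.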

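04.Practical application scenarios
    a.Data processing pipelines
        a.ETL workflows
            Run extraction, transformation, and loading as separate phases, so that each phase starts only after the previous one has finished for all workers.
        b.Batch jobs
            Process large data sets in batches to avoid running out of memory or overloading the system.
        c.Real-time data streams
            Process streaming data in stages to support real-time analysis and response.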
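        Batch processing as described above can be sketched with a small generator that yields fixed-size batches, so only one batch is held in memory at a time. `iter_batches` is a hypothetical helper written for this illustration.

```python
from itertools import islice


def iter_batches(records, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Process ten records in batches of four; the last batch is shorter
totals = [sum(batch) for batch in iter_batches(range(10), 4)]
print(totals)  # [6, 22, 17]
```

        On Python 3.12 and later, `itertools.batched` offers similar behavior but yields tuples instead of lists.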
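    b.Machine learning workflows
        a.Model training
            Run data preprocessing, feature engineering, model training, and evaluation in a fixed order.
        b.Hyperparameter tuning
            Repeat a cycle of phases: search the parameter space, train the model, evaluate its performance.
        c.Model deployment
            Release in stages: validate the model, package it, deploy it, then monitor it.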
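        In a staged release, a failure in one stage should stop the other workers promptly instead of leaving them blocked at the synchronization point. One way to do this with `threading.Barrier` is to call `abort()` on the failing stage's barrier, which raises `BrokenBarrierError` in every thread waiting on it. The sketch below assumes one barrier per stage; `run_stages`, the stage names, and the injected failure are hypothetical.

```python
import threading


def run_stages(worker_id, stages, barriers, log, log_lock, fail_at=None):
    """Run each stage, then wait for peers; abort the stage barrier on failure."""
    for stage in stages:
        try:
            if (worker_id, stage) == fail_at:
                # Simulated stage failure
                raise ValueError(f"{stage} failed on worker {worker_id}")
            barriers[stage].wait(timeout=5)
        except ValueError:
            # Break the barrier so peers fail fast instead of waiting for a timeout
            barriers[stage].abort()
            with log_lock:
                log.append((worker_id, stage, "failed"))
            return
        except threading.BrokenBarrierError:
            with log_lock:
                log.append((worker_id, stage, "aborted"))
            return
    with log_lock:
        log.append((worker_id, None, "done"))


stages = ["validate", "package", "deploy"]
barriers = {stage: threading.Barrier(3) for stage in stages}
log, log_lock = [], threading.Lock()

threads = [
    threading.Thread(target=run_stages,
                     args=(i, stages, barriers, log, log_lock, (2, "package")))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(log))
```

        Worker 2 fails in the "package" stage, and the other two workers leave that stage with status "aborted" without reaching "deploy".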
    c.Code example
        ---
        # 分阶段执行实际应用示例
        import threading
        import time
        import logging
        import random
        import json
        import csv
        import os
        from typing import List, Dict, Any, Optional, Callable
        from dataclasses import dataclass, asdict
        from enum import Enum
        from pathlib import Path

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class PipelineStage(Enum):
            """管道阶段"""
            DATA_INGESTION = "数据接入"
            DATA_VALIDATION = "数据验证"
            DATA_CLEANING = "数据清洗"
            FEATURE_ENGINEERING = "特征工程"
            MODEL_TRAINING = "模型训练"
            MODEL_EVALUATION = "模型评估"
            MODEL_DEPLOYMENT = "模型部署"

        @dataclass
        class DataRecord:
            """数据记录"""
            id: int
            timestamp: float
            feature1: float
            feature2: float
            feature3: float
            label: Optional[int] = None
            prediction: Optional[float] = None
            validation_status: Optional[str] = None
            cleaning_status: Optional[str] = None

        @dataclass
        class PipelineMetrics:
            """管道指标"""
            stage: PipelineStage
            worker_id: int
            start_time: float
            end_time: float
            input_count: int
            output_count: int
            success_count: int
            error_count: int
            processing_rate: float
            quality_score: float

        class MLOpsPipeline:
            """机器学习运维管道示例"""
            def __init__(self, num_workers: int = 4):
                self.num_workers = num_workers
                self.stages = list(PipelineStage)

                # 数据存储
                self.stage_data: Dict[PipelineStage, List[DataRecord]] = {}
                self.data_locks = {stage: threading.Lock() for stage in self.stages}

                # 同步点
                self.barriers = {
                    stage: threading.Barrier(parties=num_workers)
                    for stage in self.stages
                }

                # 性能指标
                self.pipeline_metrics: List[PipelineMetrics] = []
                self.metrics_lock = threading.Lock()

                # 管道配置
                self.pipeline_config = {
                    'data_quality_threshold': 0.95,
                    'max_validation_errors': 10,
                    'feature_engineering_enabled': True,
                    'model_type': 'random_forest',
                    'training_epochs': 100,
                    'evaluation_split': 0.2,
                    'deployment_strategy': 'blue_green'
                }

            def ml_pipeline_worker(self, worker_id: int):
                """机器学习管道工作线程"""
                logger.info(f"ML管道工作线程 {worker_id}: 启动")

                try:
                    for stage in self.stages:
                        stage_start = time.time()
                        logger.info(f"工作线程 {worker_id}: 开始阶段 {stage.value}")

                        # 执行管道阶段
                        metrics = self._execute_pipeline_stage(stage, worker_id)

                        if metrics:
                            with self.metrics_lock:
                                self.pipeline_metrics.append(metrics)

                        stage_duration = time.time() - stage_start

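                        # Check whether the stage succeeded
                        if metrics and metrics.error_count > self.pipeline_config['max_validation_errors']:
                            logger.error(f"Worker {worker_id}: too many errors in stage {stage.value}, stopping")
                            # Break the barrier so peers waiting on this stage fail fast
                            # instead of blocking until their timeout expires
                            self.barriers[stage].abort()
                            return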

                        # 同步点等待
                        sync_start = time.time()
                        try:
                            sync_result = self.barriers[stage].wait(timeout=30.0)
                            sync_duration = time.time() - sync_start

                            logger.debug(f"工作线程 {worker_id}: 阶段 {stage.value} 同步完成,"
                                       f"等待时间: {sync_duration:.2f}s")

                        except threading.BrokenBarrierError:
                            logger.error(f"工作线程 {worker_id}: 阶段 {stage.value} 同步点损坏")
                            return
                        except Exception as e:
                            logger.error(f"工作线程 {worker_id}: 阶段 {stage.value} 同步异常: {e}")
                            return

                        logger.info(f"工作线程 {worker_id}: 阶段 {stage.value} 完成,"
                                  f"耗时: {stage_duration:.2f}s")

                except Exception as e:
                    logger.error(f"工作线程 {worker_id}: 管道执行异常: {e}")

                logger.info(f"ML管道工作线程 {worker_id}: 完成")

            def _execute_pipeline_stage(self, stage: PipelineStage, worker_id: int) -> Optional[PipelineMetrics]:
                """Dispatch to the handler for the given pipeline stage."""

                if stage == PipelineStage.DATA_INGESTION:
                    return self._execute_data_ingestion(worker_id)
                elif stage == PipelineStage.DATA_VALIDATION:
                    return self._execute_data_validation(worker_id)
                elif stage == PipelineStage.DATA_CLEANING:
                    return self._execute_data_cleaning(worker_id)
                elif stage == PipelineStage.FEATURE_ENGINEERING:
                    return self._execute_feature_engineering(worker_id)
                elif stage == PipelineStage.MODEL_TRAINING:
                    return self._execute_model_training(worker_id)
                elif stage == PipelineStage.MODEL_EVALUATION:
                    return self._execute_model_evaluation(worker_id)
                elif stage == PipelineStage.MODEL_DEPLOYMENT:
                    return self._execute_model_deployment(worker_id)
                else:
                    return None

            def _execute_data_ingestion(self, worker_id: int) -> PipelineMetrics:
                """执行数据接入阶段"""
                start_time = time.time()

                # 生成模拟数据
                data_records = []
                batch_size = random.randint(50, 150)

                for i in range(batch_size):
                    record = DataRecord(
                        id=worker_id * 1000 + i,
                        timestamp=time.time() + i,
                        feature1=random.uniform(0, 100),
                        feature2=random.uniform(-50, 50),
                        feature3=random.uniform(0, 1),
                        label=random.randint(0, 1) if random.random() < 0.8 else None
                    )
                    data_records.append(record)

                # 模拟数据接入延迟
                time.sleep(random.uniform(0.5, 2.0))

                # 存储数据
                with self.data_locks[PipelineStage.DATA_INGESTION]:
                    if PipelineStage.DATA_INGESTION not in self.stage_data:
                        self.stage_data[PipelineStage.DATA_INGESTION] = []
                    self.stage_data[PipelineStage.DATA_INGESTION].extend(data_records)

                end_time = time.time()
                duration = end_time - start_time
                processing_rate = batch_size / duration if duration > 0 else 0

                logger.debug(f"工作线程 {worker_id}: 数据接入完成,"
                           f"接入 {batch_size} 条记录,速率: {processing_rate:.1f} records/s")

                return PipelineMetrics(
                    stage=PipelineStage.DATA_INGESTION,
                    worker_id=worker_id,
                    start_time=start_time,
                    end_time=end_time,
                    input_count=0,
                    output_count=batch_size,
                    success_count=batch_size,
                    error_count=0,
                    processing_rate=processing_rate,
                    quality_score=1.0
                )

            def _execute_data_validation(self, worker_id: int) -> PipelineMetrics:
                """执行数据验证阶段"""
                start_time = time.time()

                # 获取前一阶段的数据
                with self.data_locks[PipelineStage.DATA_INGESTION]:
                    source_data = self.stage_data.get(PipelineStage.DATA_INGESTION, [])

                # 按工作线程划分数据
                worker_records = [
                    record for record in source_data
                    if record.id // 1000 == worker_id
                ]

                validated_records = []
                error_count = 0

                for record in worker_records:
                    # 验证数据质量
                    try:
                        # 检查特征值范围
                        if not (0 <= record.feature1 <= 100):
                            raise ValueError(f"feature1 超出范围: {record.feature1}")

                        if not (-50 <= record.feature2 <= 50):
                            raise ValueError(f"feature2 超出范围: {record.feature2}")

                        if not (0 <= record.feature3 <= 1):
                            raise ValueError(f"feature3 超出范围: {record.feature3}")

                        # 验证时间戳
                        if record.timestamp <= 0:
                            raise ValueError(f"无效时间戳: {record.timestamp}")

                        # 更新验证状态
                        record.validation_status = "validated"
                        validated_records.append(record)

                    except Exception as e:
                        record.validation_status = f"error: {str(e)}"
                        error_count += 1
                        logger.debug(f"工作线程 {worker_id}: 验证失败 {record.id}: {e}")

                # 模拟验证处理时间
                time.sleep(random.uniform(0.3, 1.5))

                # 存储验证结果
                with self.data_locks[PipelineStage.DATA_VALIDATION]:
                    if PipelineStage.DATA_VALIDATION not in self.stage_data:
                        self.stage_data[PipelineStage.DATA_VALIDATION] = []
                    self.stage_data[PipelineStage.DATA_VALIDATION].extend(validated_records)

                end_time = time.time()
                duration = end_time - start_time
                quality_score = len(validated_records) / len(worker_records) if worker_records else 0

                logger.debug(f"工作线程 {worker_id}: 数据验证完成,"
                           f"验证: {len(validated_records)}, 错误: {error_count}, "
                           f"质量评分: {quality_score:.3f}")

                return PipelineMetrics(
                    stage=PipelineStage.DATA_VALIDATION,
                    worker_id=worker_id,
                    start_time=start_time,
                    end_time=end_time,
                    input_count=len(worker_records),
                    output_count=len(validated_records),
                    success_count=len(validated_records),
                    error_count=error_count,
                    processing_rate=len(worker_records) / duration if duration > 0 else 0,
                    quality_score=quality_score
                )

            def _execute_data_cleaning(self, worker_id: int) -> PipelineMetrics:
                """执行数据清洗阶段"""
                start_time = time.time()

                # 获取验证后的数据
                with self.data_locks[PipelineStage.DATA_VALIDATION]:
                    source_data = self.stage_data.get(PipelineStage.DATA_VALIDATION, [])

                # 按工作线程划分数据
                worker_records = [
                    record for record in source_data
                    if record.id // 1000 == worker_id and
                    record.validation_status == "validated"
                ]

                cleaned_records = []
                error_count = 0

                for record in worker_records:
                    try:
                        # 数据清洗操作
                        # 1. 处理异常值
                        if record.feature1 > 90:
                            record.feature1 = 90  # 截断异常值

                        if record.feature2 < -40:
                            record.feature2 = -40

                        # 2. 标准化处理
                        record.feature1 = record.feature1 / 100.0  # 归一化到[0,1]
                        record.feature2 = (record.feature2 + 50) / 100.0  # 归一化到[0,1]

                        # 3. 特征工程预计算
                        record.feature4 = record.feature1 * record.feature2  # 交互特征
                        record.feature5 = abs(record.feature1 - record.feature3)  # 差值特征

                        # 更新清洗状态
                        record.cleaning_status = "cleaned"
                        cleaned_records.append(record)

                    except Exception as e:
                        record.cleaning_status = f"cleaning_error: {str(e)}"
                        error_count += 1

                # 模拟清洗处理时间
                time.sleep(random.uniform(0.2, 1.0))

                # 存储清洗结果
                with self.data_locks[PipelineStage.DATA_CLEANING]:
                    if PipelineStage.DATA_CLEANING not in self.stage_data:
                        self.stage_data[PipelineStage.DATA_CLEANING] = []
                    self.stage_data[PipelineStage.DATA_CLEANING].extend(cleaned_records)

                end_time = time.time()
                duration = end_time - start_time

                logger.debug(f"工作线程 {worker_id}: 数据清洗完成,"
                           f"清洗: {len(cleaned_records)}, 错误: {error_count}")

                return PipelineMetrics(
                    stage=PipelineStage.DATA_CLEANING,
                    worker_id=worker_id,
                    start_time=start_time,
                    end_time=end_time,
                    input_count=len(worker_records),
                    output_count=len(cleaned_records),
                    success_count=len(cleaned_records),
                    error_count=error_count,
                    processing_rate=len(worker_records) / duration if duration > 0 else 0,
                    quality_score=1.0
                )

            def _execute_feature_engineering(self, worker_id: int) -> PipelineMetrics:
                """执行特征工程阶段"""
                if not self.pipeline_config['feature_engineering_enabled']:
                    # 如果特征工程被禁用,直接传递清洗后的数据
                    start_time = time.time()

                    with self.data_locks[PipelineStage.DATA_CLEANING]:
                        source_data = self.stage_data.get(PipelineStage.DATA_CLEANING, [])

                    worker_records = [
                        record for record in source_data
                        if record.id // 1000 == worker_id and
                        record.cleaning_status == "cleaned"
                    ]

                    end_time = time.time()

                    return PipelineMetrics(
                        stage=PipelineStage.FEATURE_ENGINEERING,
                        worker_id=worker_id,
                        start_time=start_time,
                        end_time=end_time,
                        input_count=len(worker_records),
                        output_count=len(worker_records),
                        success_count=len(worker_records),
                        error_count=0,
                        processing_rate=len(worker_records) / (end_time - start_time) if end_time > start_time else 0,
                        quality_score=1.0
                    )

                start_time = time.time()

                # 获取清洗后的数据
                with self.data_locks[PipelineStage.DATA_CLEANING]:
                    source_data = self.stage_data.get(PipelineStage.DATA_CLEANING, [])

                # 按工作线程划分数据
                worker_records = [
                    record for record in source_data
                    if record.id // 1000 == worker_id and
                    record.cleaning_status == "cleaned"
                ]

                engineered_records = []

                for record in worker_records:
                    # 高级特征工程
                    # 1. 多项式特征
                    record.feature6 = record.feature1 ** 2
                    record.feature7 = record.feature2 ** 2
                    record.feature8 = record.feature1 * record.feature3

                    # 2. 统计特征
                    record.feature9 = (record.feature1 + record.feature2 + record.feature3) / 3
                    record.feature10 = (record.feature1 * record.feature2 * record.feature3) ** (1/3)

                    # 3. 分箱特征
                    if record.feature1 < 0.3:
                        record.feature11 = 0  # 低
                    elif record.feature1 < 0.7:
                        record.feature11 = 1  # 中
                    else:
                        record.feature11 = 2  # 高

                    engineered_records.append(record)

                # 模拟特征工程处理时间
                time.sleep(random.uniform(0.5, 2.0))

                # 存储特征工程结果
                with self.data_locks[PipelineStage.FEATURE_ENGINEERING]:
                    if PipelineStage.FEATURE_ENGINEERING not in self.stage_data:
                        self.stage_data[PipelineStage.FEATURE_ENGINEERING] = []
                    self.stage_data[PipelineStage.FEATURE_ENGINEERING].extend(engineered_records)

                end_time = time.time()
                duration = end_time - start_time

                logger.debug(f"工作线程 {worker_id}: 特征工程完成,"
                           f"处理: {len(engineered_records)} 条记录")

                return PipelineMetrics(
                    stage=PipelineStage.FEATURE_ENGINEERING,
                    worker_id=worker_id,
                    start_time=start_time,
                    end_time=end_time,
                    input_count=len(worker_records),
                    output_count=len(engineered_records),
                    success_count=len(engineered_records),
                    error_count=0,
                    processing_rate=len(worker_records) / duration if duration > 0 else 0,
                    quality_score=1.0
                )

            def _execute_model_training(self, worker_id: int) -> PipelineMetrics:
                """执行模型训练阶段"""
                start_time = time.time()

                # 获取特征工程后的数据
                with self.data_locks[PipelineStage.FEATURE_ENGINEERING]:
                    source_data = self.stage_data.get(PipelineStage.FEATURE_ENGINEERING, [])

                # 合并所有工作线程的数据用于训练
                all_records = source_data.copy()

                if worker_id == 1:  # 只有一个工作线程负责训练
                    # 数据准备
                    X = []
                    y = []

                    for record in all_records:
                        if record.label is not None:
                            features = [
                                record.feature1, record.feature2, record.feature3,
                                getattr(record, 'feature4', 0), getattr(record, 'feature5', 0),
                                getattr(record, 'feature6', 0), getattr(record, 'feature7', 0),
                                getattr(record, 'feature8', 0), getattr(record, 'feature9', 0),
                                getattr(record, 'feature10', 0), getattr(record, 'feature11', 0)
                            ]
                            X.append(features)
                            y.append(record.label)

                    # Simulated model training
                    epochs = self.pipeline_config['training_epochs']
                    accuracy = 0.0  # stays defined even if training_epochs is 0

                    for epoch in range(epochs):
                        # 模拟训练一个epoch
                        epoch_loss = random.uniform(0.1, 0.8)
                        accuracy = random.uniform(0.7, 0.95)

                        # 模拟训练时间
                        time.sleep(0.01)

                        if epoch % 20 == 0:
                            logger.debug(f"工作线程 {worker_id}: 训练 Epoch {epoch}, "
                                       f"Loss: {epoch_loss:.3f}, Accuracy: {accuracy:.3f}")

                    # 模拟模型保存
                    time.sleep(random.uniform(0.1, 0.5))

                    logger.info(f"工作线程 {worker_id}: 模型训练完成,"
                              f"训练样本数: {len(X)}, 最终准确率: {accuracy:.3f}")

                    success_count = len(X)
                else:
                    # 其他工作线程跳过训练
                    success_count = 0

                end_time = time.time()
                duration = end_time - start_time

                return PipelineMetrics(
                    stage=PipelineStage.MODEL_TRAINING,
                    worker_id=worker_id,
                    start_time=start_time,
                    end_time=end_time,
                    input_count=len(all_records),
                    output_count=1,  # 输出一个模型
                    success_count=success_count,
                    error_count=0,
                    processing_rate=len(all_records) / duration if duration > 0 else 0,
                    quality_score=accuracy if worker_id == 1 else 1.0
                )

            def _execute_model_evaluation(self, worker_id: int) -> PipelineMetrics:
                """执行模型评估阶段"""
                start_time = time.time()

                # 获取特征工程后的数据
                with self.data_locks[PipelineStage.FEATURE_ENGINEERING]:
                    source_data = self.stage_data.get(PipelineStage.FEATURE_ENGINEERING, [])

                if worker_id == 1:  # only one worker runs the evaluation
                    # Use the labeled records from the second half as the evaluation set.
                    # Training used all records, so the two sets overlap in this demo.
                    eval_records = [
                        record for record in source_data[len(source_data) // 2:]
                        if record.label is not None
                    ]

                    if eval_records:
                        # Simulated model predictions
                        correct_predictions = 0
                        total_predictions = len(eval_records)

                        for record in eval_records:
                            # Simulated prediction: a random guess
                            prediction_prob = random.uniform(0, 1)
                            prediction = 1 if prediction_prob > 0.5 else 0
                            record.prediction = prediction

                            if prediction == record.label:
                                correct_predictions += 1

                        # Compute evaluation metrics (precision and recall are simulated
                        # from accuracy and capped at 1.0)
                        accuracy = correct_predictions / total_predictions
                        precision = min(1.0, accuracy * random.uniform(0.9, 1.1))
                        recall = min(1.0, accuracy * random.uniform(0.85, 1.15))
                        f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

                        logger.info(f"Worker {worker_id}: model evaluation finished, "
                                  f"accuracy: {accuracy:.3f}, precision: {precision:.3f}, "
                                  f"recall: {recall:.3f}, F1: {f1_score:.3f}")
                    else:
                        accuracy = 0
                        total_predictions = 0

                    # 模拟评估报告生成
                    time.sleep(random.uniform(0.2, 1.0))

                    success_count = total_predictions
                else:
                    success_count = 0

                end_time = time.time()
                duration = end_time - start_time

                return PipelineMetrics(
                    stage=PipelineStage.MODEL_EVALUATION,
                    worker_id=worker_id,
                    start_time=start_time,
                    end_time=end_time,
                    input_count=len(source_data),
                    output_count=1,  # 输出一个评估报告
                    success_count=success_count,
                    error_count=0,
                    processing_rate=len(source_data) / duration if duration > 0 else 0,
                    quality_score=accuracy if worker_id == 1 else 1.0
                )

            def _execute_model_deployment(self, worker_id: int) -> PipelineMetrics:
                """执行模型部署阶段"""
                start_time = time.time()

                if worker_id == 1:  # 只有一个工作线程负责部署
                    # 模拟部署过程
                    deployment_strategy = self.pipeline_config['deployment_strategy']

                    logger.info(f"工作线程 {worker_id}: 开始模型部署,策略: {deployment_strategy}")

                    # 模拟部署步骤
                    steps = [
                        "环境准备",
                        "模型加载",
                        "API接口创建",
                        "健康检查",
                        "流量切换"
                    ]

                    for step in steps:
                        step_time = random.uniform(0.1, 0.5)
                        time.sleep(step_time)
                        logger.debug(f"工作线程 {worker_id}: 部署步骤完成: {step}")

                    # 模拟部署验证
                    time.sleep(random.uniform(0.2, 1.0))

                    success_count = 1
                    logger.info(f"工作线程 {worker_id}: 模型部署完成")
                else:
                    success_count = 0

                end_time = time.time()
                duration = end_time - start_time

                return PipelineMetrics(
                    stage=PipelineStage.MODEL_DEPLOYMENT,
                    worker_id=worker_id,
                    start_time=start_time,
                    end_time=end_time,
                    input_count=1,  # 输入一个模型
                    output_count=1,  # 输出一个部署
                    success_count=success_count,
                    error_count=0,
                    processing_rate=1 / duration if duration > 0 else 0,
                    quality_score=1.0
                )

            def run_ml_pipeline_demo(self):
                """运行机器学习管道演示"""
                logger.info("=== 机器学习管道演示 ===")
                logger.info(f"工作线程数: {self.num_workers}")
                logger.info(f"管道阶段: {[stage.value for stage in self.stages]}")
                logger.info(f"管道配置: {self.pipeline_config}")

                threads = []
                demo_start = time.time()

                # 创建并启动ML管道工作线程
                for i in range(self.num_workers):
                    thread = threading.Thread(
                        target=self.ml_pipeline_worker,
                        args=(i + 1,),
                        name=f"MLPipelineWorker-{i + 1}"
                    )
                    threads.append(thread)
                    thread.start()
                    time.sleep(0.1)

                # 等待所有线程完成
                for thread in threads:
                    thread.join(timeout=120)
                    if thread.is_alive():
                        logger.warning(f"线程 {thread.name} 未能在超时内完成")

                demo_time = time.time() - demo_start
                self._analyze_pipeline_results(demo_time)

            def _analyze_pipeline_results(self, total_demo_time: float):
                """分析管道执行结果"""
                logger.info("\n=== 机器学习管道结果分析 ===")
                logger.info(f"管道总执行时间: {total_demo_time:.2f}秒")

                # 按阶段分析结果
                for stage in self.stages:
                    stage_metrics = [
                        metric for metric in self.pipeline_metrics
                        if metric.stage == stage
                    ]

                    if not stage_metrics:
                        continue

                    total_input = sum(metric.input_count for metric in stage_metrics)
                    total_output = sum(metric.output_count for metric in stage_metrics)
                    total_success = sum(metric.success_count for metric in stage_metrics)
                    total_errors = sum(metric.error_count for metric in stage_metrics)

                    avg_quality = sum(metric.quality_score for metric in stage_metrics) / len(stage_metrics)
                    total_processing_rate = sum(metric.processing_rate for metric in stage_metrics)

                    logger.info(f"\n{stage.value}:")
                    logger.info(f"  输入记录数: {total_input}")
                    logger.info(f"  输出记录数: {total_output}")
                    logger.info(f"  成功处理数: {total_success}")
                    logger.info(f"  错误数: {total_errors}")
                    logger.info(f"  成功率: {(total_success/total_input*100):.1f}%" if total_input > 0 else "  成功率: 0%")
                    logger.info(f"  平均质量评分: {avg_quality:.3f}")
                    logger.info(f"  总处理速率: {total_processing_rate:.1f} records/s")

                # 数据流分析
                logger.info(f"\n=== 数据流分析 ===")
                for stage in self.stages:
                    with self.data_locks[stage]:
                        data_count = len(self.stage_data.get(stage, []))
                        logger.info(f"{stage.value}: {data_count} 条记录")

                # 整体效率分析
                if self.pipeline_metrics:
                    avg_quality = sum(metric.quality_score for metric in self.pipeline_metrics) / len(self.pipeline_metrics)
                    total_records_processed = sum(metric.output_count for metric in self.pipeline_metrics)
                    overall_throughput = total_records_processed / total_demo_time

                    logger.info(f"\n=== 整体效率分析 ===")
                    logger.info(f"平均质量评分: {avg_quality:.3f}")
                    logger.info(f"总处理记录数: {total_records_processed}")
                    logger.info(f"整体吞吐量: {overall_throughput:.1f} records/s")
                    logger.info(f"阶段效率: {len(self.stages) / total_demo_time:.2f} stages/s")

        # 使用示例
        if __name__ == "__main__":
            ml_pipeline = MLOpsPipeline(num_workers=4)
            ml_pipeline.run_ml_pipeline_demo()
        ---
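
        When a worker in a barrier-synchronized pipeline like the one above has to stop before reaching a synchronization point, the remaining workers would otherwise block until their wait() timeout expires. The following minimal sketch assumes a simplified three-worker setup with hypothetical names (worker, outcomes) and shows how threading.Barrier.abort() lets the other parties fail fast with BrokenBarrierError.

```python
import threading

barrier = threading.Barrier(3)
outcomes = {}  # worker_id -> what happened to that worker

def worker(worker_id, fail):
    if fail:
        # Leaving without reaching the barrier: abort it so peers do not hang
        barrier.abort()
        outcomes[worker_id] = "aborted"
        return
    try:
        barrier.wait(timeout=5.0)
        outcomes[worker_id] = "passed"
    except threading.BrokenBarrierError:
        # Raised immediately, whether this thread was already waiting
        # or calls wait() after the abort
        outcomes[worker_id] = "broken"

# Worker 0 simulates a failed stage; workers 1 and 2 try to synchronize
threads = [threading.Thread(target=worker, args=(i, i == 0)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(outcomes)
```

        Without the abort() call, workers 1 and 2 would each wait the full 5 seconds before the timeout broke the barrier.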

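6.4 Callback Functions

01.Callback basics
    a.Definition and purpose
        A callback is a function that runs automatically when a specific event occurs. In barrier synchronization, it is used to trigger a predefined operation once all threads have arrived at the barrier.
    b.Key characteristics
        a.Automatic triggering
            The callback runs automatically once the condition is met (all threads have arrived); no thread has to call it explicitly.
        b.Thread safety
            Exactly one thread runs the callback while the other threads are still blocked at the barrier, so the callback can touch the state shared by those threads without racing against them.
        c.Single execution
            The callback runs exactly once per synchronization cycle, so it is never triggered twice for the same cycle.
    c.Use cases
        a.Progress monitoring
            Record task progress and system state at each synchronization point.
        b.Resource management
            Automatically clean up temporary resources or reset state.
        c.Data validation
            Verify data integrity and consistency at the synchronization point.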
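02.Callbacks with the standard-library Barrier
    a.The action parameter
        The action parameter of threading.Barrier accepts a callable. Once all parties have arrived, one of the threads calls it with no arguments before any thread is released; its return value is ignored.
    b.Implementation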
        ---
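        # Callback example with the standard library
        import threading
        import time
        import logging

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        def progress_callback():
            """Progress callback; Barrier ignores whatever the action returns."""
            logger.info("=== All threads have reached the synchronization point ===")
            logger.info("🔄 Preparing to enter the next phase...")

        # Create a barrier with a callback
        barrier = threading.Barrier(parties=3, action=progress_callback)

        def callback_worker(worker_id):
            """Worker thread that synchronizes on the barrier."""
            for phase in range(1, 4):
                logger.info(f"Worker-{worker_id} finished phase {phase}")
                barrier.wait()  # the callback runs once all 3 threads are waiting
                logger.info(f"Worker-{worker_id} passed the barrier for phase {phase}")

        # Start the threads and wait for them to finish
        threads = []
        for i in range(1, 4):
            thread = threading.Thread(target=callback_worker, args=(i,))
            threads.append(thread)
            thread.start()

        for thread in threads:
            thread.join()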
        ---
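    c.Notes
        a.Exceptions in the callback
            If the callback raises, the barrier is put into the broken state: the exception propagates out of wait() in the thread that ran the callback, and the other waiting threads get BrokenBarrierError. The standard library does not log the exception, so catch and log inside the callback if the barrier should survive a failure.
        b.Execution order
            The callback is run by one of the waiting threads (in CPython, the last one to arrive) before any thread is released; the other threads stay blocked until it returns.
        c.Performance impact
            The callback's run time is added to the synchronization time of every thread, so keep it short.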
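
    How a standard-library barrier reacts to a failing callback can be checked directly. In the sketch below (the names failing_action and results are hypothetical), the thread that runs the callback sees the original exception, the other thread sees BrokenBarrierError, and the barrier ends up broken.

```python
import threading

def failing_action():
    raise RuntimeError("callback failed")

barrier = threading.Barrier(2, action=failing_action)
results = {}  # thread label -> what wait() did in that thread

def worker(label):
    try:
        barrier.wait(timeout=5.0)
        results[label] = "released"
    except threading.BrokenBarrierError:
        # BrokenBarrierError subclasses RuntimeError, so it is caught first
        results[label] = "BrokenBarrierError"
    except RuntimeError as exc:
        # Only the thread that ran the callback gets the original exception
        results[label] = f"RuntimeError: {exc}"

threads = [threading.Thread(target=worker, args=(label,)) for label in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.values()))
print(barrier.broken)  # True
```

    Which of the two threads runs the callback depends on arrival order, so only the set of outcomes is deterministic. Calling barrier.reset() would make the barrier usable again.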

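03.Custom callback implementation
    a.Enhanced callback features
        Extend the standard-library behavior with richer callback management.
    b.Core implementation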
        ---
        # Custom callback barrier implementation
        import threading
        import time
        import logging
        from datetime import datetime
        from typing import Optional, Callable, Any, Dict, List

        class EnhancedCallbackBarrier:
            """Barrier with enhanced callback handling."""
            def __init__(self, parties: int, action: Optional[Callable] = None,
                        timeout: Optional[float] = None, name: str = "Barrier"):
                self.parties = parties
                self.action = action
                self.timeout = timeout
                self.name = name

                self._condition = threading.Condition()
                self._waiting = 0
                self._generation = 0
                self._broken = False
                self._callback_history: List[Dict] = []

            def wait(self, timeout: Optional[float] = None) -> bool:
                """Wait for all parties; the last thread to arrive runs the callback.

                Returns True once all parties have arrived, or False if this
                thread timed out (which also breaks the barrier).
                """
                actual_timeout = timeout if timeout is not None else self.timeout

                with self._condition:
                    if self._broken:
                        raise threading.BrokenBarrierError(f"Barrier '{self.name}' is broken")

                    self._waiting += 1
                    current_generation = self._generation

                    if self._waiting == self.parties:
                        # Run the callback
                        if self.action:
                            self._execute_callback()

                        # Reset the barrier for the next generation
                        self._generation += 1
                        self._waiting = 0
                        self._condition.notify_all()
                        return True

                    # Wait for the other threads
                    start_time = time.time()
                    while True:
                        if self._generation != current_generation:
                            return True

                        if self._broken:
                            raise threading.BrokenBarrierError(f"Barrier '{self.name}' is broken")

                        # Compare against None so that a timeout of 0 is honored
                        if actual_timeout is not None:
                            remaining = actual_timeout - (time.time() - start_time)
                            if remaining <= 0:
                                self._break_barrier()
                                return False
                            self._condition.wait(remaining)
                        else:
                            self._condition.wait()

            def _execute_callback(self):
                """Run the callback and record the outcome."""
                try:
                    start_time = time.time()
                    result = self.action()
                    execution_time = time.time() - start_time

                    # Record the run in the history
                    callback_info = {
                        'generation': self._generation,
                        'timestamp': datetime.now().isoformat(),
                        'execution_time': execution_time,
                        'result': result,
                        'success': True
                    }
                    self._callback_history.append(callback_info)

                except Exception as e:
                    # Record the error without breaking the barrier
                    callback_info = {
                        'generation': self._generation,
                        'timestamp': datetime.now().isoformat(),
                        'error': str(e),
                        'success': False
                    }
                    self._callback_history.append(callback_info)

            def get_callback_history(self) -> List[Dict]:
                """Return a copy of the callback history."""
                return self._callback_history.copy()

            def _break_barrier(self):
                """Mark the barrier as broken and wake all waiting threads."""
                # Condition() uses an RLock by default, so re-acquiring it here
                # from inside wait() does not deadlock
                with self._condition:
                    self._broken = True
                    self._condition.notify_all()
        ---
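
        For comparison with the timeout handling of a custom barrier, the sketch below shows what the standard-library threading.Barrier does on timeout: wait(timeout=...) raises BrokenBarrierError rather than returning a flag, the barrier stays broken, and reset() makes it usable again. The variable names are hypothetical.

```python
import threading

barrier = threading.Barrier(2)  # two parties, but only this thread will wait

try:
    barrier.wait(timeout=0.1)
    timed_out = False
except threading.BrokenBarrierError:
    # A timeout breaks the barrier and is reported as an exception
    timed_out = True

broken_after_timeout = barrier.broken  # True: later wait() calls fail at once
barrier.reset()                        # return the barrier to its initial state
broken_after_reset = barrier.broken    # False

print(timed_out, broken_after_timeout, broken_after_reset)  # True True False
```

        Returning False on timeout, as a custom barrier may choose to do, forces every caller to check the return value; raising makes a missed synchronization harder to ignore.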
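    c.Advanced features
        a.Callback history
            Each callback run is recorded with its generation, timestamp, and either the execution time and result or the error message.
        b.Fault tolerance
            Unlike threading.Barrier, this implementation catches exceptions raised by the callback and records them in the history, so a failing callback does not break the barrier.
        c.Performance monitoring
            Execution-time and success-rate statistics can be derived from the recorded history.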
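
    A minimal sketch of how such statistics might be derived, assuming history entries shaped like the dictionaries recorded above ('success', plus 'execution_time' for successful runs). The helper summarize_callback_history and the sample data are hypothetical and not part of the class.

```python
from typing import Dict, List

def summarize_callback_history(history: List[Dict]) -> Dict[str, float]:
    """Compute success rate and timing statistics from callback history entries."""
    total = len(history)
    successes = [entry for entry in history if entry.get('success')]
    # Failed runs carry no 'execution_time', so only successful runs are timed
    times = [entry['execution_time'] for entry in successes if 'execution_time' in entry]
    return {
        'total': total,
        'success_rate': len(successes) / total if total else 0.0,
        'avg_execution_time': sum(times) / len(times) if times else 0.0,
        'max_execution_time': max(times, default=0.0),
    }

# Hand-written sample in the same shape as get_callback_history() output
sample_history = [
    {'generation': 0, 'execution_time': 0.5, 'result': None, 'success': True},
    {'generation': 1, 'error': 'boom', 'success': False},
    {'generation': 2, 'execution_time': 1.5, 'result': None, 'success': True},
    {'generation': 3, 'execution_time': 1.0, 'result': None, 'success': True},
]

stats = summarize_callback_history(sample_history)
print(stats)
```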

04.Common callback patterns
    a.Progress-monitoring callback
        a.Description
            Track task progress and system state in real time.
        b.Code example
            ---
            class ProgressMonitor:
                """Progress-monitoring callback class."""
                def __init__(self, task_name: str, total_phases: int):
                    self.task_name = task_name
                    self.total_phases = total_phases
                    self.current_phase = 0
                    self.start_time = time.time()

                def __call__(self):
                    """Called by the barrier once per synchronization cycle."""
                    self.current_phase += 1
                    elapsed = time.time() - self.start_time
                    progress = (self.current_phase / self.total_phases) * 100

                    logger.info(f"📊 {self.task_name} progress: {progress:.1f}%")
                    logger.info(f"Phase: {self.current_phase}/{self.total_phases}")
                    logger.info(f"Elapsed: {elapsed:.2f}s")

                    return {
                        'phase': self.current_phase,
                        'progress': progress,
                        'elapsed': elapsed
                    }

            # Usage example: progress reaches 100% after 5 synchronization cycles
            monitor = ProgressMonitor("Data processing task", 5)
            barrier = EnhancedCallbackBarrier(parties=3, action=monitor)
            ---
    b.Resource-cleanup callback
        a.Description
            Cleans up temporary resources automatically at the synchronization point.
        b.Code example
            ---
            class ResourceCleanup:
                """资源清理回调类"""
                def __init__(self):
                    self.cleanup_actions = []

                def add_cleanup(self, action: Callable, description: str):
                    """添加清理动作"""
                    self.cleanup_actions.append((action, description))

                def __call__(self):
                    """执行所有清理动作"""
                    logger.info("🧹 开始资源清理")
                    results = []

                    for i, (action, desc) in enumerate(self.cleanup_actions, 1):
                        try:
                            start = time.time()
                            result = action()
                            duration = time.time() - start
                            logger.info(f"✓ {desc} (耗时: {duration:.3f}s)")
                            results.append((desc, True, duration))
                        except Exception as e:
                            logger.error(f"✗ {desc}: {e}")
                            results.append((desc, False, 0))

                    success_count = sum(1 for _, success, _ in results if success)
                    logger.info(f"清理完成: {success_count}/{len(results)} 成功")
                    return results

            # Usage example (temp_files and database are placeholders for application objects)
            cleanup = ResourceCleanup()
            cleanup.add_cleanup(lambda: temp_files.clear(), "clear temporary files")
            cleanup.add_cleanup(lambda: database.close_connections(), "close database connections")
            barrier = EnhancedCallbackBarrier(parties=3, action=cleanup)
            ---
    c.State-validation callback
        a.Description
            Validates data state and consistency at the synchronization point.
        b.Code example
            ---
            class StateValidator:
                """状态验证回调类"""
                def __init__(self):
                    self.validation_rules = []

                def add_rule(self, rule_func: Callable, description: str):
                    """添加验证规则"""
                    self.validation_rules.append((rule_func, description))

                def __call__(self):
                    """执行所有验证规则"""
                    logger.info("🔍 开始状态验证")
                    results = []

                    for rule_func, desc in self.validation_rules:
                        try:
                            result = rule_func()
                            if result:
                                logger.info(f"✓ {desc}: 验证通过")
                                results.append((desc, True))
                            else:
                                logger.warning(f"⚠ {desc}: 验证失败")
                                results.append((desc, False))
                        except Exception as e:
                            logger.error(f"✗ {desc}: 验证异常 - {e}")
                            results.append((desc, False))

                    passed = sum(1 for _, success in results if success)
                    logger.info(f"验证完成: {passed}/{len(results)} 通过")

                    # 如果所有验证都通过返回True,否则返回False
                    return all(success for _, success in results)

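            # Usage example (check_data_consistency and validate_results are placeholders for application functions)
            validator = StateValidator()
            validator.add_rule(lambda: check_data_consistency(), "data consistency check")
            validator.add_rule(lambda: validate_results(), "result validity check")
            barrier = EnhancedCallbackBarrier(parties=3, action=validator)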
            ---

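05.Callback best practices
    a.Design principles
        a.Keep it short
            The callback runs while every other party is still waiting at the barrier, so keep it as short as possible.
        b.Exception safety
            Make sure an exception in the callback cannot destabilize the system.
        c.Idempotence
            The callback should be idempotent: running it more than once gives the same result.
    b.Performance
        a.Asynchronous execution
            For slow operations, have the callback hand the work off to an asynchronous task.
        b.Batching
            Merge many small operations into one batch to reduce overhead.
        c.Caching
            Cache the results of repeated computations.
    c.Error handling
        a.Graceful degradation
            Provide a fallback when the callback fails.
        b.Detailed logging
            Log what happens during callback execution.
        c.State recovery
            Be able to return to a consistent state after an error.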
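    The "keep it short" and "asynchronous execution" advice can be combined as in the sketch below, which uses the standard threading.Barrier. DeferredAction and demo are hypothetical names introduced for illustration. The barrier action only enqueues a job, and a background thread does the slow part, so the waiting threads are released almost immediately.

```python
import logging
import queue
import threading
from typing import Callable, List

logger = logging.getLogger(__name__)


class DeferredAction:
    """Barrier action that only enqueues work; a background thread runs it."""

    def __init__(self, slow_task: Callable[[int], None]):
        self._slow_task = slow_task
        self._jobs: queue.Queue = queue.Queue()
        self._generation = 0
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def __call__(self) -> None:
        # Called by exactly one thread per barrier generation: constant-time work only.
        self._generation += 1
        self._jobs.put(self._generation)

    def _drain(self) -> None:
        while True:
            generation = self._jobs.get()
            if generation is None:  # sentinel queued by close()
                return
            try:
                self._slow_task(generation)
            except Exception:
                # Log and continue so one failing task does not stop the worker.
                logger.exception("deferred task for generation %s failed", generation)

    def close(self) -> None:
        """Process all queued jobs, then stop the background thread."""
        self._jobs.put(None)
        self._worker.join()


def demo(parties: int = 3, rounds: int = 2) -> List[int]:
    processed: List[int] = []
    action = DeferredAction(processed.append)
    barrier = threading.Barrier(parties, action=action)

    def worker() -> None:
        for _ in range(rounds):
            barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(parties)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    action.close()
    return processed


if __name__ == "__main__":
    print(demo())
```

    The trade-off is that the deferred work is no longer guaranteed to have finished when the threads leave the barrier, so this only fits tasks such as logging or metrics that the next phase does not depend on.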

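7. Process locks

7.1 multiprocessing.Lock

01.Basic concepts
    a.Definition and purpose
        multiprocessing.Lock is a synchronization primitive for multi-process programs. It protects a shared resource so that only one process at a time can enter the critical section.
    b.Key characteristics
        a.Process safety
            Guarantees mutually exclusive access across processes.
        b.Sharing through inheritance
            The lock is shared between parent and child processes by handing it over when the child is created, for example in the args of multiprocessing.Process. It cannot be sent to an already running process as an ordinary pickled argument.
        c.Blocking behavior
            Supports blocking, timed, and non-blocking acquisition.
    c.Differences from threading.Lock
        a.Scope
            multiprocessing.Lock synchronizes processes; threading.Lock synchronizes threads within one process.
        b.Implementation
            multiprocessing.Lock is built on an operating-system semaphore that several processes can see, whereas threading.Lock lives inside a single interpreter process.
        c.Overhead
            Acquiring and releasing a process lock is generally more expensive than a thread lock.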
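    Sharing through inheritance also applies to process pools: a lock passed as a task argument to Pool.map() is rejected with a RuntimeError, but it can be handed to each pool process once through the pool initializer. The sketch below illustrates that pattern; init_worker, add_one, and main are hypothetical names chosen for this example.

```python
import multiprocessing

# Filled in by init_worker() inside every pool process.
_lock = None
_counter = None


def init_worker(lock, counter):
    """Runs once per pool process and stores the inherited objects in globals."""
    global _lock, _counter
    _lock = lock
    _counter = counter


def add_one(_):
    """Task function: increments the shared counter under the shared lock."""
    with _lock:
        _counter.value += 1


def main(tasks=20, workers=4):
    lock = multiprocessing.Lock()
    # lock=False creates a raw shared integer without a built-in lock,
    # so `lock` above is the only protection for the increment.
    counter = multiprocessing.Value('i', 0, lock=False)
    with multiprocessing.Pool(workers, initializer=init_worker,
                              initargs=(lock, counter)) as pool:
        pool.map(add_one, range(tasks))
    return counter.value


if __name__ == "__main__":
    print(main())
```

    Because every increment happens under the lock, main() is expected to return the number of tasks submitted.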

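02.Basic usage
    a.Creating a lock
        Instantiate multiprocessing.Lock() directly.
    b.Acquiring and releasing
        Call acquire() and release().
    c.Context manager
        Prefer the with statement, which releases the lock automatically even when an exception is raised.
    d.Code example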
        ---
        # multiprocessing.Lock 基本使用示例
        import multiprocessing
        import time
        import logging
        from datetime import datetime

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        def worker_with_lock(lock, counter, log, worker_id, iterations=5):
            """Worker that calls acquire() and release() explicitly."""
            for i in range(iterations):
                # Acquire outside the try block so release() only runs when the lock is held
                lock.acquire()
                try:
                    # Critical section
                    current_value = counter.value
                    time.sleep(0.1)  # simulate a slow operation
                    counter.value = current_value + 1

                    # Record the operation
                    operation = f"Worker-{worker_id}: op {i+1}, value: {current_value} -> {counter.value}"
                    log.append(operation)
                    logger.info(operation)
                finally:
                    # Release the lock
                    lock.release()

                time.sleep(0.05)  # work outside the critical section

        def worker_with_context(lock, counter, log, worker_id, iterations=5):
            """Worker that uses the lock as a context manager."""
            for i in range(iterations):
                with lock:  # acquired on entry, released on exit
                    current_value = counter.value
                    time.sleep(0.1)
                    counter.value = current_value + 1

                    operation = f"Context-Worker-{worker_id}: op {i+1}, value: {counter.value}"
                    log.append(operation)
                    logger.info(operation)

                time.sleep(0.05)

        def main():
            # Create the lock and the shared state here and pass them to the workers,
            # so the example does not depend on the "fork" start method
            process_lock = multiprocessing.Lock()
            shared_counter = multiprocessing.Value('i', 0)

            with multiprocessing.Manager() as manager:
                shared_list = manager.list()
                processes = []

                # Workers that use explicit acquire()/release()
                for i in range(1, 4):
                    p = multiprocessing.Process(
                        target=worker_with_lock,
                        args=(process_lock, shared_counter, shared_list, i, 3)
                    )
                    processes.append(p)
                    p.start()

                # Workers that use the context manager
                for i in range(1, 4):
                    p = multiprocessing.Process(
                        target=worker_with_context,
                        args=(process_lock, shared_counter, shared_list, i, 3)
                    )
                    processes.append(p)
                    p.start()

                # Wait for all processes to finish
                for p in processes:
                    p.join()

                # 6 workers x 3 iterations: both values are expected to be 18
                logger.info(f"Final counter value: {shared_counter.value}")
                logger.info(f"Total operations recorded: {len(shared_list)}")

                return shared_counter.value, len(shared_list)

        if __name__ == "__main__":
            result = main()
        ---

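03.Advanced features
    a.Blocking and non-blocking acquisition
        a.acquire()
            Blocks by default until the lock is obtained.
        b.acquire(timeout=seconds)
            Waits at most the given number of seconds and returns False on timeout, which avoids waiting forever.
        c.acquire(block=False)
            Non-blocking: returns True or False immediately. The parameter is named block here, not blocking as in threading.Lock.acquire().
    b.Querying the lock state
        a.locked()
            Reports whether the lock is held. For multiprocessing.Lock this method is only available on recent Python versions (3.14 and later); on older versions, probe with acquire(block=False).
        b.Debugging support
            The state is a snapshot that may be stale by the time it is read, so use it for logging and diagnostics rather than for control flow.
    c.Code example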
        ---
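        # Advanced lock usage example
        import logging
        import multiprocessing
        import random
        import time
        from datetime import datetime

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)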

        class AdvancedLockExample:
            def __init__(self):
                self.lock = multiprocessing.Lock()
                self.shared_data = multiprocessing.Manager().dict()
                self.access_log = multiprocessing.Manager().list()

            def blocking_worker(self, worker_id):
                """阻塞获取锁的工作进程"""
                logger.info(f"Worker-{worker_id}: 尝试获取锁(阻塞模式)")

                # 阻塞获取锁
                self.lock.acquire()
                try:
                    logger.info(f"Worker-{worker_id}: 成功获取锁")

                    # 模拟长时间工作
                    work_time = random.uniform(0.5, 1.5)
                    time.sleep(work_time)

                    # 更新共享数据
                    timestamp = datetime.now().isoformat()
                    self.shared_data[f'worker_{worker_id}'] = {
                        'start_time': timestamp,
                        'work_duration': work_time,
                        'status': 'completed'
                    }

                    logger.info(f"Worker-{worker_id}: 完成工作,耗时 {work_time:.2f}s")

                finally:
                    self.lock.release()
                    logger.info(f"Worker-{worker_id}: 释放锁")

            def timeout_worker(self, worker_id, timeout=1.0):
                """Worker that waits for the lock with a timeout."""
                logger.info(f"Worker-{worker_id}: 尝试获取锁(超时: {timeout}s)")

                # 尝试获取锁,带超时
                acquired = self.lock.acquire(timeout=timeout)

                if acquired:
                    try:
                        logger.info(f"Worker-{worker_id}: 成功获取锁")

                        # 短时间工作
                        work_time = random.uniform(0.2, 0.5)
                        time.sleep(work_time)

                        timestamp = datetime.now().isoformat()
                        self.shared_data[f'timeout_worker_{worker_id}'] = {
                            'start_time': timestamp,
                            'work_duration': work_time,
                            'acquisition_method': 'timeout'
                        }

                        logger.info(f"Worker-{worker_id}: 完成工作,耗时 {work_time:.2f}s")

                    finally:
                        self.lock.release()
                        logger.info(f"Worker-{worker_id}: 释放锁")
                else:
                    logger.warning(f"Worker-{worker_id}: 获取锁超时,跳过工作")

                    # 记录超时信息
                    self.access_log.append({
                        'worker_id': worker_id,
                        'event': 'timeout',
                        'timestamp': datetime.now().isoformat()
                    })

            def nonblocking_worker(self, worker_id):
                """非阻塞获取锁的工作进程"""
                logger.info(f"Worker-{worker_id}: 尝试获取锁(非阻塞模式)")

                # Non-blocking attempt (the multiprocessing parameter is named block, not blocking)
                acquired = self.lock.acquire(block=False)

                if acquired:
                    try:
                        logger.info(f"Worker-{worker_id}: 立即获取锁成功")

                        # 极短时间工作
                        work_time = random.uniform(0.1, 0.3)
                        time.sleep(work_time)

                        timestamp = datetime.now().isoformat()
                        self.shared_data[f'nonblocking_worker_{worker_id}'] = {
                            'start_time': timestamp,
                            'work_duration': work_time,
                            'acquisition_method': 'nonblocking'
                        }

                        logger.info(f"Worker-{worker_id}: 完成工作,耗时 {work_time:.2f}s")

                    finally:
                        self.lock.release()
                        logger.info(f"Worker-{worker_id}: 释放锁")
                else:
                    logger.info(f"Worker-{worker_id}: 锁被占用,立即放弃")

                    # 记录未获取信息
                    self.access_log.append({
                        'worker_id': worker_id,
                        'event': 'lock_busy',
                        'timestamp': datetime.now().isoformat()
                    })

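            def monitor_lock_status(self):
                """Report whether the lock is currently held."""
                # Lock.locked() only exists on recent Python versions, so probe with a
                # non-blocking acquire. The probe holds the lock for an instant, which
                # a concurrent non-blocking worker may observe as "busy".
                acquired = self.lock.acquire(block=False)
                if acquired:
                    self.lock.release()
                is_locked = not acquired
                logger.info(f"Lock state: {'locked' if is_locked else 'unlocked'}")
                return is_locked

            def monitor_loop(self, checks=10, interval=0.3):
                """Poll the lock state periodically; usable as a Process target."""
                for _ in range(checks):
                    self.monitor_lock_status()
                    time.sleep(interval)

        def run_advanced_example():
            """Run the advanced lock example."""
            example = AdvancedLockExample()

            processes = []

            # Start the blocking workers
            for i in range(1, 3):
                p = multiprocessing.Process(
                    target=example.blocking_worker,
                    args=(i,)
                )
                processes.append(p)
                p.start()

            # Start the timeout workers
            for i in range(1, 4):
                p = multiprocessing.Process(
                    target=example.timeout_worker,
                    args=(i, 1.0)
                )
                processes.append(p)
                p.start()
                time.sleep(0.2)  # stagger the start times

            # Start the non-blocking workers
            for i in range(1, 4):
                p = multiprocessing.Process(
                    target=example.nonblocking_worker,
                    args=(i,)
                )
                processes.append(p)
                p.start()
                time.sleep(0.1)

            # Monitor the lock state. A bound method is used because a lambda
            # cannot be pickled under the "spawn" start method.
            monitor_process = multiprocessing.Process(target=example.monitor_loop)
            monitor_process.start()
            processes.append(monitor_process)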

            # 等待所有进程完成
            for p in processes:
                p.join()

            # 输出结果
            logger.info(f"共享数据条目数: {len(example.shared_data)}")
            logger.info(f"访问日志条目数: {len(example.access_log)}")

            return example.shared_data, example.access_log

        if __name__ == "__main__":
            shared_data, access_log = run_advanced_example()
        ---
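        The acquisition modes can also be checked in a single process, without starting any workers. The following sketch (try_acquire_modes is a hypothetical helper) probes a lock in the non-blocking and the timed style and reports what happened.

```python
import multiprocessing
import time


def try_acquire_modes(lock, timeout=0.2):
    """Probe `lock` without waiting forever and report the outcome of each style."""
    results = {}

    # Non-blocking: returns True or False immediately.
    results["non_blocking"] = lock.acquire(block=False)

    # Bounded wait: gives up after `timeout` seconds and returns False.
    start = time.monotonic()
    results["timed"] = lock.acquire(timeout=timeout)
    results["waited"] = time.monotonic() - start

    # Undo any acquisition made by the probes.
    for key in ("non_blocking", "timed"):
        if results[key]:
            lock.release()
    return results


if __name__ == "__main__":
    demo_lock = multiprocessing.Lock()
    demo_lock.acquire()                  # simulate another holder
    print(try_acquire_modes(demo_lock))  # both attempts fail while the lock is held
    demo_lock.release()
```

        A plain Lock has no notion of an owner, so even the process that holds it fails to acquire it a second time; that is what makes the single-process probe meaningful.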

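04.Practical scenarios
    a.Synchronizing file writes
        a.Scenario
            Several processes append to the same file and their writes must not interleave.
        b.Approach
            Guard the file read and write operations with a lock.
        c.Code example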
            ---
            class FileWriteSync:
                def __init__(self, filename):
                    self.filename = filename
                    self.lock = multiprocessing.Lock()
                    self.write_count = multiprocessing.Value('i', 0)

                def safe_write(self, content, process_id):
                    """安全的文件写入"""
                    with self.lock:
                        try:
                            with open(self.filename, 'a', encoding='utf-8') as f:
                                timestamp = datetime.now().isoformat()
                                line = f"{timestamp} - Process-{process_id}: {content}\n"
                                f.write(line)
                                self.write_count.value += 1

                                logger.info(f"Process-{process_id}: 写入完成")

                        except Exception as e:
                            logger.error(f"Process-{process_id}: 写入失败 - {e}")

            # 使用示例
            file_sync = FileWriteSync("shared_output.txt")
            ---

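    b.Synchronizing database access
        a.Scenario
            Several processes share database connections and operations.
        b.Approach
            Guard the connection bookkeeping and transactional operations with a lock.
        c.Code example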
            ---
            class DatabaseAccessSync:
                def __init__(self):
                    self.lock = multiprocessing.Lock()
                    self.connection_count = multiprocessing.Value('i', 0)
                    self.active_connections = multiprocessing.Manager().list()

                def get_connection(self, process_id):
                    """获取数据库连接"""
                    with self.lock:
                        # 模拟连接创建
                        conn_id = f"conn_{process_id}_{self.connection_count.value}"
                        self.connection_count.value += 1
                        self.active_connections.append(conn_id)

                        logger.info(f"Process-{process_id}: 创建连接 {conn_id}")
                        return f"mock_connection_{conn_id}"

                def execute_query(self, connection, query, process_id):
                    """执行查询"""
                    with self.lock:
                        # 模拟查询执行
                        logger.info(f"Process-{process_id}: 执行查询 - {query}")
                        time.sleep(random.uniform(0.1, 0.3))

                        return f"query_result_{process_id}"

                def close_connection(self, connection, process_id):
                    """关闭连接"""
                    with self.lock:
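                        # Simulate closing: strip the "mock_connection_" prefix added by
                        # get_connection() to recover the id stored in active_connections
                        conn_id = connection[len("mock_connection_"):]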
                        if conn_id in self.active_connections:
                            self.active_connections.remove(conn_id)

                        logger.info(f"Process-{process_id}: 关闭连接 {conn_id}")
            ---

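    c.Resource pool management
        a.Scenario
            Allocating a limited set of resources such as memory, connections, or handles.
        b.Approach
            Guard allocation and release of pool entries with a lock.
        c.Code example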
            ---
            class ResourcePool:
                def __init__(self, max_resources):
                    self.max_resources = max_resources
                    self.available_resources = multiprocessing.Manager().list(
                        [f"resource_{i}" for i in range(max_resources)]
                    )
                    self.allocated_resources = multiprocessing.Manager().dict()
                    self.lock = multiprocessing.Lock()
                    self.allocation_count = multiprocessing.Value('i', 0)

                def allocate(self, process_id):
                    """分配资源"""
                    with self.lock:
                        if self.available_resources:
                            resource = self.available_resources.pop(0)
                            self.allocated_resources[process_id] = resource
                            self.allocation_count.value += 1

                            logger.info(f"Process-{process_id}: 分配资源 {resource}")
                            return resource
                        else:
                            logger.warning(f"Process-{process_id}: 资源池已空")
                            return None

                def release(self, process_id):
                    """释放资源"""
                    with self.lock:
                        if process_id in self.allocated_resources:
                            resource = self.allocated_resources.pop(process_id)
                            self.available_resources.append(resource)

                            logger.info(f"Process-{process_id}: 释放资源 {resource}")
                            return True
                        else:
                            logger.warning(f"Process-{process_id}: 无资源可释放")
                            return False
            ---

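05.Best practices and caveats
    a.Usage principles
        a.Minimal lock scope
            Hold the lock only around the critical section; do slow work such as I/O or computation outside it.
        b.Avoiding deadlock
            Acquire multiple locks in one fixed global order so that no circular wait can form.
        c.Exception safety
            Make sure the lock is released on every exit path, preferably with the with statement or try/finally.
    b.Performance
        a.Reducing contention
            Design the algorithm so that processes touch shared resources as rarely as possible.
        b.Read-write separation
            For read-mostly workloads, consider separating readers from writers. The standard library does not ship a read-write lock, so this has to be built from the existing primitives or taken from a third-party package.
        c.Batching
            Merge many small operations into one larger operation under a single lock acquisition.
    c.Debugging
        a.Monitoring lock state
            Periodically check the lock state and how long the lock has been held.
        b.Timeouts
            Give lock acquisition a reasonable timeout.
        c.Detailed logging
            Log acquisitions, releases, and waits.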
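    The fixed-order and timeout advice can be combined in one helper. The sketch below is only an illustration under simple assumptions: locks are kept in a dict keyed by name, and sorted names define the global order. acquire_in_order, transfer, and worker are hypothetical names.

```python
import multiprocessing
from contextlib import ExitStack, contextmanager


@contextmanager
def acquire_in_order(locks, names, timeout=5.0):
    """Acquire the named locks in sorted-name order; release them in reverse on exit."""
    with ExitStack() as stack:
        for name in sorted(set(names)):
            lock = locks[name]
            if not lock.acquire(timeout=timeout):
                # Locks taken so far are released by the ExitStack.
                raise TimeoutError(f"could not acquire lock {name!r} within {timeout}s")
            stack.callback(lock.release)
        yield


def transfer(locks, balances, src, dst, amount):
    """Move `amount` between two shared balances while holding both locks."""
    with acquire_in_order(locks, [src, dst]):
        balances[src].value -= amount
        balances[dst].value += amount


def worker(locks, balances, src, dst, rounds):
    for _ in range(rounds):
        transfer(locks, balances, src, dst, 1)


def main():
    locks = {"a": multiprocessing.Lock(), "b": multiprocessing.Lock()}
    balances = {"a": multiprocessing.Value("i", 100),
                "b": multiprocessing.Value("i", 100)}
    # The two processes transfer in opposite directions; without a common
    # acquisition order this is the classic two-lock deadlock.
    procs = [
        multiprocessing.Process(target=worker, args=(locks, balances, "a", "b", 50)),
        multiprocessing.Process(target=worker, args=(locks, balances, "b", "a", 50)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return balances["a"].value, balances["b"].value


if __name__ == "__main__":
    print(main())
```

    Raising on timeout instead of waiting forever turns a silent hang into an error that can be logged, which also covers the debugging advice above.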

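7.2 multiprocessing.RLock

01.Basic concepts
    a.Definition and purpose
        multiprocessing.RLock is a reentrant (recursive) lock: the process or thread that already holds it can acquire it again without blocking.
    b.Core properties
        a.Reentrancy
            The owner can acquire the same lock repeatedly without blocking itself.
        b.Counting
            The lock keeps an acquisition count and is only really released when the count drops back to zero.
        c.Ownership
            The lock belongs to the process or thread that acquired it; other processes block until the owner has fully released it.
    c.Differences from a plain Lock
        a.Acquisition
            RLock can be re-acquired by its owner; a second acquire() on a Lock blocks even in the process that holds it.
        b.Release
            RLock needs one release() per acquire() and only the owner may release it; Lock needs a single release(), which any process may call.
        c.Use cases
            RLock suits recursive and nested calls; Lock suits simple mutual exclusion.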
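    The difference can be observed in a single process. The following sketch (reacquire_results is a hypothetical helper) uses non-blocking acquires so that the plain Lock case reports failure instead of hanging.

```python
import multiprocessing


def reacquire_results():
    """Return (lock_reacquired, rlock_reacquired, extra_release_rejected)."""
    lock = multiprocessing.Lock()
    rlock = multiprocessing.RLock()

    lock.acquire()
    # A plain Lock has no owner: a second acquire fails even in the holder.
    lock_reacquired = lock.acquire(block=False)
    lock.release()

    rlock.acquire()
    # Same owner: succeeds and raises the internal count to 2.
    rlock_reacquired = rlock.acquire(block=False)
    rlock.release()
    rlock.release()  # one release per successful acquire

    try:
        rlock.release()  # the lock is already unowned at this point
        extra_release_rejected = False
    except AssertionError:
        extra_release_rejected = True

    return lock_reacquired, rlock_reacquired, extra_release_rejected


if __name__ == "__main__":
    print(reacquire_results())
```

    The last step shows why release() calls must be balanced: releasing an RLock that the caller does not own is rejected with an AssertionError.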

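02.Basic usage
    a.Creating a reentrant lock
        Instantiate multiprocessing.RLock().
    b.Acquiring and releasing
        acquire() may be called several times; each call needs a matching release().
    c.Recursion
        Particularly suited to recursive functions and nested calls.
    d.Code example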
        ---
        # multiprocessing.RLock 基本使用示例
        import multiprocessing
        import time
        import logging
        from datetime import datetime

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

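        # Shared resources
        # NOTE: module-level shared objects reach the child processes only under the
        # "fork" start method. Under "spawn" (the default on Windows and macOS), create
        # them inside main() and pass them to the workers as arguments instead.
        shared_resource = multiprocessing.Manager().dict()
        operation_count = multiprocessing.Value('i', 0)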

        class RecursiveWorker:
            def __init__(self, rlock, process_id):
                self.rlock = rlock
                self.process_id = process_id
                self.nesting_level = 0

            def nested_operation(self, level, max_level=3):
                """嵌套操作演示可重入性"""
                with self.rlock:
                    self.nesting_level = level
                    timestamp = datetime.now().isoformat()

                    # 记录当前嵌套层级
                    operation_info = {
                        'process_id': self.process_id,
                        'level': level,
                        'timestamp': timestamp,
                        'operation_type': 'nested_operation'
                    }

                    key = f"process_{self.process_id}_level_{level}"
                    shared_resource[key] = operation_info
                    operation_count.value += 1

                    logger.info(f"Process-{self.process_id}: 嵌套操作级别 {level}/{max_level}")

                    if level < max_level:
                        # 递归调用,再次获取同一锁
                        time.sleep(0.1)
                        self.nested_operation(level + 1, max_level)
                    else:
                        logger.info(f"Process-{self.process_id}: 达到最大嵌套级别 {max_level}")

            def complex_task(self):
                """复杂任务,包含多个锁获取"""
                logger.info(f"Process-{self.process_id}: 开始复杂任务")

                # 第一次获取锁
                with self.rlock:
                    logger.info(f"Process-{self.process_id}: 第一次获取锁")

                    # 执行第一部分工作
                    self._part1_work()

                    # 执行需要锁的嵌套操作
                    self.nested_operation(1)

                    # 执行第二部分工作
                    self._part2_work()

            def _part1_work(self):
                """第一部分工作"""
                with self.rlock:  # 第二次获取锁
                    logger.info(f"Process-{self.process_id}: 执行第一部分工作")
                    time.sleep(0.2)

                    # 记录部分1完成
                    shared_resource[f"{self.process_id}_part1"] = {
                        'status': 'completed',
                        'timestamp': datetime.now().isoformat()
                    }

            def _part2_work(self):
                """第二部分工作"""
                with self.rlock:  # 第三次获取锁
                    logger.info(f"Process-{self.process_id}: 执行第二部分工作")
                    time.sleep(0.2)

                    # 记录部分2完成
                    shared_resource[f"{self.process_id}_part2"] = {
                        'status': 'completed',
                        'timestamp': datetime.now().isoformat()
                    }

        def worker_process(rlock, process_id):
            """工作进程"""
            worker = RecursiveWorker(rlock, process_id)
            worker.complex_task()

            logger.info(f"Process-{process_id}: 工作完成")

        def main():
            # 创建可重入锁
            rlock = multiprocessing.RLock()

            # 创建并启动进程
            processes = []
            for i in range(1, 4):
                p = multiprocessing.Process(
                    target=worker_process,
                    args=(rlock, i)
                )
                processes.append(p)
                p.start()

            # 等待所有进程完成
            for p in processes:
                p.join()

            # 输出结果
            logger.info(f"总操作次数: {operation_count.value}")
            logger.info(f"共享资源条目数: {len(shared_resource)}")

            # 显示详细结果
            for key, value in shared_resource.items():
                logger.info(f"{key}: {value}")

            return operation_count.value, len(shared_resource)

        if __name__ == "__main__":
            result = main()
        ---

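03.Advanced features and applications
    a.Acquisition count
        a.Mechanism
            The lock tracks its owner and an acquisition count, which is what allows repeated acquisition by the owner.
        b.Inspecting state
            repr() of the lock shows the current owner and count, which is useful while debugging.
        c.Debugging support
            Logging the owner and count at acquire and release points helps diagnose a lock that is never fully released.
    b.Recursive scenarios
        a.Avoiding self-deadlock
            A function that holds the lock can call another function that takes the same lock without deadlocking on itself.
        b.Performance
            Slightly more overhead than a plain Lock.
        c.Simpler code
            Helper functions can take the lock themselves, so no "lock already held" flags need to be passed around.
    c.Code example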
        ---
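        # Advanced RLock usage example
        import logging
        import multiprocessing
        import random
        import time
        from datetime import datetime

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)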

        class AdvancedRLockUsage:
            def __init__(self):
                self.rlock = multiprocessing.RLock()
                self.shared_data = multiprocessing.Manager().dict()
                self.lock_statistics = multiprocessing.Manager().dict()
                self.operation_log = multiprocessing.Manager().list()

            def recursive_file_processor(self, file_path, depth=0, max_depth=3):
                """递归文件处理器"""
                if depth > max_depth:
                    return

                with self.rlock:
                    timestamp = datetime.now().isoformat()
                    operation_id = f"file_{file_path}_{depth}_{timestamp}"

                    # 记录操作
                    operation_info = {
                        'file_path': file_path,
                        'depth': depth,
                        'timestamp': timestamp,
                        'status': 'processing'
                    }

                    self.shared_data[operation_id] = operation_info
                    self.operation_log.append(operation_info)

                    logger.info(f"处理文件 {file_path} (深度: {depth})")

                    # 模拟文件处理时间
                    processing_time = random.uniform(0.1, 0.3)
                    time.sleep(processing_time)

                    # 更新统计信息
                    self._update_statistics(file_path, depth, processing_time)

                    # 如果还有子文件需要处理(模拟递归)
                    if depth < max_depth:
                        sub_files = [f"{file_path}_sub{i}" for i in range(1, 3)]
                        for sub_file in sub_files:
                            self.recursive_file_processor(sub_file, depth + 1, max_depth)

                    # 更新完成状态
                    operation_info['status'] = 'completed'
                    self.shared_data[operation_id] = operation_info

            def _update_statistics(self, file_path, depth, processing_time):
                """更新统计信息(需要锁保护)"""
                with self.rlock:  # 嵌套获取锁
                    stats_key = f"stats_{file_path}"
                    if stats_key not in self.lock_statistics:
                        self.lock_statistics[stats_key] = {
                            'total_processing_time': 0,
                            'process_count': 0,
                            'max_depth': 0,
                            'last_access': None
                        }

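                    # A DictProxy returns a copy of the nested dict, so modify the copy
                    # and assign it back to make the change visible to other processes
                    stats = self.lock_statistics[stats_key]
                    stats['total_processing_time'] += processing_time
                    stats['process_count'] += 1
                    stats['max_depth'] = max(stats['max_depth'], depth)
                    stats['last_access'] = datetime.now().isoformat()
                    self.lock_statistics[stats_key] = stats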

            def data_processor(self, data_items, process_id):
                """数据处理任务"""
                logger.info(f"Process-{process_id}: 开始处理数据")

                for i, item in enumerate(data_items):
                    self._process_data_item(item, i, process_id)
                    time.sleep(random.uniform(0.05, 0.15))

            def _process_data_item(self, item, index, process_id):
                """处理单个数据项"""
                with self.rlock:  # 第一次获取锁
                    logger.info(f"Process-{process_id}: 处理数据项 {index}")

                    # 数据预处理
                    self._preprocess_data(item, index, process_id)

                    # 数据验证
                    self._validate_data(item, index, process_id)

                    # 数据存储
                    self._store_data(item, index, process_id)

            def _preprocess_data(self, item, index, process_id):
                """数据预处理"""
                with self.rlock:  # 第二次获取锁
                    timestamp = datetime.now().isoformat()

                    preprocessing_info = {
                        'process_id': process_id,
                        'item_index': index,
                        'item_data': item,
                        'preprocessing_time': timestamp,
                        'stage': 'preprocessing'
                    }

                    key = f"preprocess_{process_id}_{index}"
                    self.shared_data[key] = preprocessing_info

                    time.sleep(random.uniform(0.02, 0.05))

            def _validate_data(self, item, index, process_id):
                """数据验证"""
                with self.rlock:  # 第三次获取锁
                    # 模拟验证逻辑
                    is_valid = len(str(item)) > 0

                    validation_info = {
                        'process_id': process_id,
                        'item_index': index,
                        'is_valid': is_valid,
                        'validation_time': datetime.now().isoformat(),
                        'stage': 'validation'
                    }

                    key = f"validate_{process_id}_{index}"
                    self.shared_data[key] = validation_info

                    time.sleep(random.uniform(0.01, 0.03))

            def _store_data(self, item, index, process_id):
                """数据存储"""
                with self.rlock:  # 第四次获取锁
                    storage_info = {
                        'process_id': process_id,
                        'item_index': index,
                        'stored_data': item,
                        'storage_time': datetime.now().isoformat(),
                        'stage': 'storage'
                    }

                    key = f"store_{process_id}_{index}"
                    self.shared_data[key] = storage_info

                    time.sleep(random.uniform(0.02, 0.04))

            def get_summary_report(self):
                """生成摘要报告"""
                with self.rlock:
                    report = {
                        'total_operations': len(self.shared_data),
                        'operation_log_count': len(self.operation_log),
                        'statistics_count': len(self.lock_statistics),
                        'timestamp': datetime.now().isoformat()
                    }
                    return report

        def worker_with_recursion(advanced_system, process_id):
            """包含递归操作的工作进程"""
            # 处理一些数据项
            data_items = [f"data_{i}_{process_id}" for i in range(3)]
            advanced_system.data_processor(data_items, process_id)

            # 执行递归文件处理
            file_paths = [f"file_{process_id}_{i}" for i in range(1, 3)]
            for file_path in file_paths:
                advanced_system.recursive_file_processor(file_path)

        def run_advanced_example():
            """运行高级 RLock 示例"""
            advanced_system = AdvancedRLockUsage()

            # 创建并启动进程
            processes = []
            for i in range(1, 4):
                p = multiprocessing.Process(
                    target=worker_with_recursion,
                    args=(advanced_system, i)
                )
                processes.append(p)
                p.start()

            # 等待所有进程完成
            for p in processes:
                p.join()

            # 生成并输出报告
            report = advanced_system.get_summary_report()
            logger.info("=== 摘要报告 ===")
            for key, value in report.items():
                logger.info(f"{key}: {value}")

            return advanced_system, report

        if __name__ == "__main__":
            system, report = run_advanced_example()
        ---

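04.Practical scenarios
    a.Protecting recursive algorithms
        a.Scenario
            A recursive algorithm in a multi-process program updates shared state and needs synchronization.
        b.Solution
            Use an RLock so that recursive calls can re-acquire the lock.
        c.Code example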
            ---
            class RecursiveAlgorithm:
                def __init__(self):
                    self.rlock = multiprocessing.RLock()
                    # One Manager serves both lists; every Manager() call starts a
                    # separate server process.
                    manager = multiprocessing.Manager()
                    self.results = manager.list()
                    self.call_stack = manager.list()

                def fibonacci_with_lock(self, n, process_id):
                    """使用锁保护的斐波那契数列计算"""
                    with self.rlock:
                        self.call_stack.append(f"Process-{process_id}: fib({n})")

                        if n <= 1:
                            result = n
                        else:
                            # 递归调用,会再次获取锁
                            result1 = self.fibonacci_with_lock(n-1, process_id)
                            result2 = self.fibonacci_with_lock(n-2, process_id)
                            result = result1 + result2

                        self.results.append({
                            'process_id': process_id,
                            'input': n,
                            'output': result,
                            'timestamp': datetime.now().isoformat()
                        })

                        self.call_stack.pop()
                        return result
            ---

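    b.Nested resource access
        a.Scenario
            Several shared resources depend on each other and must be updated together.
        b.Solution
            Use one RLock so that helper methods can take the lock again while the caller already holds it.
        c.Code example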
            ---
            class NestedResourceAccess:
                def __init__(self):
                    self.rlock = multiprocessing.RLock()
                    # One Manager serves all three proxies; every Manager() call
                    # starts a separate server process.
                    manager = multiprocessing.Manager()
                    self.resource_a = manager.dict()
                    self.resource_b = manager.dict()
                    self.access_log = manager.list()

                def update_resources(self, key, value_a, value_b, process_id):
                    """更新相互关联的资源"""
                    with self.rlock:  # 第一次获取锁
                        # 更新资源 A
                        self._update_resource_a(key, value_a, process_id)

                        # 更新资源 B(可能需要资源 A 的信息)
                        self._update_resource_b(key, value_b, process_id)

                def _update_resource_a(self, key, value, process_id):
                    """更新资源 A"""
                    with self.rlock:  # 第二次获取锁
                        timestamp = datetime.now().isoformat()
                        self.resource_a[key] = {
                            'value': value,
                            'process_id': process_id,
                            'timestamp': timestamp
                        }
                        logger.info(f"Process-{process_id}: 更新资源 A[{key}] = {value}")

                def _update_resource_b(self, key, value, process_id):
                    """更新资源 B"""
                    with self.rlock:  # 第三次获取锁
                        timestamp = datetime.now().isoformat()

                        # 可能需要读取资源 A 的信息
                        resource_a_info = self.resource_a.get(key, {})

                        self.resource_b[key] = {
                            'value': value,
                            'linked_resource_a': resource_a_info.get('value'),
                            'process_id': process_id,
                            'timestamp': timestamp
                        }
                        logger.info(f"Process-{process_id}: 更新资源 B[{key}] = {value}")
            ---

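    c.Complex transaction processing
        a.Scenario
            A multi-step transaction must hold the lock from its first step to its last.
        b.Solution
            Use an RLock so that the lock stays held across every stage, including the helper methods that acquire it again.
        c.Code example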
            ---
            class TransactionProcessor:
                def __init__(self):
                    self.rlock = multiprocessing.RLock()
                    # One Manager serves both proxies; every Manager() call starts a
                    # separate server process.
                    manager = multiprocessing.Manager()
                    self.accounts = manager.dict()
                    self.transactions = manager.list()

                def process_transaction(self, from_account, to_account, amount, process_id):
                    """处理转账事务"""
                    transaction_id = f"txn_{process_id}_{datetime.now().timestamp()}"

                    with self.rlock:  # 第一次获取锁
                        # 验证账户
                        if not self._validate_accounts(from_account, to_account, process_id):
                            return False

                        # 检查余额
                        if not self._check_balance(from_account, amount, process_id):
                            return False

                        # 执行转账
                        if self._execute_transfer(from_account, to_account, amount, transaction_id, process_id):
                            # 记录事务
                            self._record_transaction(transaction_id, from_account, to_account, amount, process_id)
                            return True

                        return False

                def _validate_accounts(self, from_acc, to_acc, process_id):
                    """验证账户"""
                    with self.rlock:  # 第二次获取锁
                        if from_acc not in self.accounts:
                            self.accounts[from_acc] = {'balance': 1000, 'owner': f"owner_{from_acc}"}
                        if to_acc not in self.accounts:
                            self.accounts[to_acc] = {'balance': 500, 'owner': f"owner_{to_acc}"}

                        logger.info(f"Process-{process_id}: 账户验证通过")
                        return True

                def _check_balance(self, account, amount, process_id):
                    """检查余额"""
                    with self.rlock:  # 第三次获取锁
                        balance = self.accounts[account]['balance']
                        if balance >= amount:
                            logger.info(f"Process-{process_id}: 余额检查通过 (余额: {balance})")
                            return True
                        else:
                            logger.warning(f"Process-{process_id}: 余额不足 (余额: {balance}, 需要: {amount})")
                            return False

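                def _execute_transfer(self, from_acc, to_acc, amount, txn_id, process_id):
                    """Execute the transfer."""
                    with self.rlock:  # fourth acquisition of the lock
                        # A Manager dict does not see in-place changes to nested
                        # dicts, so copy each account, update it, and assign it back.
                        source = dict(self.accounts[from_acc])
                        source['balance'] -= amount
                        self.accounts[from_acc] = source

                        target = dict(self.accounts[to_acc])
                        target['balance'] += amount
                        self.accounts[to_acc] = target

                        logger.info(f"Process-{process_id}: transfer completed - {txn_id}")
                        return True

                def _record_transaction(self, txn_id, from_acc, to_acc, amount, process_id):
                    """Record the transaction.

                    Minimal sketch: process_transaction calls this method, but the
                    original example does not define it; the field names are assumed.
                    """
                    with self.rlock:  # fifth acquisition of the lock
                        self.transactions.append({
                            'transaction_id': txn_id,
                            'from_account': from_acc,
                            'to_account': to_acc,
                            'amount': amount,
                            'process_id': process_id,
                            'timestamp': datetime.now().isoformat()
                        })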
            ---

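05.Best practices and performance
    a.Usage guidelines
        a.Identify re-entrant call paths
            Use an RLock only where the lock holder can reach the same lock again, for example through recursion or nested method calls.
        b.Avoid overuse
            Where no re-entrant acquisition occurs, prefer a plain Lock.
        c.Lock hierarchy management
            Establish a clear, consistent pattern for acquiring and releasing locks; every acquire needs a matching release.
    b.Performance considerations
        a.Counting overhead
            An RLock tracks its owner and an acquisition count, which adds a small cost compared with a plain Lock.
        b.Memory use
            The owner and count bookkeeping takes a small amount of extra memory.
        c.Optimization
            In performance-sensitive code, keep the locked region short and use an RLock only where re-entrancy is required.
    c.Debugging and monitoring
        a.Deadlock prevention
            Re-entrancy only stops a holder from blocking on itself; circular waits between different locks can still deadlock.
        b.Performance analysis
            Monitor how often each lock is acquired and how long it is held.
        c.Exception handling
            Make sure the lock is released when an exception is raised, for example by using with or try/finally.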
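
        A minimal sketch of how to tell whether a call path needs a re-entrant lock, assuming only the standard library. The helper name reacquire_succeeds is hypothetical; it holds a lock and then tries to take it again from the same thread, using a timeout so that a non-re-entrant lock reports failure instead of hanging.

```python
import multiprocessing


def reacquire_succeeds(lock, timeout=0.1):
    """Hold `lock`, then try to acquire it again from the same thread.

    Returns True for a re-entrant lock. A plain Lock times out and returns
    False, which is the self-deadlock an RLock is meant to avoid.
    """
    with lock:
        acquired_again = lock.acquire(timeout=timeout)
        if acquired_again:
            lock.release()  # undo the inner acquire; `with` undoes the outer one
        return acquired_again


if __name__ == "__main__":
    print("RLock:", reacquire_succeeds(multiprocessing.RLock()))
    print("Lock:", reacquire_succeeds(multiprocessing.Lock()))
```

        If the plain Lock variant never needs the inner acquire in real code, the plain Lock is the simpler choice.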

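7.3 Inter-process synchronization

01.Overview of synchronization mechanisms
    a.Why inter-process synchronization is needed
        Processes must coordinate their access to shared resources to avoid data races and inconsistent state.
    b.Main synchronization primitives
        a.Lock
            A basic mutual-exclusion lock: only one process at a time can be inside the critical section.
        b.RLock
            A re-entrant lock: the process (and thread) that holds it can acquire it again without blocking.
        c.Semaphore
            Limits how many processes can access a resource at the same time.
        d.Event
            A flag for simple signalling between processes.
        e.Condition
            A condition variable for more complex wait-and-notify patterns.
    c.Choosing a mechanism
        Pick the primitive that fits the scenario, weighing performance, complexity, and the behaviour required.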

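02.Event synchronization
    a.Basic concept
        An Event is a simple synchronization mechanism built around a boolean flag: one process sets the flag and other processes wait for it.
    b.Core methods
        a.set()
            Sets the flag to True and wakes every process waiting on it.
        b.clear()
            Resets the flag to False.
        c.wait(timeout=None)
            Blocks until the flag is True or the optional timeout expires; returns True if the flag is set and False on timeout.
        d.is_set()
            Returns whether the flag is currently set.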
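
        The four methods can be exercised in a few lines. This is a minimal sketch under the assumption that only the standard library is available; the function names event_state_walkthrough and waiter are hypothetical.

```python
import multiprocessing


def event_state_walkthrough():
    """Exercise each Event method in one process and record what it returns."""
    event = multiprocessing.Event()
    observed = []
    observed.append(event.is_set())            # the flag starts cleared
    observed.append(event.wait(timeout=0.05))  # times out while cleared
    event.set()
    observed.append(event.wait(timeout=0.05))  # returns at once when set
    event.clear()
    observed.append(event.is_set())            # cleared again
    return observed


def waiter(start_event, done_event):
    """Child process: block until the parent sets start_event, then report back."""
    start_event.wait()
    done_event.set()


if __name__ == "__main__":
    print(event_state_walkthrough())

    start_event = multiprocessing.Event()
    done_event = multiprocessing.Event()
    child = multiprocessing.Process(target=waiter, args=(start_event, done_event))
    child.start()
    start_event.set()  # release the waiting child
    child.join()
    print("child finished:", done_event.is_set())
```

        Passing a timeout to wait() and checking its return value keeps a waiter from blocking forever if the signal never arrives.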
    c.Code example
        ---
        # Event 基本使用示例
        import multiprocessing
        import time
        import logging
        from datetime import datetime

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class EventSynchronization:
            def __init__(self):
                self.start_event = multiprocessing.Event()
                self.stop_event = multiprocessing.Event()
                # One Manager serves both proxies; every Manager() call starts a
                # separate server process.
                manager = multiprocessing.Manager()
                self.process_status = manager.dict()
                self.operation_log = manager.list()

            def worker_process(self, worker_id, delay_range=(1, 3)):
                """工作进程,等待开始信号"""
                logger.info(f"Worker-{worker_id}: 准备就绪,等待开始信号")
                self.process_status[f"worker_{worker_id}"] = "waiting"

                # 等待开始事件
                self.start_event.wait()
                logger.info(f"Worker-{worker_id}: 收到开始信号,开始工作")
                self.process_status[f"worker_{worker_id}"] = "working"

                start_time = time.time()
                work_time = 0

                # 工作循环,直到收到停止信号
                work_count = 0
                while not self.stop_event.is_set():
                    work_count += 1
                    task_time = time.time()

                    # 模拟工作
                    time.sleep(delay_range[0] + (delay_range[1] - delay_range[0]) * (worker_id % 3) / 2)

                    task_duration = time.time() - task_time
                    work_time += task_duration

                    # 记录操作
                    operation = {
                        'worker_id': worker_id,
                        'work_count': work_count,
                        'task_duration': task_duration,
                        'timestamp': datetime.now().isoformat()
                    }
                    self.operation_log.append(operation)

                    if work_count % 3 == 0:
                        logger.info(f"Worker-{worker_id}: 完成第 {work_count} 个任务")

                total_work_time = time.time() - start_time
                self.process_status[f"worker_{worker_id}"] = "stopped"

                logger.info(f"Worker-{worker_id}: 停止工作,总工作时间: {total_work_time:.2f}s, 完成任务: {work_count}")

                return {
                    'worker_id': worker_id,
                    'total_work_time': total_work_time,
                    'tasks_completed': work_count,
                    'avg_task_time': work_time / work_count if work_count > 0 else 0
                }

            def supervisor_process(self, num_workers, run_duration=10):
                """监督进程,控制工作进程的开始和停止"""
                logger.info("Supervisor: 启动监督进程")

                # 启动工作进程
                workers = []
                for i in range(1, num_workers + 1):
                    worker = multiprocessing.Process(
                        target=self.worker_process,
                        args=(i, (0.5, 1.5))
                    )
                    workers.append(worker)
                    worker.start()
                    time.sleep(0.2)  # 错开启动时间

                logger.info("Supervisor: 所有工作进程已启动")
                time.sleep(2)  # 等待所有进程准备就绪

                # 发送开始信号
                logger.info("Supervisor: 发送开始信号")
                self.start_event.set()
                self.process_status["supervisor"] = "started"

                # 运行指定时间
                time.sleep(run_duration)

                # 发送停止信号
                logger.info("Supervisor: 发送停止信号")
                self.stop_event.set()
                self.process_status["supervisor"] = "stopped"

                # 等待所有工作进程完成
                results = []
                for i, worker in enumerate(workers, 1):
                    worker.join()
                    results.append(f"Worker-{i} 已停止")

                logger.info("Supervisor: 所有工作进程已停止")
                return results

            def coordinator_process(self, coordination_points=3):
                """协调进程,执行多阶段协调"""
                logger.info("Coordinator: 启动协调进程")

                # 等待工作进程开始
                self.start_event.wait()

                for phase in range(1, coordination_points + 1):
                    logger.info(f"Coordinator: 进入协调阶段 {phase}")

                    # 模拟协调工作
                    time.sleep(2)

                    # 检查工作状态
                    active_workers = sum(
                        1 for status in self.process_status.values()
                        if status == "working"
                    )

                    logger.info(f"Coordinator: 阶段 {phase} 完成,活跃工作进程: {active_workers}")

                    coordination_info = {
                        'phase': phase,
                        'active_workers': active_workers,
                        'timestamp': datetime.now().isoformat()
                    }
                    self.operation_log.append(coordination_info)

                logger.info("Coordinator: 协调工作完成")

        def run_event_synchronization():
            """运行事件同步示例"""
            sync_system = EventSynchronization()

            # 创建监督和协调进程
            supervisor = multiprocessing.Process(
                target=sync_system.supervisor_process,
                args=(4, 8)  # 4个工作进程,运行8秒
            )

            coordinator = multiprocessing.Process(
                target=sync_system.coordinator_process,
                args=(3,)  # 3个协调点
            )

            # 启动进程
            supervisor.start()
            time.sleep(0.5)
            coordinator.start()

            # 等待进程完成
            supervisor.join()
            coordinator.join()

            # 输出结果
            logger.info("=== 最终结果 ===")
            logger.info(f"进程状态: {dict(sync_system.process_status)}")
            logger.info(f"操作日志条目: {len(sync_system.operation_log)}")

            return sync_system

        if __name__ == "__main__":
            result = run_event_synchronization()
        ---

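03.Semaphore synchronization
    a.Basic concept
        A Semaphore limits how many processes can access a resource at the same time by maintaining an internal counter.
    b.Core characteristics
        a.Counter mechanism
            The counter starts at the maximum concurrency; each acquire decrements it and each release increments it.
        b.Blocking acquire
            When the counter is 0, acquire blocks until another process calls release.
        c.No ownership, not re-entrant
            A semaphore does not track who holds it. A process may acquire it several times, but each acquire consumes one slot, so acquiring more times than there are free slots blocks, even for a process that already holds one.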
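
        A minimal sketch of the counter behaviour, assuming only the standard library. The helper names take_all_free_slots and give_back are hypothetical. It shows that repeated acquires by the same process each consume one slot.

```python
import multiprocessing


def take_all_free_slots(semaphore):
    """Acquire without blocking until the counter reaches 0; return slots taken."""
    taken = 0
    while semaphore.acquire(block=False):
        taken += 1
    return taken


def give_back(semaphore, count):
    """Release `count` slots, incrementing the counter once per call."""
    for _ in range(count):
        semaphore.release()


if __name__ == "__main__":
    pool = multiprocessing.Semaphore(3)
    print(take_all_free_slots(pool))  # every slot, all taken by this one process
    print(pool.acquire(block=False))  # the counter is 0, so this attempt fails
    give_back(pool, 2)
    print(take_all_free_slots(pool))  # only the released slots are available
```

        Using acquire(block=False) here keeps the demonstration from blocking; production code would more often use the with statement or a timeout.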
    c.Code example
        ---
        # Semaphore 基本使用示例
        import multiprocessing
        import time
        import random
        import logging
        from datetime import datetime

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class ResourcePoolManager:
            def __init__(self, max_resources=3):
                self.semaphore = multiprocessing.Semaphore(max_resources)
                self.max_resources = max_resources
                # One Manager serves both proxies; every Manager() call starts a
                # separate server process.
                manager = multiprocessing.Manager()
                self.resource_usage = manager.dict()
                self.access_log = manager.list()
                self.total_requests = multiprocessing.Value('i', 0)
                self.total_granted = multiprocessing.Value('i', 0)

            def resource_user(self, user_id, max_requests=5):
                """资源使用者"""
                logger.info(f"User-{user_id}: 开始使用资源")

                for request_id in range(1, max_requests + 1):
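                    # += on a shared Value is a read followed by a write, so hold
                    # the Value's own lock to make the update atomic.
                    with self.total_requests.get_lock():
                        self.total_requests.value += 1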
                    logger.info(f"User-{user_id}: 请求 #{request_id} 资源")

                    start_wait_time = time.time()

                    # 尝试获取资源
                    with self.semaphore:
                        wait_time = time.time() - start_wait_time
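                        # Up to max_resources processes are inside the semaphore at
                        # once, so the counter still needs its own lock.
                        with self.total_granted.get_lock():
                            self.total_granted.value += 1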

                        logger.info(f"User-{user_id}: 获得资源 #{request_id} (等待时间: {wait_time:.2f}s)")

                        # 记录资源获取
                        access_info = {
                            'user_id': user_id,
                            'request_id': request_id,
                            'wait_time': wait_time,
                            'timestamp': datetime.now().isoformat(),
                            'status': 'acquired'
                        }
                        self.access_log.append(access_info)

                        # 模拟资源使用
                        usage_time = random.uniform(1, 3)
                        self.resource_usage[f"user_{user_id}_req_{request_id}"] = {
                            'usage_time': usage_time,
                            'start_time': datetime.now().isoformat()
                        }

                        time.sleep(usage_time)

                        # 记录资源释放
                        release_info = {
                            'user_id': user_id,
                            'request_id': request_id,
                            'usage_time': usage_time,
                            'timestamp': datetime.now().isoformat(),
                            'status': 'released'
                        }
                        self.access_log.append(release_info)

                        logger.info(f"User-{user_id}: 释放资源 #{request_id} (使用时间: {usage_time:.2f}s)")

                    # 模拟请求间隔
                    time.sleep(random.uniform(0.5, 1.5))

                logger.info(f"User-{user_id}: 完成所有请求")

            def limited_resource_processor(self, processor_id, processing_time_range=(2, 5)):
                """受限资源处理器"""
                logger.info(f"Processor-{processor_id}: 启动处理器")

                processed_count = 0

                # 持续运行直到被外部停止
                while True:
                    try:
                        with self.semaphore:
                            logger.info(f"Processor-{processor_id}: 获得处理权限")

                            # 模拟复杂处理
                            processing_time = random.uniform(*processing_time_range)
                            start_time = time.time()

                            # 执行处理工作
                            time.sleep(processing_time)

                            processed_count += 1

                            # 记录处理结果
                            processing_info = {
                                'processor_id': processor_id,
                                'processed_count': processed_count,
                                'processing_time': processing_time,
                                'timestamp': datetime.now().isoformat()
                            }
                            self.access_log.append(processing_info)

                            logger.info(f"Processor-{processor_id}: 完成第 {processed_count} 次处理")

                    except KeyboardInterrupt:
                        logger.info(f"Processor-{processor_id}: 收到中断信号,停止处理")
                        break

            def monitor_usage(self, duration=20):
                """监控资源使用情况"""
                logger.info("Monitor: 开始监控资源使用")
                start_time = time.time()

                while time.time() - start_time < duration:
                    time.sleep(2)

                    # 计算当前使用率
                    granted_ratio = (self.total_granted.value / self.total_requests.value * 100
                                  if self.total_requests.value > 0 else 0)

                    monitor_info = {
                        'total_requests': self.total_requests.value,
                        'total_granted': self.total_granted.value,
                        'grant_ratio': granted_ratio,
                        'timestamp': datetime.now().isoformat()
                    }

                    self.access_log.append(monitor_info)
                    logger.info(f"Monitor: 总请求: {self.total_requests.value}, "
                              f"已授权: {self.total_granted.value} ({granted_ratio:.1f}%)")

                logger.info("Monitor: 监控结束")

            def generate_report(self):
                """Generate the usage report."""
                # The semaphore is not acquired here: taking it would only consume
                # one resource slot and would not stop other processes from
                # updating the counters or the log.
                total_requests = self.total_requests.value
                total_granted = self.total_granted.value
                log_snapshot = list(self.access_log)  # work on one copy of the log
                report = {
                    'max_resources': self.max_resources,
                    'total_requests': total_requests,
                    'total_granted': total_granted,
                    'success_rate': (total_granted / total_requests * 100
                                     if total_requests > 0 else 0),
                    'unique_users': len(set(item['user_id'] for item in log_snapshot
                                            if 'user_id' in item)),
                    'total_operations': len(log_snapshot),
                    'report_time': datetime.now().isoformat()
                }
                return report

        def run_semaphore_example():
            """运行信号量示例"""
            resource_manager = ResourcePoolManager(max_resources=2)  # 最多2个并发资源

            # 创建用户进程
            users = []
            for i in range(1, 6):  # 5个用户
                user = multiprocessing.Process(
                    target=resource_manager.resource_user,
                    args=(i, 4)  # 每个用户4个请求
                )
                users.append(user)
                user.start()

            # 创建监控进程
            monitor = multiprocessing.Process(
                target=resource_manager.monitor_usage,
                args=(15,)  # 监控15秒
            )
            monitor.start()

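            # Wait for all user processes to finish
            for user in users:
                user.join()

            # Also wait for the monitor so that no child process outlives the report
            monitor.join()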

            # 生成报告
            report = resource_manager.generate_report()
            logger.info("=== 资源使用报告 ===")
            for key, value in report.items():
                logger.info(f"{key}: {value}")

            return resource_manager, report

        if __name__ == "__main__":
            manager, report = run_semaphore_example()
        ---

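04.Condition variable synchronization
    a.Basic concept
        A Condition lets processes wait until a particular condition holds and wakes them through notifications.
    b.Core methods
        a.acquire/release
            Acquire and release the underlying lock.
        b.wait(timeout=None)
            Releases the lock and blocks until notified or until the timeout expires, then re-acquires the lock before returning. Call it with the lock held, and re-check the condition in a loop after it returns.
        c.notify(n=1)
            Wakes at most n waiting processes.
        d.notify_all()
            Wakes all waiting processes.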
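
        A minimal sketch of the wait and notify cycle, assuming only the standard library. The names wait_until_ready, mark_ready, and child are hypothetical. It uses wait_for, which wraps the usual "re-check the condition in a loop" pattern around wait.

```python
import multiprocessing


def wait_until_ready(condition, ready_flag, timeout=2.0):
    """Block until ready_flag becomes 1; return False if the timeout expires."""
    with condition:
        # wait_for releases the lock while waiting and re-checks the predicate
        # after every wake-up.
        return bool(condition.wait_for(lambda: ready_flag.value == 1, timeout=timeout))


def mark_ready(condition, ready_flag):
    """Change the shared state and notify waiters while holding the lock."""
    with condition:
        ready_flag.value = 1
        condition.notify_all()


def child(condition, ready_flag, result):
    """Child process: record whether the condition was observed in time."""
    result.value = 1 if wait_until_ready(condition, ready_flag) else 0


if __name__ == "__main__":
    condition = multiprocessing.Condition()
    ready_flag = multiprocessing.Value('i', 0)
    result = multiprocessing.Value('i', -1)
    worker = multiprocessing.Process(target=child, args=(condition, ready_flag, result))
    worker.start()
    mark_ready(condition, ready_flag)
    worker.join()
    print("child saw the condition:", result.value == 1)
```

        Because the child checks the predicate before it waits, it does not matter whether mark_ready runs before or after the child reaches wait_for.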
    c.Code example
        ---
        # Condition 基本使用示例
        import multiprocessing
        import time
        import random
        import logging
        from datetime import datetime

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class ProducerConsumerSystem:
            def __init__(self, buffer_size=5):
                self.buffer_size = buffer_size
                # One Manager serves both lists; every Manager() call starts a
                # separate server process.
                manager = multiprocessing.Manager()
                self.buffer = manager.list()
                self.condition = multiprocessing.Condition()
                self.produced_count = multiprocessing.Value('i', 0)
                self.consumed_count = multiprocessing.Value('i', 0)
                self.production_log = manager.list()

            def producer(self, producer_id, items_to_produce=10):
                """生产者进程"""
                logger.info(f"Producer-{producer_id}: 开始生产")

                for item_id in range(1, items_to_produce + 1):
                    item = f"P{producer_id}-Item{item_id}"

                    with self.condition:
                        # 等待缓冲区有空间
                        while len(self.buffer) >= self.buffer_size:
                            logger.info(f"Producer-{producer_id}: 缓冲区已满,等待消费")
                            self.condition.wait()

                        # 生产项目
                        self.buffer.append(item)
                        self.produced_count.value += 1

                        # 记录生产信息
                        production_info = {
                            'producer_id': producer_id,
                            'item_id': item_id,
                            'item': item,
                            'buffer_size': len(self.buffer),
                            'timestamp': datetime.now().isoformat()
                        }
                        self.production_log.append(production_info)

                        logger.info(f"Producer-{producer_id}: 生产 {item} "
                                  f"(缓冲区: {len(self.buffer)}/{self.buffer_size})")

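                        # Producers and consumers wait on the same condition, so wake
                        # everyone; notify() could wake another producer instead.
                        self.condition.notify_all()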

                    # 模拟生产时间
                    time.sleep(random.uniform(0.1, 0.5))

                logger.info(f"Producer-{producer_id}: 完成生产任务")

            def consumer(self, consumer_id, items_to_consume=8):
                """消费者进程"""
                logger.info(f"Consumer-{consumer_id}: 开始消费")

                consumed_items = 0

                while consumed_items < items_to_consume:
                    with self.condition:
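                        # Wait for an item, but give up after 5 s of an empty buffer so
                        # the process cannot block forever once the producers are done.
                        if not self.condition.wait_for(lambda: len(self.buffer) > 0, timeout=5):
                            logger.info(f"Consumer-{consumer_id}: buffer still empty after 5 s, stopping")
                            break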

                        # 消费项目
                        item = self.buffer.pop(0)
                        self.consumed_count.value += 1
                        consumed_items += 1

                        # 记录消费信息
                        consumption_info = {
                            'consumer_id': consumer_id,
                            'item': item,
                            'consumed_count': consumed_items,
                            'buffer_size': len(self.buffer),
                            'timestamp': datetime.now().isoformat()
                        }
                        self.production_log.append(consumption_info)

                        logger.info(f"Consumer-{consumer_id}: 消费 {item} "
                                  f"(缓冲区: {len(self.buffer)}/{self.buffer_size})")

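                        # Producers and consumers wait on the same condition, so wake
                        # everyone; notify() could wake another consumer instead.
                        self.condition.notify_all()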

                    # 模拟消费时间
                    time.sleep(random.uniform(0.2, 0.6))

                logger.info(f"Consumer-{consumer_id}: 完成消费任务,共消费 {consumed_items} 个项目")

            def priority_consumer(self, priority_level, items_to_consume=5):
                """优先消费者,根据优先级选择消费"""
                logger.info(f"Priority-Consumer-{priority_level}: 开始消费 (优先级: {priority_level})")

                consumed = 0
                attempts = 0

                while consumed < items_to_consume and attempts < items_to_consume * 3:
                    with self.condition:
                        # 检查是否有合适的项目(这里简化为任选项目)
                        if self.buffer:
                            # 根据优先级决定消费策略
                            if priority_level == 1:  # 高优先级,立即消费
                                item = self.buffer.pop(0)
                            elif priority_level == 2:  # 中优先级,等待一个项目
                                if len(self.buffer) > 1:
                                    item = self.buffer.pop(0)
                                else:
                                    attempts += 1
                                    self.condition.wait(timeout=1)
                                    continue
                            else:  # 低优先级,等待更多项目
                                if len(self.buffer) > 2:
                                    item = self.buffer.pop(0)
                                else:
                                    attempts += 1
                                    self.condition.wait(timeout=1)
                                    continue

                            consumed += 1
                            self.consumed_count.value += 1

                            # 记录优先消费信息
                            priority_info = {
                                'priority_level': priority_level,
                                'item': item,
                                'consumed': consumed,
                                'attempts': attempts,
                                'buffer_size': len(self.buffer),
                                'timestamp': datetime.now().isoformat()
                            }
                            self.production_log.append(priority_info)

                            logger.info(f"Priority-{priority_level}: 消费 {item} "
                                      f"(尝试次数: {attempts}, 缓冲区: {len(self.buffer)})")

                            # 通知其他进程
                            self.condition.notify_all()
                        else:
                            attempts += 1
                            self.condition.wait(timeout=1)

                    time.sleep(random.uniform(0.1, 0.3))

                logger.info(f"Priority-Consumer-{priority_level}: 完成,消费 {consumed} 个项目,尝试 {attempts} 次")

            def monitor_system(self, duration=15):
                """系统监控器"""
                logger.info("Monitor: 开始系统监控")
                start_time = time.time()

                while time.time() - start_time < duration:
                    time.sleep(1)

                    with self.condition:
                        monitor_info = {
                            'buffer_size': len(self.buffer),
                            'produced': self.produced_count.value,
                            'consumed': self.consumed_count.value,
                            'timestamp': datetime.now().isoformat()
                        }
                        self.production_log.append(monitor_info)

                        logger.info(f"Monitor: 缓冲区 {len(self.buffer)}/{self.buffer_size}, "
                                  f"已生产: {self.produced_count.value}, "
                                  f"已消费: {self.consumed_count.value}")

            def generate_final_report(self):
                """生成最终报告"""
                report = {
                    'buffer_size': self.buffer_size,
                    'total_produced': self.produced_count.value,
                    'total_consumed': self.consumed_count.value,
                    'remaining_items': len(self.buffer),
                    'throughput': self.consumed_count.value / 15 if self.consumed_count.value > 0 else 0,
                    'total_operations': len(self.production_log),
                    'report_time': datetime.now().isoformat()
                }
                return report

        def run_producer_consumer_example():
            """运行生产者-消费者示例"""
            system = ProducerConsumerSystem(buffer_size=3)

            # 创建生产者
            producers = []
            for i in range(1, 3):  # 2个生产者
                producer = multiprocessing.Process(
                    target=system.producer,
                    args=(i, 6)  # 每个生产者生产6个项目
                )
                producers.append(producer)
                producer.start()

            time.sleep(1)  # 让生产者先开始

            # 创建消费者
            consumers = []
            for i in range(1, 4):  # 3个消费者
                consumer = multiprocessing.Process(
                    target=system.consumer,
                    args=(i, 4)  # 每个消费者消费4个项目
                )
                consumers.append(consumer)
                consumer.start()

            # 创建优先消费者
            priority_consumers = []
            for priority in range(1, 4):  # 3个优先级
                pc = multiprocessing.Process(
                    target=system.priority_consumer,
                    args=(priority, 3)
                )
                priority_consumers.append(pc)
                pc.start()

            # 启动监控器
            monitor = multiprocessing.Process(
                target=system.monitor_system,
                args=(12,)
            )
            monitor.start()

            # 等待所有进程完成
            for producer in producers:
                producer.join()
            for consumer in consumers:
                consumer.join()
            for pc in priority_consumers:
                pc.join()
            monitor.join()

            # 生成报告
            report = system.generate_final_report()
            logger.info("=== 生产者-消费者系统报告 ===")
            for key, value in report.items():
                logger.info(f"{key}: {value}")

            return system, report

        if __name__ == "__main__":
            system, report = run_producer_consumer_example()
        ---

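05.Synchronization best practices
    a.Choosing the right synchronization primitive
        a.Use Lock for simple mutual exclusion
            Suitable for protecting a basic critical section.
        b.Use RLock when re-entrancy is needed
            Suitable for recursive and nested calls in which the holder acquires the same lock again.
        c.Use Semaphore for counted access
            Suitable for limiting the number of concurrent accessors.
        d.Use Event for simple notification
            Suitable for simple signalling between processes.
        e.Use Condition for complex conditions
            Suitable for synchronization that has to wait until a specific condition becomes true.
    b.Performance optimization strategies
        a.Reduce lock granularity
            Keep each critical section as small as possible and hold the lock for as short a time as possible.
        b.Avoid nested locks
            Be careful when holding several locks at once; if nesting cannot be avoided, acquire the locks in a fixed order to avoid deadlock.
        c.Batch operations
            Merge many small operations into one larger operation protected by a single lock acquisition.
        d.Asynchronous processing
            Where possible, use an asynchronous pattern to reduce synchronization overhead.
    c.Error handling and debugging
        a.Timeouts
            Give every blocking wait a reasonable timeout.
        b.Exception safety
            Make sure synchronization primitives are released even when an exception is raised (with statement or try/finally).
        c.Logging and monitoring
            Log the key facts about synchronization operations so that they can be used for debugging.
        d.State checks
            Check the state of synchronization objects periodically to confirm that the system is healthy.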
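
    The timeout and exception-safety points above can be sketched as follows. This is a minimal illustration, not part of the original examples: the helper name add_with_timeout is hypothetical, and the busy lock is simulated inside a single process so that the snippet stays self-contained.

```python
import multiprocessing

def add_with_timeout(lock, total, delta, timeout=1.0):
    """Add delta to total.value; return False if the lock stays busy."""
    if not lock.acquire(timeout=timeout):
        return False          # timed out: the caller can retry, log, or abort
    try:
        total.value += delta  # critical section
        return True
    finally:
        lock.release()        # runs even if the update raises

lock = multiprocessing.Lock()
total = multiprocessing.Value('i', 0, lock=False)  # raw value, guarded by `lock`

first = add_with_timeout(lock, total, 5)                # lock is free
lock.acquire()                                          # simulate a busy lock
second = add_with_timeout(lock, total, 5, timeout=0.1)  # gives up after 0.1 s
lock.release()
print(first, second, total.value)  # True False 5
```

    Returning False instead of blocking forever lets the caller decide how to react, and the finally clause guarantees that the lock is never left held after an error.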

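7.4 Shared memory protection

01.Shared memory overview
    a.The concept of shared memory
        Several processes map the same region of memory, which allows efficient data exchange and communication between them.
    b.Main advantages
        a.High performance
            Avoids the cost of copying (serializing) data between processes.
        b.Immediacy
            A modification becomes visible to every attached process without an explicit send or receive step.
        c.Large capacity
            Supports sharing large amounts of data.
    c.Challenges and risks
        a.Data races
            Concurrent access from several processes can leave the data inconsistent.
        b.Memory safety
            The program itself has to make accesses atomic and consistent; shared memory offers no protection of its own.
        c.Synchronization complexity
            The synchronization scheme has to be designed carefully.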
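
    As a minimal sketch of the data-race risk described above: an update such as value += amount is a read followed by a write, so concurrent processes can lose updates unless the pair is protected. The function name deposit and the worker counts below are illustrative assumptions.

```python
import multiprocessing

def deposit(balance, amount, repeats):
    """Add `amount` to the shared balance `repeats` times."""
    for _ in range(repeats):
        # `balance.value += amount` is a read followed by a write; holding the
        # lock makes the pair behave as one atomic step across processes.
        with balance.get_lock():
            balance.value += amount

if __name__ == "__main__":
    balance = multiprocessing.Value('i', 0)  # shared C int with a built-in lock
    workers = [
        multiprocessing.Process(target=deposit, args=(balance, 1, 10_000))
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(balance.value)
```

    With the lock held, the final value is 4 x 10,000 = 40,000. If the with statement is removed, some increments can be lost and the result may come out lower.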

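02.multiprocessing.shared_memory
    a.Basic concept
        A cross-process shared memory mechanism available since Python 3.8 that supports direct memory access.
    b.Core class and methods
        a.SharedMemory
            Creates and manages a shared memory block; SharedMemory(create=True, size=...) creates a new block.
        b.Attaching
            There is no attach() method. A process attaches to an existing block by passing its name: SharedMemory(name=...).
        c.close() and unlink()
            close() releases the current process's access to the block and should be called by every process that used it. unlink() requests destruction of the underlying block and should be called once, by a single process.
    c.Code example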
        ---
        # SharedMemory 基本使用示例
        import multiprocessing
        import time
        import logging
        from datetime import datetime
        import numpy as np
        from multiprocessing import shared_memory

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class SharedMemoryManager:
            def __init__(self, size_bytes=1024*1024):  # 1MB
                self.size_bytes = size_bytes
                self.shared_block = None
                self.access_log = multiprocessing.Manager().list()
                self.operation_count = multiprocessing.Value('i', 0)

            def create_shared_memory(self, name="my_shared_memory"):
                """创建共享内存块"""
                try:
                    # 如果已存在同名内存块,先清理
                    try:
                        existing_shm = shared_memory.SharedMemory(name=name)
                        existing_shm.close()
                        existing_shm.unlink()
                    except FileNotFoundError:
                        pass

                    self.shared_block = shared_memory.SharedMemory(
                        create=True,
                        size=self.size_bytes,
                        name=name
                    )
                    logger.info(f"创建共享内存块: {name}, 大小: {self.size_bytes} bytes")
                    return self.shared_block
                except Exception as e:
                    logger.error(f"创建共享内存失败: {e}")
                    return None

            def cleanup_shared_memory(self):
                """清理共享内存资源"""
                if self.shared_block:
                    try:
                        self.shared_block.close()
                        self.shared_block.unlink()
                        logger.info("共享内存资源已清理")
                    except Exception as e:
                        logger.error(f"清理共享内存失败: {e}")

        class DataWriter:
            def __init__(self, shm_name, size_bytes):
                self.shm_name = shm_name
                self.size_bytes = size_bytes
                self.process_id = multiprocessing.current_process().pid
                self.write_count = 0

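            def write_data(self, data_pattern="sequential", batch_size=10, write_interval=0.1):
                """Write data into shared memory"""
                # __init__ runs in the parent process, so refresh the PID here, inside the worker
                self.process_id = multiprocessing.current_process().pid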
                logger.info(f"Writer-{self.process_id}: 开始写入数据,模式: {data_pattern}")

                # 连接到共享内存
                shm = shared_memory.SharedMemory(name=self.shm_name)

                try:
                    for batch in range(batch_size):
                        # 准备数据
                        if data_pattern == "sequential":
                            data = f"Data-{self.process_id}-Batch{batch}".encode('utf-8')
                        elif data_pattern == "random":
                            import random
                            data = f"Rand-{self.process_id}-{random.randint(1000, 9999)}".encode('utf-8')
                        else:  # timestamp
                            data = f"Time-{self.process_id}-{datetime.now().isoformat()}".encode('utf-8')

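                        # Compute the write position: 64-byte slots, reused round-robin.
                        # Note: all writers use the same slots without a lock, so later
                        # writes overwrite earlier ones and readers may see mixed data.
                        num_slots = max(1, self.size_bytes // 64)
                        position = (batch % num_slots) * 64

                        # Write the data, truncated or zero-padded to exactly 64 bytes
                        data_bytes = data[:64].ljust(64, b'\x00')
                        shm.buf[position:position+64] = data_bytes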

                        self.write_count += 1

                        # 记录操作
                        operation_info = {
                            'process_id': self.process_id,
                            'operation': 'write',
                            'batch': batch,
                            'position': position,
                            'data_length': len(data),
                            'timestamp': datetime.now().isoformat()
                        }

                        logger.info(f"Writer-{self.process_id}: 写入批次 {batch} 到位置 {position}")
                        time.sleep(write_interval)

                finally:
                    shm.close()

                logger.info(f"Writer-{self.process_id}: 完成 {self.write_count} 次写入操作")

        class DataReader:
            def __init__(self, shm_name, size_bytes):
                self.shm_name = shm_name
                self.size_bytes = size_bytes
                self.process_id = multiprocessing.current_process().pid
                self.read_count = 0

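            def read_data(self, read_duration=10, read_interval=0.5):
                """Read data from shared memory"""
                # __init__ runs in the parent process, so refresh the PID here, inside the worker
                self.process_id = multiprocessing.current_process().pid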
                logger.info(f"Reader-{self.process_id}: 开始读取数据,持续: {read_duration}s")

                # 连接到共享内存
                shm = shared_memory.SharedMemory(name=self.shm_name)

                start_time = time.time()

                try:
                    while time.time() - start_time < read_duration:
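                        # Pick a random slot-aligned read position. The writers in this
                        # example only fill the first few 64-byte slots, so sample from
                        # those instead of from the whole block.
                        import random
                        num_slots = max(1, min(16, self.size_bytes // 64))
                        position = random.randrange(num_slots) * 64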

                        # 读取64字节数据
                        data_bytes = bytes(shm.buf[position:position+64])

                        # 尝试解码数据(跳过空数据)
                        try:
                            data_str = data_bytes.rstrip(b'\x00').decode('utf-8')
                            if data_str.strip():
                                self.read_count += 1

                                # 记录读取信息
                                operation_info = {
                                    'process_id': self.process_id,
                                    'operation': 'read',
                                    'position': position,
                                    'data': data_str,
                                    'timestamp': datetime.now().isoformat()
                                }

                                logger.info(f"Reader-{self.process_id}: 从位置 {position} 读取: {data_str}")

                        except UnicodeDecodeError:
                            # 忽略无法解码的数据
                            pass

                        time.sleep(read_interval)

                finally:
                    shm.close()

                logger.info(f"Reader-{self.process_id}: 完成 {self.read_count} 次读取操作")

        def run_shared_memory_example():
            """运行共享内存示例"""
            # 创建共享内存管理器
            shm_manager = SharedMemoryManager(size_bytes=1024*512)  # 512KB

            # 创建共享内存
            shared_block = shm_manager.create_shared_memory("example_shm")

            if not shared_block:
                logger.error("无法创建共享内存,退出")
                return

            try:
                # 创建写入进程
                writers = []
                writer_configs = [
                    ("sequential", 5, 0.2),
                    ("random", 3, 0.3),
                    ("timestamp", 4, 0.15)
                ]

                for i, (pattern, batches, interval) in enumerate(writer_configs, 1):
                    writer = multiprocessing.Process(
                        target=DataWriter("example_shm", 1024*512).write_data,
                        args=(pattern, batches, interval)
                    )
                    writers.append(writer)
                    writer.start()

                time.sleep(1)  # 让写入者先开始

                # 创建读取进程
                readers = []
                for i in range(1, 4):
                    reader = multiprocessing.Process(
                        target=DataReader("example_shm", 1024*512).read_data,
                        args=(8, 0.4)  # 读取8秒,间隔0.4秒
                    )
                    readers.append(reader)
                    reader.start()

                # 等待所有进程完成
                for writer in writers:
                    writer.join()
                for reader in readers:
                    reader.join()

                logger.info("共享内存示例完成")

            finally:
                # 清理共享内存
                shm_manager.cleanup_shared_memory()

        if __name__ == "__main__":
            run_shared_memory_example()
        ---
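
        The create / attach / close / unlink lifecycle used above can be reduced to the following sketch. It runs in a single process for brevity (a second handle attached by name stands in for a second process); SLOT_SIZE, write_slot and read_slot are hypothetical names chosen for illustration.

```python
from multiprocessing import shared_memory

SLOT_SIZE = 64  # hypothetical fixed record size, in bytes

def write_slot(buf, slot, payload):
    """Store payload in the given slot, truncated or zero-padded to SLOT_SIZE."""
    data = payload[:SLOT_SIZE].ljust(SLOT_SIZE, b"\x00")
    start = slot * SLOT_SIZE
    buf[start:start + SLOT_SIZE] = data

def read_slot(buf, slot):
    """Return the payload of the given slot without the zero padding."""
    start = slot * SLOT_SIZE
    return bytes(buf[start:start + SLOT_SIZE]).rstrip(b"\x00")

owner = shared_memory.SharedMemory(create=True, size=4 * SLOT_SIZE)
try:
    peer = shared_memory.SharedMemory(name=owner.name)  # attach by name
    try:
        write_slot(owner.buf, 2, b"hello")
        result = read_slot(peer.buf, 2)  # same memory, seen through the other handle
    finally:
        peer.close()   # every handle closes its own mapping
finally:
    owner.close()
    owner.unlink()     # exactly one process destroys the block
print(result)  # b'hello'
```

        Slot-aligned offsets keep records from overlapping. When several processes write to the same slot, the write still has to be wrapped in a lock, for example a multiprocessing.Lock passed to each worker.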

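03.Value and Array shared objects
    a.multiprocessing.Value
        Shares a single value of a basic C type (integer, floating point, and so on) between processes.
    b.multiprocessing.Array
        Shares a fixed-length array of one basic type between processes.
    c.Synchronization
        By default both objects carry a lock, available through get_lock(). Individual reads and writes take that lock, but a compound operation such as value += 1 is not atomic and has to hold a lock explicitly.
    d.Code example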
        ---
        # Value 和 Array 共享对象示例
        import multiprocessing
        import time
        import random
        import logging
        from datetime import datetime
        from multiprocessing import Value, Array, Lock

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class SharedObjectManager:
            def __init__(self):
                # 创建共享变量
                self.counter = Value('i', 0)  # 整数计数器
                self.total_sum = Value('d', 0.0)  # 双精度浮点数
                self.status = Value('b', 0)  # 字节类型状态
                self.flag = Value('?', False)  # 布尔标志

                # 创建共享数组
                self.results = Array('i', 100)  # 整数数组,100个元素
                self.measurements = Array('d', 50)  # 浮点数数组,50个元素

                # 创建同步锁
                self.counter_lock = Lock()
                self.array_lock = Lock()

                # 共享字典(通过Manager)
                self.manager = multiprocessing.Manager()
                self.operation_log = self.manager.dict()
                self.process_stats = self.manager.dict()

            def increment_counter(self, process_id, increment_value=1):
                """原子递增计数器"""
                with self.counter_lock:
                    old_value = self.counter.value
                    self.counter.value += increment_value
                    new_value = self.counter.value

                    # 记录操作
                    self.operation_log[f"{process_id}_{datetime.now().timestamp()}"] = {
                        'process_id': process_id,
                        'operation': 'increment',
                        'old_value': old_value,
                        'increment_value': increment_value,
                        'new_value': new_value,
                        'timestamp': datetime.now().isoformat()
                    }

                    return new_value - old_value

            def update_measurement(self, index, value, process_id):
                """更新测量数组"""
                with self.array_lock:
                    if 0 <= index < len(self.measurements):
                        old_value = self.measurements[index]
                        self.measurements[index] = value

                        # 更新总和
                        with self.counter_lock:
                            self.total_sum.value = self.total_sum.value - old_value + value

                        self.operation_log[f"array_{datetime.now().timestamp()}"] = {
                            'process_id': process_id,
                            'operation': 'update_array',
                            'index': index,
                            'old_value': old_value,
                            'new_value': value,
                            'timestamp': datetime.now().isoformat()
                        }

                        return True
                    return False

            def add_result(self, process_id, result_value):
                """添加结果到数组"""
                with self.array_lock:
                    # 找到第一个空位置(值为0)
                    for i in range(len(self.results)):
                        if self.results[i] == 0:
                            self.results[i] = result_value

                            self.operation_log[f"result_{datetime.now().timestamp()}"] = {
                                'process_id': process_id,
                                'operation': 'add_result',
                                'position': i,
                                'value': result_value,
                                'timestamp': datetime.now().isoformat()
                            }

                            return i
                    return -1  # 数组已满

        class CalculationWorker:
            def __init__(self, worker_id, shared_manager):
                self.worker_id = worker_id
                self.shared_manager = shared_manager

            def perform_calculations(self, num_calculations=20):
                """执行计算任务"""
                logger.info(f"Worker-{self.worker_id}: 开始计算任务")

                calculation_results = []

                for i in range(num_calculations):
                    # 生成随机计算
                    a = random.randint(1, 100)
                    b = random.randint(1, 100)
                    operation = random.choice(['+', '-', '*'])

                    if operation == '+':
                        result = a + b
                    elif operation == '-':
                        result = a - b
                    else:  # '*'
                        result = a * b

                    # 更新计数器
                    increment = self.shared_manager.increment_counter(self.worker_id, 1)

                    # 添加计算结果
                    position = self.shared_manager.add_result(self.worker_id, result)

                    # 更新测量数据
                    measurement_index = i % len(self.shared_manager.measurements)
                    self.shared_manager.update_measurement(
                        measurement_index,
                        result / 100.0,  # 转换为小数
                        self.worker_id
                    )

                    calculation_results.append({
                        'a': a,
                        'b': b,
                        'operation': operation,
                        'result': result,
                        'increment': increment,
                        'position': position
                    })

                    # Record statistics. A nested dict read from a Manager dict is a local
                    # copy, so modify the copy and assign it back to propagate the change.
                    stats_key = f"worker_{self.worker_id}"
                    stats = self.shared_manager.process_stats.get(stats_key, {
                        'calculations': 0,
                        'total_result': 0,
                        'start_time': datetime.now().isoformat()
                    })
                    stats['calculations'] += 1
                    stats['total_result'] += result
                    self.shared_manager.process_stats[stats_key] = stats

                    if i % 5 == 0:
                        logger.info(f"Worker-{self.worker_id}: 完成第 {i+1} 个计算: {a} {operation} {b} = {result}")

                    time.sleep(random.uniform(0.1, 0.3))

                logger.info(f"Worker-{self.worker_id}: 完成 {num_calculations} 个计算")
                return calculation_results

        class MonitoringWorker:
            def __init__(self, shared_manager):
                self.shared_manager = shared_manager

            def monitor_system(self, duration=15, check_interval=2):
                """监控系统状态"""
                logger.info(f"Monitor: 开始监控,持续时间: {duration}s")

                start_time = time.time()

                while time.time() - start_time < duration:
                    # 读取共享数据
                    current_counter = self.shared_manager.counter.value
                    current_sum = self.shared_manager.total_sum.value
                    current_status = self.shared_manager.status.value
                    current_flag = self.shared_manager.flag.value

                    # 统计数组信息
                    non_zero_results = sum(1 for x in self.shared_manager.results if x != 0)
                    non_zero_measurements = sum(1 for x in self.shared_manager.measurements if x != 0)

                    # 创建监控报告
                    monitor_info = {
                        'timestamp': datetime.now().isoformat(),
                        'counter': current_counter,
                        'total_sum': current_sum,
                        'status': current_status,
                        'flag': current_flag,
                        'non_zero_results': non_zero_results,
                        'non_zero_measurements': non_zero_measurements,
                        'operation_log_size': len(self.shared_manager.operation_log),
                        'process_count': len(self.shared_manager.process_stats)
                    }

                    logger.info(f"Monitor: 计数器={current_counter}, 总和={current_sum:.2f}, "
                              f"结果数组使用={non_zero_results}/{len(self.shared_manager.results)}, "
                              f"测量数组使用={non_zero_measurements}/{len(self.shared_manager.measurements)}")

                    time.sleep(check_interval)

                logger.info("Monitor: 监控结束")

        class DataAnalyzer:
            def __init__(self, shared_manager):
                self.shared_manager = shared_manager

            def analyze_results(self):
                """分析计算结果"""
                logger.info("Analyzer: 开始分析结果")

                with self.shared_manager.array_lock:
                    # 分析结果数组
                    results_list = list(self.shared_manager.results)
                    valid_results = [x for x in results_list if x != 0]

                if valid_results:
                    analysis = {
                        'total_valid_results': len(valid_results),
                        'max_result': max(valid_results),
                        'min_result': min(valid_results),
                        'avg_result': sum(valid_results) / len(valid_results),
                        'sum_results': sum(valid_results)
                    }

                    # 分析测量数据
                    measurements_list = list(self.shared_manager.measurements)
                    valid_measurements = [x for x in measurements_list if x != 0]

                    if valid_measurements:
                        analysis['measurements_stats'] = {
                            'count': len(valid_measurements),
                            'max': max(valid_measurements),
                            'min': min(valid_measurements),
                            'avg': sum(valid_measurements) / len(valid_measurements),
                            'sum': sum(valid_measurements)
                        }

                    logger.info(f"Analyzer: 分析完成 - 有效结果: {len(valid_results)}, "
                              f"最大值: {analysis['max_result']}, "
                              f"平均值: {analysis['avg_result']:.2f}")

                    return analysis
                else:
                    logger.warning("Analyzer: 没有有效结果可分析")
                    return {}

        def run_shared_objects_example():
            """运行共享对象示例"""
            # 创建共享对象管理器
            shared_manager = SharedObjectManager()

            # 创建计算工作进程
            workers = []
            for i in range(1, 4):
                worker_process = multiprocessing.Process(
                    target=CalculationWorker(i, shared_manager).perform_calculations,
                    args=(15,)  # 每个工作进程15个计算
                )
                workers.append(worker_process)
                worker_process.start()

            time.sleep(0.5)  # 让计算进程先开始

            # 创建监控进程
            monitor_process = multiprocessing.Process(
                target=MonitoringWorker(shared_manager).monitor_system,
                args=(10, 1.5)  # 监控10秒,每1.5秒检查一次
            )
            monitor_process.start()

            # 等待所有工作进程完成
            for worker in workers:
                worker.join()

            # 等待监控进程完成
            monitor_process.join()

            # 执行最终分析
            analyzer = DataAnalyzer(shared_manager)
            analysis = analyzer.analyze_results()

            # 输出最终统计
            logger.info("=== 最终统计 ===")
            logger.info(f"总计数器值: {shared_manager.counter.value}")
            logger.info(f"总总和值: {shared_manager.total_sum.value}")
            logger.info(f"操作日志条目: {len(shared_manager.operation_log)}")

            if analysis:
                for key, value in analysis.items():
                    logger.info(f"{key}: {value}")

            return shared_manager, analysis

        if __name__ == "__main__":
            manager, result = run_shared_objects_example()
        ---
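
        The add_result method above treats 0 as "empty", so a legitimate result of 0 (for example 5 - 5) is overwritten by the next write. The sketch below shows one possible alternative, assuming a sentinel value that real results never take; EMPTY and claim_slot are hypothetical names. It also uses the Array's own lock through get_lock() instead of a separate Lock object.

```python
from multiprocessing import Array

EMPTY = -1  # hypothetical sentinel: assumes real results are never -1

def claim_slot(results, value):
    """Store value in the first free slot; return its index, or -1 if full."""
    with results.get_lock():  # the search and the write form one atomic step
        for i in range(len(results)):
            if results[i] == EMPTY:
                results[i] = value
                return i
    return -1

results = Array('i', [EMPTY] * 3)  # three slots, all marked empty
indices = [claim_slot(results, v) for v in (7, 0, 42, 99)]
snapshot = results[:]              # copy of the contents as a plain list
print(indices, snapshot)  # [0, 1, 2, -1] [7, 0, 42]
```

        The default lock of Value and Array is re-entrant, so indexing the array while get_lock() is held does not deadlock.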

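04.Advanced synchronization strategies
    a.Layered locking
        Use different locks for data at different levels (global, per-resource, statistics) to reduce lock contention.
    b.Read-write lock pattern
        Separate readers from writers in read-heavy, write-light workloads so that readers can proceed concurrently.
    c.Batch operations
        Merge many small operations into one larger operation protected by a single lock acquisition.
    d.Code example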
        ---
        # 高级同步策略示例
        import multiprocessing
        import time
        import random
        import logging
        from datetime import datetime
        from multiprocessing import Value, Array, Lock

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class AdvancedSyncManager:
            def __init__(self, num_resources=10):
                # 基础共享数据
                self.global_counter = Value('i', 0)
                self.resource_states = Array('i', [0] * num_resources)  # 0: 空闲, 1: 使用中, 2: 维护中
                self.resource_data = Array('d', [0.0] * num_resources)
                self.access_times = Array('d', [0.0] * num_resources)

                # 分层锁结构
                self.global_lock = Lock()  # 全局操作锁
                self.resource_locks = [Lock() for _ in range(num_resources)]  # 每个资源的专用锁
                self.statistics_lock = Lock()  # 统计数据锁

                # 共享管理对象
                self.manager = multiprocessing.Manager()
                self.operation_history = self.manager.list()
                self.resource_statistics = self.manager.dict()
                self.performance_metrics = self.manager.dict()

            def acquire_resource(self, resource_id, process_id, operation_type="read"):
                """获取资源访问权限"""
                if 0 <= resource_id < len(self.resource_states):
                    # 使用资源专用锁
                    with self.resource_locks[resource_id]:
                        current_state = self.resource_states[resource_id]

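                        # Check the resource state
                        if current_state == 0:  # idle
                            # State 1 means "in use" for both reads and writes; the state
                            # array does not record whether the holder is a reader or a writer
                            self.resource_states[resource_id] = 1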
                            self.access_times[resource_id] = time.time()

                            # 记录操作
                            self._record_operation(process_id, "acquire", resource_id, operation_type)
                            return True
                        elif current_state == 1 and operation_type == "read":  # 使用中,但允许读
                            self.access_times[resource_id] = time.time()
                            self._record_operation(process_id, "acquire_shared", resource_id, operation_type)
                            return True
                        else:
                            self._record_operation(process_id, "acquire_failed", resource_id, operation_type)
                            return False
                return False

            def release_resource(self, resource_id, process_id, new_value=None):
                """释放资源访问权限"""
                if 0 <= resource_id < len(self.resource_states):
                    with self.resource_locks[resource_id]:
                        if new_value is not None:
                            self.resource_data[resource_id] = new_value

                        self.resource_states[resource_id] = 0  # 设为空闲
                        # Compare with None explicitly: a new value of 0.0 is still an update
                        self._record_operation(process_id, "release", resource_id, "update" if new_value is not None else "read")
                        return True
                return False

            def batch_update_resources(self, updates, process_id):
                """批量更新多个资源"""
                # 按资源ID排序,避免死锁
                sorted_updates = sorted(updates, key=lambda x: x[0])

                # 获取所有需要的锁(按顺序)
                acquired_locks = []
                try:
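                    # Lock each distinct resource once, in ascending order; acquiring the
                    # same non-reentrant Lock twice would deadlock
                    for resource_id in sorted({rid for rid, _ in sorted_updates}):
                        if 0 <= resource_id < len(self.resource_locks):
                            self.resource_locks[resource_id].acquire()
                            acquired_locks.append(resource_id)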

                    # 执行批量更新
                    for resource_id, value in sorted_updates:
                        # Skip IDs that were not locked above and updates without a value
                        if resource_id in acquired_locks and value is not None:
                            self.resource_data[resource_id] = value
                            self.access_times[resource_id] = time.time()

                    self._record_operation(process_id, "batch_update", len(sorted_updates), "write")
                    return True

                finally:
                    # 释放所有锁
                    for resource_id in reversed(acquired_locks):
                        self.resource_locks[resource_id].release()

            def _record_operation(self, process_id, operation, resource_id, operation_type):
                """记录操作历史"""
                with self.statistics_lock:
                    operation_record = {
                        'process_id': process_id,
                        'operation': operation,
                        'resource_id': resource_id,
                        'operation_type': operation_type,
                        'timestamp': datetime.now().isoformat(),
                        'global_counter': self.global_counter.value
                    }

                    self.operation_history.append(operation_record)

                    # 更新统计信息
                    if f"process_{process_id}" not in self.resource_statistics:
                        self.resource_statistics[f"process_{process_id}"] = {
                            'total_operations': 0,
                            'acquires': 0,
                            'releases': 0,
                            'failed_acquires': 0,
                            'batch_operations': 0
                        }

                    stats = self.resource_statistics[f"process_{process_id}"]
                    stats['total_operations'] += 1

                    if operation == 'acquire' or operation == 'acquire_shared':
                        stats['acquires'] += 1
                    elif operation == 'release':
                        stats['releases'] += 1
                    elif operation == 'acquire_failed':
                        stats['failed_acquires'] += 1
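                    elif operation == 'batch_update':
                        stats['batch_operations'] += 1

                    # The dict read from the Manager dict is a local copy; write it back
                    self.resource_statistics[f"process_{process_id}"] = stats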

            def get_resource_snapshot(self):
                """获取资源状态快照"""
                snapshot = {
                    'timestamp': datetime.now().isoformat(),
                    'global_counter': self.global_counter.value,
                    'resource_states': list(self.resource_states),
                    'resource_data': list(self.resource_data),
                    'access_times': list(self.access_times)
                }
                return snapshot

        class ResourceWorker:
            def __init__(self, worker_id, sync_manager):
                self.worker_id = worker_id
                self.sync_manager = sync_manager

            def perform_operations(self, num_operations=20):
                """执行资源操作"""
                logger.info(f"Worker-{self.worker_id}: 开始执行操作")

                successful_operations = 0
                failed_operations = 0

                for i in range(num_operations):
                    operation_type = random.choice(['read', 'write'])
                    resource_id = random.randint(0, len(self.sync_manager.resource_states) - 1)

                    # 尝试获取资源
                    if self.sync_manager.acquire_resource(resource_id, self.worker_id, operation_type):
                        try:
                            if operation_type == 'write':
                                # 模拟写入操作
                                new_value = random.uniform(0, 1000)
                                time.sleep(random.uniform(0.1, 0.3))
                                self.sync_manager.release_resource(resource_id, self.worker_id, new_value)
                            else:
                                # 模拟读取操作
                                current_value = self.sync_manager.resource_data[resource_id]
                                time.sleep(random.uniform(0.05, 0.15))
                                self.sync_manager.release_resource(resource_id, self.worker_id)

                            successful_operations += 1

                        except Exception as e:
                            logger.error(f"Worker-{self.worker_id}: 操作异常 - {e}")
                            failed_operations += 1
                    else:
                        failed_operations += 1
                        # 资源被占用,短暂等待后重试
                        time.sleep(random.uniform(0.1, 0.2))

                    # 偶尔执行批量操作
                    if i % 7 == 0 and i > 0:
                        batch_updates = []
                        for j in range(3):
                            batch_resource_id = (resource_id + j) % len(self.sync_manager.resource_states)
                            batch_value = random.uniform(0, 100)
                            batch_updates.append((batch_resource_id, batch_value))

                        if self.sync_manager.batch_update_resources(batch_updates, self.worker_id):
                            logger.info(f"Worker-{self.worker_id}: 执行批量更新 {batch_updates}")
                            successful_operations += len(batch_updates)

                logger.info(f"Worker-{self.worker_id}: 完成,成功: {successful_operations}, 失败: {failed_operations}")
                return successful_operations, failed_operations

        class ResourceMonitor:
            def __init__(self, sync_manager):
                self.sync_manager = sync_manager

            def monitor_resources(self, duration=12, check_interval=1.5):
                """监控资源使用情况"""
                logger.info(f"Monitor: 开始资源监控")

                start_time = time.time()
                monitor_count = 0

                while time.time() - start_time < duration:
                    snapshot = self.sync_manager.get_resource_snapshot()

                    # 分析资源状态
                    idle_count = snapshot['resource_states'].count(0)
                    busy_count = snapshot['resource_states'].count(1)
                    maintenance_count = snapshot['resource_states'].count(2)

                    # 计算使用率
                    total_resources = len(snapshot['resource_states'])
                    utilization_rate = (busy_count / total_resources * 100) if total_resources > 0 else 0

                    monitor_info = {
                        'monitor_count': monitor_count + 1,
                        'idle_resources': idle_count,
                        'busy_resources': busy_count,
                        'maintenance_resources': maintenance_count,
                        'utilization_rate': utilization_rate,
                        'timestamp': snapshot['timestamp']
                    }

                    self.sync_manager.performance_metrics[f"monitor_{monitor_count}"] = monitor_info

                    logger.info(f"Monitor #{monitor_count + 1}: 空闲={idle_count}, 使用中={busy_count}, "
                              f"利用率={utilization_rate:.1f}%")

                    monitor_count += 1
                    time.sleep(check_interval)

                logger.info(f"Monitor: 监控结束,共 {monitor_count} 次检查")

        def run_advanced_sync_example():
            """运行高级同步策略示例"""
            # 创建高级同步管理器
            sync_manager = AdvancedSyncManager(num_resources=8)

            # 创建资源工作进程
            workers = []
            for i in range(1, 5):
                worker_process = multiprocessing.Process(
                    target=ResourceWorker(i, sync_manager).perform_operations,
                    args=(25,)  # 每个工作进程25个操作
                )
                workers.append(worker_process)
                worker_process.start()

            time.sleep(0.3)

            # 创建资源监控进程
            monitor_process = multiprocessing.Process(
                target=ResourceMonitor(sync_manager).monitor_resources,
                args=(10, 1)  # 监控10秒,每1秒检查一次
            )
            monitor_process.start()

            # 等待所有工作进程完成
            for worker in workers:
                worker.join()

            monitor_process.join()

            # 输出最终统计
            logger.info("=== 高级同步策略统计 ===")
            logger.info(f"全局计数器: {sync_manager.global_counter.value}")
            logger.info(f"操作历史记录: {len(sync_manager.operation_history)}")
            logger.info(f"进程统计: {len(sync_manager.resource_statistics)}")
            logger.info(f"性能指标: {len(sync_manager.performance_metrics)}")

            # 显示资源最终状态
            final_snapshot = sync_manager.get_resource_snapshot()
            logger.info(f"最终资源状态: {final_snapshot['resource_states']}")
            logger.info(f"最终资源数据: {[round(x, 2) for x in final_snapshot['resource_data']]}")

            return sync_manager

        if __name__ == "__main__":
            result = run_advanced_sync_example()
        ---

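05.Best Practices and Performance Optimization
    a.Shared-memory usage principles
        a.Minimize shared data
            Share only the data that must be shared; every extra shared field adds synchronization complexity.
        b.Partition sensibly
            Split large shared structures into partitions, each guarded by its own lock, to reduce lock contention.
        c.Data locality
            Choose a CPU-cache-friendly layout, and keep data written by different processes on separate cache lines.
    b.Synchronization strategy optimization
        a.Lock granularity
            Pick a lock granularity that balances concurrency against safety.
        b.Lock-free algorithms
            Use lock-free data structures where they are feasible.
        c.Batching
            Merge many small operations into one larger operation so the lock is taken once.
    c.Error handling and debugging
        a.Exception safety
            Make sure resources are released correctly when an exception is raised.
        b.Deadlock prevention
            Always acquire locks in the same order.
        c.Performance monitoring
            Track performance metrics for shared-memory access.
        d.Memory management
            Release shared-memory resources promptly once they are no longer needed.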
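
The partitioning and lock-ordering advice above can be sketched as follows. This is a minimal illustration rather than part of the example program: the class name `PartitionedCounters` and its methods are hypothetical, and it assumes integer slots are a good enough stand-in for real shared resources.

```python
import multiprocessing


class PartitionedCounters:
    """Integer slots in shared memory, each guarded by its own lock (hypothetical helper)."""

    def __init__(self, num_partitions=4):
        # lock=False: the array itself is unsynchronized; the per-slot locks do the guarding
        self.values = multiprocessing.Array('q', num_partitions, lock=False)
        self.locks = [multiprocessing.Lock() for _ in range(num_partitions)]

    def add(self, index, amount=1):
        # Only one slot is locked, so updates to other slots are not blocked
        with self.locks[index]:
            self.values[index] += amount

    def transfer(self, src, dst, amount):
        """Move `amount` between two slots, always locking the lower index first."""
        if src == dst:
            return
        first, second = sorted((src, dst))
        # A fixed acquisition order prevents two opposite transfers from deadlocking
        with self.locks[first], self.locks[second]:
            self.values[src] -= amount
            self.values[dst] += amount

    def snapshot(self):
        """Return a consistent copy by holding every lock, taken in ascending order."""
        for lock in self.locks:
            lock.acquire()
        try:
            return list(self.values)
        finally:
            for lock in reversed(self.locks):
                lock.release()
```

Because each slot has its own lock, workers touching different slots never wait on each other; only `snapshot` pays the cost of taking every lock.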

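8. File Locks

8.1 The fcntl Module

01.Basic Concepts
    a.Definition and purpose
        fcntl is a Python standard-library module that exposes the Unix fcntl() and ioctl() system calls, together with the flock() and lockf() locking interfaces, for file-descriptor control and file locking.
    b.Key characteristics
        a.Unix only
            Provides a uniform file-locking interface across Unix and Linux systems; the module is not available on Windows.
        b.Low-level access
            Calls the operating system's file-control facilities directly.
        c.Multiple operations
            Supports file locking, status queries, and flag modification.
    c.Types of file locks
        a.Advisory locks
            Do not prevent other processes from accessing the file; they only work when every process cooperates by taking the lock. Locks taken with flock() and lockf() are advisory.
        b.Mandatory locks
            Enforced by the operating system, which blocks conflicting access. The fcntl module does not provide this by default; on Linux it required special mount options and file mode bits, and it has been removed from recent kernels.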
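
The advisory nature of these locks is easy to miss, so here is a small sketch of it. The function name `write_while_locked` is made up for this illustration, and the behavior shown assumes a Unix system where `fcntl` is available.

```python
import fcntl
import os
import tempfile


def write_while_locked():
    """Hold LOCK_EX on one descriptor, then write through another that never asks for the lock."""
    fd, path = tempfile.mkstemp()
    try:
        # The cooperative side takes an exclusive lock
        fcntl.flock(fd, fcntl.LOCK_EX)

        # The uncooperative side simply opens and writes; nothing stops it
        with open(path, 'w') as rogue:
            rogue.write("written without the lock")

        with open(path) as f:
            return f.read()
    finally:
        os.close(fd)      # closing the descriptor also releases the flock
        os.remove(path)
```

The write succeeds even though an exclusive lock is held, which is why every writer and reader of a shared file has to go through the same locking code.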

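02.File Locking Basics
    a.The fcntl.flock function
        Locks and unlocks a whole file through its file descriptor; the lock mode is selected with the constants below.
    b.Lock modes
        a.LOCK_SH
            Shared (read) lock; any number of processes may hold it at once.
        b.LOCK_EX
            Exclusive (write) lock; only one process may hold it at a time.
        c.LOCK_UN
            Unlock; releases a previously held lock.
        d.LOCK_NB
            Non-blocking flag, OR-ed with LOCK_SH or LOCK_EX; the call raises OSError instead of waiting when the lock is unavailable.
    c.Code example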
        ---
        # fcntl模块基础使用示例
        import fcntl
        import os
        import time
        import logging
        from datetime import datetime

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class FileLockExample:
            def __init__(self, filename):
                self.filename = filename
                self.file_handle = None

            def acquire_shared_lock(self):
                """获取共享锁"""
                try:
                    self.file_handle = open(self.filename, 'r')
                    fcntl.flock(self.file_handle.fileno(), fcntl.LOCK_SH)
                    logger.info(f"获取共享锁成功: {self.filename}")
                    return True
                except Exception as e:
                    logger.error(f"获取共享锁失败: {e}")
                    if self.file_handle:
                        self.file_handle.close()
                        self.file_handle = None
                    return False

            def acquire_exclusive_lock(self):
                """获取排他锁"""
                try:
                    # Open in append mode: 'w' would truncate the file before the lock is held
                    self.file_handle = open(self.filename, 'a')
                    fcntl.flock(self.file_handle.fileno(), fcntl.LOCK_EX)
                    logger.info(f"获取排他锁成功: {self.filename}")
                    return True
                except Exception as e:
                    logger.error(f"获取排他锁失败: {e}")
                    if self.file_handle:
                        self.file_handle.close()
                        self.file_handle = None
                    return False

            def try_acquire_exclusive_lock(self):
                """尝试获取排他锁(非阻塞)"""
                try:
                    # Open in append mode: 'w' would truncate the file before the lock is held
                    self.file_handle = open(self.filename, 'a')
                    fcntl.flock(self.file_handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
                    logger.info(f"非阻塞获取排他锁成功: {self.filename}")
                    return True
                except (IOError, OSError) as e:
                    # EAGAIN/EWOULDBLOCK is not 11 on every platform; Python maps it to BlockingIOError
                    if isinstance(e, BlockingIOError):
                        logger.warning(f"文件被锁定,无法获取排他锁: {self.filename}")
                    else:
                        logger.error(f"获取排他锁失败: {e}")
                    if self.file_handle:
                        self.file_handle.close()
                        self.file_handle = None
                    return False

            def release_lock(self):
                """释放锁"""
                if self.file_handle:
                    try:
                        fcntl.flock(self.file_handle.fileno(), fcntl.LOCK_UN)
                        logger.info(f"释放锁成功: {self.filename}")
                    except Exception as e:
                        logger.error(f"释放锁失败: {e}")
                    finally:
                        self.file_handle.close()
                        self.file_handle = None

            def __enter__(self):
                return self

            def __exit__(self, exc_type, exc_val, exc_tb):
                self.release_lock()

        # 使用示例
        def demo_basic_usage():
            filename = "test_lock.txt"

            # 创建测试文件
            with open(filename, 'w') as f:
                f.write("测试内容\n")

            # 排他锁示例
            with FileLockExample(filename) as lock:
                if lock.acquire_exclusive_lock():
                    logger.info("开始执行排他操作...")
                    time.sleep(2)  # 模拟长时间操作
                    with open(filename, 'a') as f:
                        f.write(f"排他操作完成于 {datetime.now()}\n")
                else:
                    logger.error("无法获取排他锁")

            # 共享锁示例
            with FileLockExample(filename) as lock:
                if lock.acquire_shared_lock():
                    logger.info("开始读取文件内容...")
                    time.sleep(1)  # 模拟读取操作
                    with open(filename, 'r') as f:
                        content = f.read()
                        logger.info(f"文件内容: {content.strip()}")
                else:
                    logger.error("无法获取共享锁")

            # 清理测试文件
            if os.path.exists(filename):
                os.remove(filename)
        ---
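
The lock modes above can also be wrapped in a context manager so the lock is always released. This is a minimal sketch, not a replacement for the class above; `flock_guard` and `second_locker_is_refused` are hypothetical names, and the file is opened in `'a+'` mode on the assumption that it should be created if missing and never truncated.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def flock_guard(path, exclusive=True, blocking=True):
    """Hold a flock on `path` for the duration of the with-block."""
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    if not blocking:
        mode |= fcntl.LOCK_NB
    # 'a+' creates the file if needed and never truncates it
    with open(path, 'a+') as handle:
        # Raises BlockingIOError when non-blocking and the lock is busy
        fcntl.flock(handle.fileno(), mode)
        try:
            yield handle
        finally:
            fcntl.flock(handle.fileno(), fcntl.LOCK_UN)


def second_locker_is_refused(path):
    """Return True if a non-blocking exclusive request fails while the lock is held."""
    with flock_guard(path):
        try:
            with flock_guard(path, blocking=False):
                return False
        except BlockingIOError:
            return True
```

Each `open()` creates a separate open file description, so the second request conflicts with the first even inside one process; two shared requests would both succeed.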

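03.Advanced File Locking
    a.Querying file status
        a.The fcntl.fcntl function
            Reads and modifies the status flags of an open file.
        b.File-descriptor operations
            Gives fine-grained control over a file descriptor.
        c.Code example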
            ---
            def query_file_status(filename):
                """查询文件状态"""
                try:
                    fd = os.open(filename, os.O_RDONLY)

                    # 获取文件状态标志
                    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
                    logger.info(f"文件状态标志: {flags}")

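                    # Decode the access mode. O_RDONLY is 0, so it cannot be tested with a
                    # bitwise AND; mask with O_ACCMODE and compare instead
                    access_mode = flags & os.O_ACCMODE
                    if access_mode == os.O_RDONLY:
                        logger.info("File is open read-only")
                    elif access_mode == os.O_WRONLY:
                        logger.info("File is open write-only")
                    elif access_mode == os.O_RDWR:
                        logger.info("File is open read-write")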

                    # 检查追加模式
                    if flags & os.O_APPEND:
                        logger.info("文件以追加模式打开")

                    os.close(fd)
                    return flags

                except Exception as e:
                    logger.error(f"查询文件状态失败: {e}")
                    return None

            def modify_file_flags(filename):
                """修改文件标志"""
                try:
                    fd = os.open(filename, os.O_WRONLY)

                    # 添加追加模式
                    current_flags = fcntl.fcntl(fd, fcntl.F_GETFL)
                    new_flags = current_flags | os.O_APPEND
                    fcntl.fcntl(fd, fcntl.F_SETFL, new_flags)

                    logger.info("成功设置文件追加模式")
                    os.close(fd)
                    return True

                except Exception as e:
                    logger.error(f"修改文件标志失败: {e}")
                    return False
            ---
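    b.Refining file locks
        a.Controlling the locked range
            A specific byte range of a file can be locked instead of the whole file.
        b.Lock timeouts
            Lock acquisition can be bounded by a timeout.
        c.Code example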
            ---
            import errno

            class AdvancedFileLock:
                def __init__(self, filename):
                    self.filename = filename
                    self.file_handle = None

                def acquire_with_timeout(self, lock_type, timeout=5.0):
                    """带超时的锁获取"""
                    start_time = time.time()

                    while time.time() - start_time < timeout:
                        try:
                            self.file_handle = open(self.filename, 'r+' if lock_type == 'exclusive' else 'r')
                            lock_mode = fcntl.LOCK_EX if lock_type == 'exclusive' else fcntl.LOCK_SH
                            fcntl.flock(self.file_handle.fileno(), lock_mode | fcntl.LOCK_NB)
                            logger.info(f"在 {time.time() - start_time:.2f}s 后获取{lock_type}锁成功")
                            return True
                        except (IOError, OSError) as e:
                            if e.errno == errno.EAGAIN or e.errno == errno.EACCES:
                                if self.file_handle:
                                    self.file_handle.close()
                                    self.file_handle = None
                                time.sleep(0.1)  # 短暂等待后重试
                            else:
                                logger.error(f"获取锁时发生错误: {e}")
                                break

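                    logger.warning(f"Could not acquire {lock_type} lock within {timeout}s")
                    return False

                def release_lock(self):
                    """Release the lock and close the file (ConcurrentWriteController relies on this)."""
                    if self.file_handle:
                        try:
                            fcntl.flock(self.file_handle.fileno(), fcntl.LOCK_UN)
                        finally:
                            self.file_handle.close()
                            self.file_handle = None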

                def lock_file_region(self, offset, length, lock_type='exclusive'):
                    """锁定文件的特定区域"""
                    try:
                        self.file_handle = open(self.filename, 'r+')

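                        # fcntl.fcntl() does not accept a dict; fcntl.lockf() wraps the record-lock
                        # call and takes the length, start offset, and whence directly
                        lock_mode = fcntl.LOCK_EX if lock_type == 'exclusive' else fcntl.LOCK_SH

                        # Apply a non-blocking lock to the byte range [offset, offset + length)
                        fcntl.lockf(self.file_handle.fileno(), lock_mode | fcntl.LOCK_NB,
                                    length, offset, os.SEEK_SET)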
                        logger.info(f"成功锁定文件区域 [{offset}, {offset+length})")
                        return True

                    except Exception as e:
                        logger.error(f"锁定文件区域失败: {e}")
                        return False

                def check_file_locks(self):
                    """检查文件的锁状态"""
                    try:
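                        import struct

                        self.file_handle = open(self.filename, 'r')

                        # F_GETLK expects a packed struct flock, not a dict. The layout is
                        # platform-specific; 'hhqqi' (l_type, l_whence, l_start, l_len, l_pid)
                        # is an assumption that matches 64-bit Linux
                        flock_format = 'hhqqi'
                        query = struct.pack(flock_format, fcntl.F_WRLCK, os.SEEK_SET, 0, 0, 0)
                        result = fcntl.fcntl(self.file_handle.fileno(), fcntl.F_GETLK, query)
                        l_type, _, l_start, l_len, l_pid = struct.unpack(flock_format, result)

                        # F_UNLCK means no other process holds a conflicting record lock;
                        # locks taken with flock() are not reported by F_GETLK on Linux
                        locks_info = {
                            'locked': l_type != fcntl.F_UNLCK,
                            'l_start': l_start, 'l_len': l_len, 'l_pid': l_pid
                        }
                        logger.info(f"File lock query result: {locks_info}")

                        return locks_info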

                    except Exception as e:
                        logger.error(f"查询文件锁状态失败: {e}")
                        return None
            ---
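    c.Cooperative locks between processes
        a.Lock-file mechanism
            Uses a dedicated lock file, created atomically, to synchronize processes.
        b.Lock state management
            Tracks acquisition, release, and ownership of the lock.
        c.Code example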
            ---
            import json
            import tempfile

            class ProcessCooperativeLock:
                def __init__(self, lock_name):
                    self.lock_name = lock_name
                    self.lock_file = os.path.join(tempfile.gettempdir(), f"{lock_name}.lock")
                    self.lock_handle = None

                def acquire_lock(self, process_id, timeout=10):
                    """获取进程协作锁"""
                    start_time = time.time()

                    while time.time() - start_time < timeout:
                        try:
                            # 尝试创建锁文件
                            fd = os.open(self.lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)

                            # 写入锁信息
                            lock_info = {
                                'process_id': process_id,
                                'acquire_time': datetime.now().isoformat(),
                                'hostname': os.uname()[1] if hasattr(os, 'uname') else 'unknown'
                            }

                            with os.fdopen(fd, 'w') as f:
                                json.dump(lock_info, f)

                            logger.info(f"进程 {process_id} 获取锁成功")
                            return True

                        except OSError as e:
                            if e.errno == errno.EEXIST:
                                # 锁文件已存在,检查锁是否有效
                                if self._is_lock_stale():
                                    self._cleanup_stale_lock()
                                    continue
                                else:
                                    time.sleep(0.1)
                            else:
                                logger.error(f"创建锁文件失败: {e}")
                                break

                    logger.warning(f"进程 {process_id} 在 {timeout}s 内无法获取锁")
                    return False

                def _is_lock_stale(self, max_age_seconds=30):
                    """检查锁是否过期"""
                    try:
                        stat = os.stat(self.lock_file)
                        age = time.time() - stat.st_mtime
                        return age > max_age_seconds
                    except OSError:
                        return True

                def _cleanup_stale_lock(self):
                    """清理过期锁"""
                    try:
                        os.remove(self.lock_file)
                        logger.info("清理过期锁文件")
                    except OSError as e:
                        logger.error(f"清理过期锁失败: {e}")

                def release_lock(self, process_id):
                    """释放进程协作锁"""
                    try:
                        # 验证锁的所有者
                        if os.path.exists(self.lock_file):
                            with open(self.lock_file, 'r') as f:
                                lock_info = json.load(f)

                            if lock_info['process_id'] == process_id:
                                os.remove(self.lock_file)
                                logger.info(f"进程 {process_id} 释放锁成功")
                                return True
                            else:
                                logger.warning(f"进程 {process_id} 不是锁的所有者,无法释放")
                                return False
                        else:
                            logger.warning("锁文件不存在")
                            return False

                    except Exception as e:
                        logger.error(f"释放锁失败: {e}")
                        return False
            ---
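
Byte-range locking, described earlier in this section, is easiest to see with two processes, because POSIX record locks belong to the process rather than to the descriptor. The sketch below uses `fcntl.lockf` and `os.fork`; the helper names are invented for the example, and it assumes a Unix system.

```python
import fcntl
import os
import tempfile


def try_region_lock(fd, start, length):
    """Return True if a non-blocking exclusive lock on [start, start + length) succeeds."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start, os.SEEK_SET)
        return True
    except OSError:
        return False


def probe_from_child(path, start, length):
    """Fork a child that tries the lock and reports the outcome through its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: record locks are not inherited, so this is a genuine competitor
        exit_code = 2
        try:
            fd = os.open(path, os.O_RDWR)
            exit_code = 0 if try_region_lock(fd, start, length) else 1
        finally:
            os._exit(exit_code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


def demo_region_locks():
    """Parent locks bytes [0, 10); a child then probes an overlapping and a disjoint range."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"x" * 100)
        tmp.flush()
        assert try_region_lock(tmp.fileno(), 0, 10)
        overlapping = probe_from_child(tmp.name, 5, 10)
        disjoint = probe_from_child(tmp.name, 50, 10)
        return overlapping, disjoint
```

The child is refused on the overlapping range but succeeds on the disjoint one, which is what lets several processes work on different parts of one file at the same time.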

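04.Practical Use Cases
    a.Controlling concurrent writes
        a.Scenario
            Protecting data integrity when several processes write to the same file.
        b.Approach
            Take an exclusive lock so that only one process writes at a time.
        c.Code example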
            ---
            class ConcurrentWriteController:
                def __init__(self, data_file):
                    self.data_file = data_file
                    self.lock = AdvancedFileLock(data_file)

                def safe_append_data(self, data, process_id):
                    """安全追加数据"""
                    logger.info(f"进程 {process_id} 尝试写入数据")

                    if self.lock.acquire_with_timeout('exclusive', timeout=3):
                        try:
                            # 获取当前文件大小作为写入位置
                            current_size = os.path.getsize(self.data_file)

                            # 写入数据
                            with open(self.data_file, 'a') as f:
                                timestamp = datetime.now().isoformat()
                                entry = f"{timestamp} - Process-{process_id}: {data}\n"
                                f.write(entry)

                            logger.info(f"进程 {process_id} 写入完成,位置: {current_size}")
                            return True

                        finally:
                            self.lock.release_lock()
                    else:
                        logger.warning(f"进程 {process_id} 写入超时")
                        return False

                def safe_read_data(self, process_id):
                    """安全读取数据"""
                    logger.info(f"进程 {process_id} 开始读取数据")

                    if self.lock.acquire_with_timeout('shared', timeout=2):
                        try:
                            with open(self.data_file, 'r') as f:
                                content = f.read()

                            lines = content.strip().split('\n') if content.strip() else []
                            logger.info(f"进程 {process_id} 读取完成,共 {len(lines)} 行")
                            return lines

                        finally:
                            self.lock.release_lock()
                    else:
                        logger.warning(f"进程 {process_id} 读取超时")
                        return None
            ---
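    b.Protecting a configuration file
        a.Scenario
            Preventing conflicts when several processes modify the configuration file at once.
        b.Approach
            Guard reads with a shared lock and updates with an exclusive lock.
        c.Code example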
            ---
            class ConfigFileProtector:
                def __init__(self, config_file):
                    self.config_file = config_file
                    self.lock = FileLockExample(config_file)

                def read_config(self, process_id):
                    """读取配置文件"""
                    if self.lock.acquire_shared_lock():
                        try:
                            config = {}
                            if os.path.exists(self.config_file):
                                with open(self.config_file, 'r') as f:
                                    for line in f:
                                        line = line.strip()
                                        if line and '=' in line:
                                            key, value = line.split('=', 1)
                                            config[key.strip()] = value.strip()

                            logger.info(f"进程 {process_id} 读取配置成功,共 {len(config)} 项")
                            return config

                        finally:
                            self.lock.release_lock()
                    else:
                        logger.error(f"进程 {process_id} 无法读取配置")
                        return {}

                def update_config(self, updates, process_id):
                    """更新配置文件"""
                    if self.lock.acquire_exclusive_lock():
                        try:
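                            # Read the existing config inline. Calling read_config() here would
                            # reuse the same lock object, replace its file handle, and give up
                            # the exclusive lock before the write below
                            config = {}
                            with open(self.config_file, 'r') as f:
                                for line in f:
                                    line = line.strip()
                                    if line and '=' in line:
                                        key, value = line.split('=', 1)
                                        config[key.strip()] = value.strip()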

                            # 应用更新
                            config.update(updates)

                            # 写回配置文件
                            with open(self.config_file, 'w') as f:
                                for key, value in sorted(config.items()):
                                    f.write(f"{key}={value}\n")

                            logger.info(f"进程 {process_id} 更新配置成功,更新了 {len(updates)} 项")
                            return True

                        finally:
                            self.lock.release_lock()
                    else:
                        logger.error(f"进程 {process_id} 无法更新配置")
                        return False

                def backup_config(self, process_id):
                    """备份配置文件"""
                    if self.lock.acquire_shared_lock():
                        try:
                            if os.path.exists(self.config_file):
                                backup_file = f"{self.config_file}.backup.{int(time.time())}"
                                import shutil
                                shutil.copy2(self.config_file, backup_file)
                                logger.info(f"进程 {process_id} 备份配置到 {backup_file}")
                                return backup_file
                            else:
                                logger.warning(f"进程 {process_id} 配置文件不存在")
                                return None
                        finally:
                            self.lock.release_lock()
                    else:
                        logger.error(f"进程 {process_id} 无法备份配置")
                        return None
            ---
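    c.Managing log files
        a.Scenario
            Writing to one log file safely from several processes so that entries do not interleave.
        b.Approach
            Hold an exclusive file lock while flushing so that each batch of entries is written as a unit and in order.
        c.Code example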
            ---
            class SafeLogWriter:
                def __init__(self, log_file):
                    self.log_file = log_file
                    self.lock = FileLockExample(log_file)
                    self.buffer = []
                    self.buffer_size = 100

                def log_message(self, level, message, process_id):
                    """记录日志消息"""
                    timestamp = datetime.now().isoformat()
                    log_entry = f"{timestamp} [{level}] Process-{process_id}: {message}"

                    self.buffer.append(log_entry)

                    # 缓冲区满时刷新
                    if len(self.buffer) >= self.buffer_size:
                        return self.flush_buffer(process_id)

                    return True

                def flush_buffer(self, process_id):
                    """刷新缓冲区到文件"""
                    if not self.buffer:
                        return True

                    if self.lock.acquire_exclusive_lock():
                        try:
                            with open(self.log_file, 'a') as f:
                                for entry in self.buffer:
                                    f.write(entry + '\n')

                            flushed_count = len(self.buffer)
                            self.buffer.clear()
                            logger.info(f"进程 {process_id} 刷新了 {flushed_count} 条日志")
                            return True

                        finally:
                            self.lock.release_lock()
                    else:
                        logger.error(f"进程 {process_id} 无法刷新日志缓冲区")
                        return False

                def rotate_log(self, process_id, max_size_mb=10):
                    """日志轮转"""
                    if os.path.exists(self.log_file):
                        file_size = os.path.getsize(self.log_file)
                        max_size_bytes = max_size_mb * 1024 * 1024

                        if file_size > max_size_bytes:
                            if self.lock.acquire_exclusive_lock():
                                try:
                                    timestamp = int(time.time())
                                    rotated_file = f"{self.log_file}.{timestamp}"
                                    import shutil
                                    shutil.move(self.log_file, rotated_file)

                                    logger.info(f"进程 {process_id} 轮转日志文件到 {rotated_file}")
                                    return True

                                finally:
                                    self.lock.release_lock()
                            else:
                                logger.error(f"进程 {process_id} 无法轮转日志文件")
                                return False

                    return False
            ---
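
One design point the scenarios above leave open is what to lock when the protected file itself gets replaced or rotated. A common approach is a separate lock file next to the data, combined with an atomic rename. The sketch below shows this under stated assumptions: the function name `update_config_atomically` and the `.lock` suffix are hypothetical, and the config format is the same `key=value` text used above.

```python
import fcntl
import os
import tempfile


def update_config_atomically(config_path, updates):
    """Merge `updates` into a key=value file while holding an exclusive sidecar lock."""
    lock_path = config_path + ".lock"   # hypothetical naming convention
    with open(lock_path, 'a') as lock_file:
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)

        # Read the current contents, if any
        config = {}
        if os.path.exists(config_path):
            with open(config_path) as f:
                for line in f:
                    if '=' in line:
                        key, value = line.strip().split('=', 1)
                        config[key.strip()] = value.strip()
        config.update(updates)

        # Write a temporary file in the same directory, then rename it into place
        directory = os.path.dirname(os.path.abspath(config_path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, 'w') as tmp:
            for key, value in sorted(config.items()):
                tmp.write(f"{key}={value}\n")
        os.replace(tmp_path, config_path)   # readers see the old or the new file, never a partial one
        return config
    # The lock is released when lock_file is closed on leaving the with-block
```

Locking a sidecar file keeps the lock valid across the rename, whereas a lock held on the data file would stay attached to the old file after it is replaced.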

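05.Best Practices and Caveats
    a.Usage principles
        a.Hold locks briefly
            Keep the time a lock is held as short as possible so other processes wait less.
        b.Exception safety
            Make sure the lock is released when an exception is raised.
        c.Avoid deadlock
            Acquire locks in one agreed order to rule out circular waits.
    b.Performance optimization
        a.Batching
            Combine several small operations under a single lock acquisition.
        b.Separate readers from writers
            Use shared locks for reads and exclusive locks for writes.
        c.Lock granularity
            Choose a suitable granularity and avoid locking more than necessary.
    c.Debugging tips
        a.Monitor lock state
            Log lock acquisition, release, and waiting.
        b.Timeouts
            Set a reasonable timeout on lock acquisition.
        c.Error handling
            Report errors in detail and provide a recovery path.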
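
Two of the points above, exception safety and timeouts, combine naturally in one helper. The following is a sketch under assumptions: the name `locked` and its `poll_interval` parameter are invented here, and polling with `LOCK_NB` is only one way to bound the wait.

```python
import fcntl
import time
from contextlib import contextmanager


@contextmanager
def locked(path, timeout=2.0, poll_interval=0.05):
    """Hold an exclusive flock on `path`, giving up with TimeoutError after `timeout` seconds."""
    # time.monotonic() is unaffected by system clock adjustments
    deadline = time.monotonic() + timeout
    with open(path, 'a') as handle:
        while True:
            try:
                fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"could not lock {path} within {timeout}s")
                time.sleep(poll_interval)
        try:
            yield handle
        finally:
            # Runs on normal exit and on exceptions, so the lock is never leaked
            fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
```

Callers write `with locked(path): ...` and get both the bounded wait and the guaranteed release without repeating try/finally at every call site.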

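8.2 The msvcrt Module

01.Basic Concepts
    a.Definition and purpose
        msvcrt is a Python standard-library module available only on Windows. It exposes routines from the Microsoft Visual C++ runtime, chiefly file-region locking and console I/O.
    b.Key characteristics
        a.Platform-specific
            Available only on Windows; importing it on other systems raises ImportError.
        b.Low-level interface
            msvcrt.locking wraps the C runtime's _locking function, which relies on the operating system's file-locking support.
        c.Efficient implementation
            Locks are handled at the operating-system level, so overhead is low.
    c.Use cases
        a.File locking
            Exclusive locking of byte ranges within a file on Windows.
        b.Process synchronization
            Synchronizing multiple processes through file locks.
        c.Resource protection
            Guarding critical resources against concurrent access from several processes.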

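02.File Locking Basics
    a.The msvcrt.locking function
        msvcrt.locking(fd, mode, nbytes) locks nbytes bytes of the file starting at the current file position, so seek to the desired offset first.
    b.Lock modes
        a.LK_LOCK
            Exclusive lock. If the bytes cannot be locked, the call retries once per second and raises OSError after 10 failed attempts; it does not wait indefinitely.
        b.LK_NBLCK
            Non-blocking exclusive lock; raises OSError immediately if the bytes cannot be locked.
        c.LK_RLCK
            Documented as behaving the same as LK_LOCK. Despite the name, msvcrt.locking has no shared (read) lock mode.
        d.LK_NBRLCK
            Documented as behaving the same as LK_NBLCK.
        e.LK_UNLCK
            Unlock; releases bytes previously locked with the same offset and length.
    c.Code example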
        ---
        # msvcrt模块基础使用示例
        import msvcrt
        import os
        import time
        import logging
        from datetime import datetime

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class WindowsFileLock:
            def __init__(self, filename):
                self.filename = filename
                self.file_handle = None

            def open_file(self, mode='r+b'):
                """打开文件"""
                try:
                    self.file_handle = open(self.filename, mode)
                    logger.info(f"文件打开成功: {self.filename}")
                    return True
                except Exception as e:
                    logger.error(f"打开文件失败: {e}")
                    return False

            def lock_region(self, offset, size, lock_mode):
                """锁定文件的指定区域"""
                if not self.file_handle:
                    logger.error("文件未打开")
                    return False

                try:
                    # 定位到指定位置
                    self.file_handle.seek(offset)

                    # 应用文件锁
                    msvcrt.locking(self.file_handle.fileno(), lock_mode, size)
                    logger.info(f"成功锁定文件区域 [{offset}, {offset+size})")
                    return True

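                except OSError as e:
                    # msvcrt.locking reports failures through errno, not the Win32 code 33:
                    # EACCES for a conflicting lock, EDEADLOCK (36) once LK_LOCK's retries run out
                    if isinstance(e, PermissionError) or e.errno == 36: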
                        logger.warning(f"文件区域已被锁定: [{offset}, {offset+size})")
                    else:
                        logger.error(f"锁定文件区域失败: {e}")
                    return False
                except Exception as e:
                    logger.error(f"锁定文件区域异常: {e}")
                    return False

            def unlock_region(self, offset, size):
                """解锁文件的指定区域"""
                if not self.file_handle:
                    logger.error("文件未打开")
                    return False

                try:
                    # 定位到指定位置
                    self.file_handle.seek(offset)

                    # 释放文件锁
                    msvcrt.locking(self.file_handle.fileno(), msvcrt.LK_UNLCK, size)
                    logger.info(f"成功解锁文件区域 [{offset}, {offset+size})")
                    return True

                except Exception as e:
                    logger.error(f"解锁文件区域失败: {e}")
                    return False

            def close_file(self):
                """关闭文件"""
                if self.file_handle:
                    self.file_handle.close()
                    self.file_handle = None
                    logger.info("文件已关闭")

        # 使用示例
        def demo_basic_locking():
            filename = "test_windows_lock.txt"

            # 创建测试文件
            with open(filename, 'wb') as f:
                f.write(b"0123456789" * 10)  # 100字节内容

            # 创建文件锁对象
            file_lock = WindowsFileLock(filename)

            # 打开文件
            if file_lock.open_file('r+b'):
                try:
                    # 锁定前20字节
                    if file_lock.lock_region(0, 20, msvcrt.LK_LOCK):
                        logger.info("开始执行锁定区域操作...")

                        # 写入数据到锁定区域
                        file_lock.file_handle.seek(0)
                        file_lock.file_handle.write(b"LOCKED_DATA_HERE...")
                        file_lock.file_handle.flush()

                        time.sleep(2)  # 模拟处理时间

                        # 解锁
                        file_lock.unlock_region(0, 20)
                        logger.info("锁定区域操作完成")

                finally:
                    file_lock.close_file()

            # 清理测试文件
            if os.path.exists(filename):
                os.remove(filename)
        ---
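
Since `msvcrt` exists only on Windows and `fcntl` only on Unix, code that must run on both usually hides the difference behind a small helper. The sketch below is one possible shape, with the hypothetical names `try_lock` and `unlock`; it locks a single byte at offset 0 on Windows and the whole file on Unix, which is an assumption that suits simple "one holder at a time" uses.

```python
try:
    import msvcrt            # Windows only
except ImportError:
    msvcrt = None
    import fcntl             # Unix only


def try_lock(handle):
    """Take a non-blocking exclusive lock; return True on success, False if it is held elsewhere."""
    try:
        if msvcrt is not None:
            handle.seek(0)   # msvcrt.locking works from the current file position
            msvcrt.locking(handle.fileno(), msvcrt.LK_NBLCK, 1)
        else:
            fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False


def unlock(handle):
    """Release a lock taken with try_lock on the same file object."""
    if msvcrt is not None:
        handle.seek(0)       # must match the offset and length used when locking
        msvcrt.locking(handle.fileno(), msvcrt.LK_UNLCK, 1)
    else:
        fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
```

Both helpers take an already open file object, so the caller decides the open mode and lifetime of the file.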

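03.Advanced File Locking
    a.Locking file ranges
        a.Partial file locking
            Lock only the part of the file being modified, which allows more concurrency.
        b.Multi-region locking
            Lock several separate regions of the same file at once.
        c.Code example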
            ---
            class AdvancedWindowsLock:
                def __init__(self, filename):
                    self.filename = filename
                    self.file_handle = None
                    self.locked_regions = []

                def open_file(self, mode='r+b'):
                    """打开文件"""
                    try:
                        self.file_handle = open(self.filename, mode)
                        return True
                    except Exception as e:
                        logger.error(f"打开文件失败: {e}")
                        return False

                def lock_multiple_regions(self, regions, lock_mode=msvcrt.LK_LOCK):
                    """锁定多个区域"""
                    if not self.file_handle:
                        logger.error("文件未打开")
                        return False

                    locked_count = 0
                    for offset, size in regions:
                        try:
                            self.file_handle.seek(offset)
                            msvcrt.locking(self.file_handle.fileno(), lock_mode, size)
                            self.locked_regions.append((offset, size))
                            locked_count += 1
                            logger.info(f"成功锁定区域 [{offset}, {offset+size})")
                        except OSError as e:
                            logger.warning(f"锁定区域 [{offset}, {offset+size}) 失败: {e}")

                    logger.info(f"成功锁定 {locked_count}/{len(regions)} 个区域")
                    return locked_count > 0

                def try_lock_region(self, offset, size, lock_mode=msvcrt.LK_NBLCK):
                    """尝试锁定单个区域(非阻塞)"""
                    if not self.file_handle:
                        logger.error("文件未打开")
                        return False

                    try:
                        self.file_handle.seek(offset)
                        msvcrt.locking(self.file_handle.fileno(), lock_mode, size)
                        self.locked_regions.append((offset, size))
                        logger.info(f"非阻塞锁定区域 [{offset}, {offset+size}) 成功")
                        return True
                    except OSError as e:
                        # msvcrt.locking sets errno (EACCES on a conflict), not the Win32 code 33
                        if isinstance(e, PermissionError):
                            logger.warning(f"区域 [{offset}, {offset+size}) 已被锁定")
                        else:
                            logger.error(f"锁定区域失败: {e}")
                        return False

                def unlock_all_regions(self):
                    """解锁所有已锁定的区域"""
                    if not self.file_handle:
                        return False

                    unlocked_count = 0
                    for offset, size in self.locked_regions:
                        try:
                            self.file_handle.seek(offset)
                            msvcrt.locking(self.file_handle.fileno(), msvcrt.LK_UNLCK, size)
                            unlocked_count += 1
                            logger.info(f"成功解锁区域 [{offset}, {offset+size})")
                        except Exception as e:
                            logger.error(f"解锁区域 [{offset}, {offset+size}) 失败: {e}")

                    self.locked_regions.clear()
                    logger.info(f"成功解锁 {unlocked_count} 个区域")
                    return unlocked_count > 0

                def get_locked_regions_info(self):
                    """获取已锁定区域信息"""
                    return self.locked_regions.copy()

                def close_file(self):
                    """关闭文件"""
                    if self.file_handle:
                        self.unlock_all_regions()
                        self.file_handle.close()
                        self.file_handle = None
                        logger.info("文件已关闭")
            ---
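lock_multiple_regions locks regions in whatever order the caller passes them. If two processes lock overlapping sets of regions in different orders, each can end up waiting on the other. A minimal sketch of one common mitigation follows: sort the regions and merge overlapping or adjacent ones, so every process locks in the same order. The helper name `normalize_regions` is hypothetical and not part of the class above.

```python
from typing import Iterable, List, Tuple

Region = Tuple[int, int]  # (offset, size)

def normalize_regions(regions: Iterable[Region]) -> List[Region]:
    """Sort regions by offset and merge overlapping or adjacent ones.

    Zero-length regions are dropped because they lock nothing.
    """
    merged: List[List[int]] = []  # each item is [start, end)
    for offset, size in sorted(r for r in regions if r[1] > 0):
        end = offset + size
        if merged and offset <= merged[-1][1]:
            # Overlaps or touches the previous region: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([offset, end])
    return [(start, end - start) for start, end in merged]
```

The result can be passed to lock_multiple_regions in place of the raw list.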
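    b.Read-write lock implementation
        a.Shared read lock
            Intended to let multiple processes read the file at the same time. Note that the Python documentation describes msvcrt.LK_RLCK as behaving the same as msvcrt.LK_LOCK, so the "read lock" in the class below is in practice an exclusive byte-range lock. A truly shared lock requires the Win32 LockFileEx API called without LOCKFILE_EXCLUSIVE_LOCK.
        b.Exclusive write lock
            Only one process may write to the file at a time.
        c.Lock upgrade and downgrade
            The class below upgrades from a read lock to a write lock by releasing the read lock and then acquiring the write lock. The switch is not atomic: another process can take the lock in between. Downgrading requires an explicit release_lock() followed by acquire_read_lock().
        d.Code example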
            ---
            class WindowsReadWriteLock:
                def __init__(self, filename):
                    self.filename = filename
                    self.file_handle = None
                    self.lock_type = None
                    self.lock_offset = 0
                    self.lock_size = 0

                def open_file(self, mode='r+b'):
                    """打开文件"""
                    try:
                        self.file_handle = open(self.filename, mode)
                        return True
                    except Exception as e:
                        logger.error(f"打开文件失败: {e}")
                        return False

                def acquire_read_lock(self, offset=0, size=0):
                    """获取读锁(共享锁)"""
                    if size == 0:
                        # 锁定整个文件
                        self.file_handle.seek(0, 2)  # 移动到文件末尾
                        size = self.file_handle.tell()
                        self.file_handle.seek(0)

                    if self.lock_type:
                        logger.warning("已持有锁,需要先释放")
                        return False

                    try:
                        self.file_handle.seek(offset)
                        msvcrt.locking(self.file_handle.fileno(), msvcrt.LK_RLCK, size)
                        self.lock_type = 'read'
                        self.lock_offset = offset
                        self.lock_size = size
                        logger.info(f"获取读锁成功: [{offset}, {offset+size})")
                        return True
                    except Exception as e:
                        logger.error(f"获取读锁失败: {e}")
                        return False

                def acquire_write_lock(self, offset=0, size=0):
                    """获取写锁(排他锁)"""
                    if size == 0:
                        # 锁定整个文件
                        self.file_handle.seek(0, 2)
                        size = self.file_handle.tell()
                        self.file_handle.seek(0)

                    if self.lock_type == 'write':
                        logger.warning("已持有写锁")
                        return True

                    # 如果持有读锁,需要先释放再获取写锁
                    if self.lock_type == 'read':
                        if not self.release_lock():
                            return False

                    try:
                        self.file_handle.seek(offset)
                        msvcrt.locking(self.file_handle.fileno(), msvcrt.LK_LOCK, size)
                        self.lock_type = 'write'
                        self.lock_offset = offset
                        self.lock_size = size
                        logger.info(f"获取写锁成功: [{offset}, {offset+size})")
                        return True
                    except Exception as e:
                        logger.error(f"获取写锁失败: {e}")
                        return False

                def try_acquire_write_lock(self, offset=0, size=0):
                    """尝试获取写锁(非阻塞)"""
                    if size == 0:
                        self.file_handle.seek(0, 2)
                        size = self.file_handle.tell()
                        self.file_handle.seek(0)

                    if self.lock_type == 'write':
                        return True

                    if self.lock_type == 'read':
                        if not self.release_lock():
                            return False

                    try:
                        self.file_handle.seek(offset)
                        msvcrt.locking(self.file_handle.fileno(), msvcrt.LK_NBLCK, size)
                        self.lock_type = 'write'
                        self.lock_offset = offset
                        self.lock_size = size
                        logger.info(f"非阻塞获取写锁成功: [{offset}, {offset+size})")
                        return True
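                    except OSError as e:
                        import errno
                        # A conflicting lock surfaces as errno EACCES (13), not the Win32 code 33
                        if e.errno == errno.EACCES:
                            logger.warning("Write lock not acquired: the file is locked")
                        else:
                            logger.error(f"Failed to acquire write lock: {e}")
                        return False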

                def release_lock(self):
                    """释放锁"""
                    if not self.lock_type:
                        logger.warning("未持有锁")
                        return True

                    try:
                        self.file_handle.seek(self.lock_offset)
                        msvcrt.locking(self.file_handle.fileno(), msvcrt.LK_UNLCK, self.lock_size)
                        logger.info(f"释放{self.lock_type}锁成功: [{self.lock_offset}, {self.lock_offset+self.lock_size})")
                        self.lock_type = None
                        self.lock_offset = 0
                        self.lock_size = 0
                        return True
                    except Exception as e:
                        logger.error(f"释放锁失败: {e}")
                        return False

                def close_file(self):
                    """关闭文件"""
                    if self.file_handle:
                        self.release_lock()
                        self.file_handle.close()
                        self.file_handle = None
            ---
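    c.File lock monitoring
        a.Lock status check
            Check whether the file is currently locked by another process.
        b.Lock contention detection
            Track how often lock attempts fail so that contention hot spots and performance bottlenecks can be identified.
        c.Code example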
            ---
            import threading
            import time      # used by _monitor_loop and _check_lock_status
            import msvcrt    # Windows only

            class WindowsLockMonitor:
                def __init__(self):
                    self.lock_statistics = {}
                    self.monitor_thread = None
                    self.monitoring = False

                def start_monitoring(self, filename, interval=1.0):
                    """开始监控文件锁状态"""
                    self.monitoring = True
                    self.filename = filename
                    self.lock_statistics[filename] = {
                        'lock_attempts': 0,
                        'successful_locks': 0,
                        'failed_locks': 0,
                        'avg_wait_time': 0,
                        'total_wait_time': 0
                    }

                    self.monitor_thread = threading.Thread(target=self._monitor_loop, args=(interval,))
                    self.monitor_thread.daemon = True
                    self.monitor_thread.start()
                    logger.info(f"开始监控文件锁: {filename}")

                def stop_monitoring(self):
                    """停止监控"""
                    self.monitoring = False
                    if self.monitor_thread:
                        self.monitor_thread.join(timeout=2)
                    logger.info("文件锁监控已停止")

                def _monitor_loop(self, interval):
                    """监控循环"""
                    while self.monitoring:
                        self._check_lock_status()
                        time.sleep(interval)

                def _check_lock_status(self):
                    """检查锁状态"""
                    try:
                        # 尝试以非阻塞方式获取锁
                        with open(self.filename, 'r+b') as f:
                            f.seek(0)
                            start_time = time.time()

                            try:
                                msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, 1)
                                msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
                                wait_time = time.time() - start_time
                                self._update_statistics(True, wait_time)
                            except OSError:
                                wait_time = time.time() - start_time
                                self._update_statistics(False, wait_time)

                    except FileNotFoundError:
                        logger.warning(f"监控文件不存在: {self.filename}")

                def _update_statistics(self, success, wait_time):
                    """更新统计信息"""
                    if self.filename in self.lock_statistics:
                        stats = self.lock_statistics[self.filename]
                        stats['lock_attempts'] += 1

                        if success:
                            stats['successful_locks'] += 1
                        else:
                            stats['failed_locks'] += 1

                        stats['total_wait_time'] += wait_time
                        stats['avg_wait_time'] = stats['total_wait_time'] / stats['lock_attempts']

                def get_statistics(self):
                    """获取统计信息"""
                    return self.lock_statistics.copy()

                def reset_statistics(self):
                    """重置统计信息"""
                    for filename in self.lock_statistics:
                        self.lock_statistics[filename] = {
                            'lock_attempts': 0,
                            'successful_locks': 0,
                            'failed_locks': 0,
                            'avg_wait_time': 0,
                            'total_wait_time': 0
                        }
            ---
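The monitor above updates `lock_statistics` from its background thread while callers may read it through get_statistics() from another thread. A minimal sketch of guarding such counters with a `threading.Lock` follows. The class `LockStats` and its method names are hypothetical, chosen only for illustration.

```python
import threading

class LockStats:
    """Thread-safe counters for lock attempts (hypothetical helper)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._attempts = 0
        self._failures = 0
        self._total_wait = 0.0

    def record(self, success: bool, wait_time: float) -> None:
        """Record one lock attempt and how long it took."""
        with self._guard:
            self._attempts += 1
            if not success:
                self._failures += 1
            self._total_wait += wait_time

    def snapshot(self) -> dict:
        """Return a consistent copy of the counters plus derived ratios."""
        with self._guard:
            attempts = self._attempts
            return {
                "lock_attempts": attempts,
                "failed_locks": self._failures,
                "successful_locks": attempts - self._failures,
                "avg_wait_time": self._total_wait / attempts if attempts else 0.0,
                "contention_ratio": self._failures / attempts if attempts else 0.0,
            }
```

Returning a fresh dictionary from snapshot() also avoids handing out references to the live counters, which a shallow `dict.copy()` of nested dictionaries would do.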

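04.Practical application scenarios
    a.Database file protection
        a.Scenario
            Concurrency control when multiple processes access an embedded database file such as SQLite.
        b.Approach
            Use msvcrt to lock the relevant region of the database file around each read or write operation.
        c.Code example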
            ---
            class DatabaseFileProtector:
                def __init__(self, db_file):
                    self.db_file = db_file
                    self.lock = WindowsReadWriteLock(db_file)

                def execute_read_operation(self, query_func, *args, **kwargs):
                    """执行读操作"""
                    if not self.lock.open_file('rb'):
                        return None

                    try:
                        # 获取读锁
                        if self.lock.acquire_read_lock():
                            logger.info("开始执行数据库读操作")
                            result = query_func(*args, **kwargs)
                            logger.info("数据库读操作完成")
                            return result
                        else:
                            logger.error("无法获取读锁")
                            return None
                    finally:
                        self.lock.release_lock()
                        self.lock.close_file()

                def execute_write_operation(self, query_func, *args, **kwargs):
                    """执行写操作"""
                    if not self.lock.open_file('r+b'):
                        return False

                    try:
                        # 获取写锁
                        if self.lock.acquire_write_lock():
                            logger.info("开始执行数据库写操作")
                            result = query_func(*args, **kwargs)
                            logger.info("数据库写操作完成")
                            return result
                        else:
                            logger.error("无法获取写锁")
                            return False
                    finally:
                        self.lock.release_lock()
                        self.lock.close_file()

                def try_transaction(self, operations, timeout=5.0):
                    """尝试执行事务"""
                    if not self.lock.open_file('r+b'):
                        return False

                    start_time = time.time()

                    try:
                        # 尝试获取写锁
                        while time.time() - start_time < timeout:
                            if self.lock.try_acquire_write_lock():
                                try:
                                    logger.info("事务开始执行")
                                    for operation in operations:
                                        operation()
                                    logger.info("事务执行完成")
                                    return True
                                except Exception as e:
                                    logger.error(f"事务执行失败: {e}")
                                    return False
                                finally:
                                    self.lock.release_lock()
                            else:
                                time.sleep(0.1)

                        logger.warning(f"事务在 {timeout}s 内无法获取锁")
                        return False
                    finally:
                        self.lock.close_file()

                # 示例数据库操作函数
                def example_read_operation(self):
                    """示例读操作"""
                    time.sleep(0.5)  # 模拟读取时间
                    return {"status": "success", "data": "sample_data"}

                def example_write_operation(self):
                    """示例写操作"""
                    time.sleep(1.0)  # 模拟写入时间
                    return {"status": "success", "affected_rows": 1}
            ---
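try_transaction above polls for the lock at a fixed 0.1 s interval and measures the timeout with time.time(). A minimal sketch of the same retry loop as a reusable helper follows, using a monotonic clock and exponential backoff. The function name `acquire_with_timeout` and its parameters are hypothetical; `try_acquire` stands for any non-blocking attempt such as `lock.try_acquire_write_lock`.

```python
import time
from typing import Callable

def acquire_with_timeout(try_acquire: Callable[[], bool],
                         timeout: float = 5.0,
                         initial_delay: float = 0.01,
                         max_delay: float = 0.2) -> bool:
    """Call try_acquire() until it succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout  # unaffected by wall-clock adjustments
    delay = initial_delay
    while True:
        if try_acquire():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline; double the delay up to max_delay
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

Starting with a short delay keeps latency low when the lock frees up quickly, while the cap bounds the polling rate under sustained contention.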
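    b.Concurrent log file writes
        a.Scenario
            Multiple processes append to the same log file, and each log entry must be written intact without interleaving.
        b.Approach
            Use an msvcrt byte-range lock to serialize writers while an entry is being appended.
        c.Code example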
            ---
            class ConcurrentLogFile:
                def __init__(self, log_file):
                    self.log_file = log_file
                    self.lock = WindowsFileLock(log_file)
                    self.entry_counter = 0

                def write_log_entry(self, level, message, process_id):
                    """写入日志条目"""
                    if not self.lock.open_file('a+b'):
                        return False

                    try:
                        # Serialize writers on a fixed sentinel region. Locking "the current end of file"
                        # is racy: each process may observe a different size and lock a different region.
                        lock_offset, lock_size = 0, 1
                        if self.lock.lock_region(lock_offset, lock_size, msvcrt.LK_LOCK):
                            try:
                                # Format the log entry
                                timestamp = datetime.now().isoformat()
                                entry = f"{timestamp} [{level}] P{process_id}: {message}\n"
                                entry_bytes = entry.encode('utf-8')

                                # Move to the end only after the lock is held, then append
                                self.lock.file_handle.seek(0, 2)
                                self.lock.file_handle.write(entry_bytes)
                                self.lock.file_handle.flush()

                                self.entry_counter += 1
                                logger.info(f"Process {process_id} wrote log entry #{self.entry_counter}")
                                return True

                            finally:
                                # Unlock the same region that was locked
                                self.lock.unlock_region(lock_offset, lock_size)
                        else:
                            logger.warning(f"进程 {process_id} 无法锁定日志文件")
                            return False

                    finally:
                        self.lock.close_file()

                def write_multiple_entries(self, entries, process_id):
                    """批量写入多个日志条目"""
                    if not self.lock.open_file('a+b'):
                        return False

                    try:
                        # Serialize writers on a fixed sentinel region; the end-of-file offset is
                        # only meaningful once the lock is held.
                        lock_offset, lock_size = 0, 1

                        if self.lock.lock_region(lock_offset, lock_size, msvcrt.LK_LOCK):
                            try:
                                # Encode first: sizes must be counted in bytes, not characters,
                                # and the whole batch is written with a single call
                                payload = b"".join(
                                    f"{entry['timestamp']} [{entry['level']}] P{process_id}: {entry['message']}\n".encode('utf-8')
                                    for entry in entries
                                )

                                self.lock.file_handle.seek(0, 2)
                                self.lock.file_handle.write(payload)
                                self.lock.file_handle.flush()
                                logger.info(f"Process {process_id} wrote {len(entries)} log entries in one batch")
                                return True

                            finally:
                                self.lock.unlock_region(lock_offset, lock_size)
                        else:
                            logger.warning(f"进程 {process_id} 无法锁定日志文件进行批量写入")
                            return False

                    finally:
                        self.lock.close_file()

                def read_log_entries(self, max_entries=100):
                    """读取日志条目"""
                    if not os.path.exists(self.log_file):
                        return []

                    with open(self.log_file, 'r', encoding='utf-8') as f:
                        lines = f.readlines()

                    # 返回最后N行
                    return lines[-max_entries:] if len(lines) > max_entries else lines
            ---
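    c.Concurrent config file management
        a.Scenario
            Multiple processes need to read and update a shared config file safely.
        b.Approach
            Guard the config file with the msvcrt-based read-write lock (WindowsReadWriteLock).
        c.Code example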
            ---
            import json
            import shutil

            class ConfigFileManager:
                def __init__(self, config_file):
                    self.config_file = config_file
                    self.lock = WindowsReadWriteLock(config_file)

                def read_config(self):
                    """读取配置文件"""
                    if not self.lock.open_file('r+b'):
                        return {}

                    try:
                        # 获取读锁
                        if self.lock.acquire_read_lock():
                            try:
                                # 读取文件内容
                                self.lock.file_handle.seek(0)
                                content = self.lock.file_handle.read().decode('utf-8')

                                if content.strip():
                                    config = json.loads(content)
                                else:
                                    config = {}

                                logger.info(f"配置文件读取成功,共 {len(config)} 项")
                                return config

                            except json.JSONDecodeError as e:
                                logger.error(f"配置文件JSON解析失败: {e}")
                                return {}
                            finally:
                                self.lock.release_lock()
                        else:
                            logger.error("无法获取配置文件读锁")
                            return {}

                    finally:
                        self.lock.close_file()

                def update_config(self, updates):
                    """更新配置文件"""
                    if not self.lock.open_file('r+b'):
                        return False

                    try:
                        # 获取写锁
                        if self.lock.acquire_write_lock():
                            try:
                                # 读取现有配置
                                self.lock.file_handle.seek(0)
                                content = self.lock.file_handle.read().decode('utf-8')

                                if content.strip():
                                    config = json.loads(content)
                                else:
                                    config = {}

                                # 应用更新
                                old_config = config.copy()
                                config.update(updates)

                                # 写回文件
                                new_content = json.dumps(config, indent=2, ensure_ascii=False)
                                self.lock.file_handle.seek(0)
                                self.lock.file_handle.write(new_content.encode('utf-8'))
                                self.lock.file_handle.truncate()
                                self.lock.file_handle.flush()

                                logger.info(f"配置文件更新成功,修改了 {len(updates)} 项")
                                return True

                            except json.JSONDecodeError as e:
                                logger.error(f"配置文件JSON解析失败: {e}")
                                return False
                            finally:
                                self.lock.release_lock()
                        else:
                            logger.error("无法获取配置文件写锁")
                            return False

                    finally:
                        self.lock.close_file()

                def backup_config(self, backup_suffix=None):
                    """备份配置文件"""
                    if not os.path.exists(self.config_file):
                        logger.warning("配置文件不存在,无需备份")
                        return None

                    if backup_suffix is None:
                        backup_suffix = int(time.time())

                    backup_file = f"{self.config_file}.backup.{backup_suffix}"

                    if not self.lock.open_file('rb'):
                        return None

                    try:
                        # 获取读锁进行备份
                        if self.lock.acquire_read_lock():
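                            try:
                                # Copy through the locked handle: a second handle opened by
                                # shutil.copy2 would be denied access to the locked byte range.
                                self.lock.file_handle.seek(0)
                                with open(backup_file, 'wb') as dst:
                                    shutil.copyfileobj(self.lock.file_handle, dst)
                                shutil.copystat(self.config_file, backup_file)
                                logger.info(f"Config file backed up to: {backup_file}")
                                return backup_file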
                            finally:
                                self.lock.release_lock()
                        else:
                            logger.error("无法获取配置文件读锁进行备份")
                            return None
                    finally:
                        self.lock.close_file()

                def restore_config(self, backup_file):
                    """从备份恢复配置文件"""
                    if not os.path.exists(backup_file):
                        logger.error(f"备份文件不存在: {backup_file}")
                        return False

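                    # 'wb' would truncate the config file before the lock is held, so open an
                    # existing file with 'r+b' and only create it when it is missing
                    mode = 'r+b' if os.path.exists(self.config_file) else 'w+b'
                    if not self.lock.open_file(mode):
                        return False

                    try:
                        # Acquire the write lock for the restore
                        if self.lock.acquire_write_lock():
                            try:
                                # Write through the locked handle; copying via a second handle
                                # would be denied access to the locked byte range.
                                with open(backup_file, 'rb') as src:
                                    data = src.read()
                                self.lock.file_handle.seek(0)
                                self.lock.file_handle.write(data)
                                self.lock.file_handle.truncate()
                                self.lock.file_handle.flush()
                                logger.info(f"Config file restored from backup: {backup_file}")
                                return True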
                            finally:
                                self.lock.release_lock()
                        else:
                            logger.error("无法获取配置文件写锁进行恢复")
                            return False
                    finally:
                        self.lock.close_file()
            ---
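update_config above rewrites the file in place, so a crash between the write and the truncate can leave a half-written config. A minimal sketch of a complementary technique follows: write the new content to a temporary file in the same directory, then swap it in with `os.replace`. The function name `write_config_atomically` is hypothetical. This does not replace the lock: concurrent read-modify-write cycles still need to be serialized, and on Windows `os.replace` can fail with PermissionError while another process holds the destination open.

```python
import json
import os
import tempfile

def write_config_atomically(path: str, config: dict) -> None:
    """Write config as JSON to path via a temp file and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same volume as the target for the rename to be atomic
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cfg-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(config, tmp, indent=2, ensure_ascii=False)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        # Leave no stray temp file behind on failure
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Readers then see either the old file or the new one, never a partially written mix.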

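05.Best practices and caveats
    a.Usage principles
        a.Platform restriction
            The msvcrt module is available only on Windows, so cross-platform code needs a fallback for other systems.
        b.Lock scope
            Lock only the byte range that is actually needed; locking more than necessary reduces concurrency.
        c.Error handling
            Handle lock failures and exceptions explicitly, and always release locks in a finally block.
    b.Performance
        a.Minimal hold time
            Hold a lock for as short a time as possible so that other processes wait less.
        b.Region locking
            Prefer locking a region over locking the whole file.
        c.Non-blocking operations
            Use non-blocking lock modes where the caller can do other work or retry later.
    c.Debugging tips
        a.Lock state logging
            Log every lock acquisition and release in detail.
        b.Timeouts
            Apply a sensible timeout so that no caller waits indefinitely.
        c.Monitoring
            Use monitoring tools to analyze lock contention.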
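The platform restriction above is usually handled by selecting the locking backend at import time. A minimal sketch follows, assuming `fcntl.flock` as the fallback on POSIX systems. The function names `try_lock` and `unlock` are hypothetical. The two backends are not equivalent: the msvcrt lock is enforced by Windows on a byte range, while `flock` is advisory and covers the whole file.

```python
import sys

if sys.platform == "win32":
    import msvcrt

    def _lock(f):
        # Lock one byte at offset 0, non-blocking
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, 1)

    def _unlock(f):
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
else:
    import fcntl

    def _lock(f):
        # Exclusive, non-blocking advisory lock on the whole file
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)

    def _unlock(f):
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)

def try_lock(f) -> bool:
    """Try to lock the open file object f; return False if it is already locked."""
    try:
        _lock(f)
        return True
    except OSError:
        return False

def unlock(f) -> None:
    """Release a lock previously taken with try_lock on the same file object."""
    _unlock(f)
```

Callers use `try_lock(f)` and `unlock(f)` without knowing which backend is active.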

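8.3 Exclusive and Shared Locks

01.Lock type fundamentals
    a.Exclusive lock
        a.Definition and properties
            An exclusive lock (write lock) ensures that only one process can access the locked resource; all other processes must wait until the lock is released.
        b.Use cases
            Suited to operations that modify data, where consistency and integrity must be preserved.
        c.How it works
            While one process holds an exclusive lock, no other process can acquire a lock of any type on the resource, shared locks included.
    b.Shared lock
        a.Definition and properties
            A shared lock (read lock) lets multiple processes read the locked resource at the same time while preventing any process from writing to it.
        b.Use cases
            Suited to read-only operations, where it improves concurrency and resource utilization.
        c.How it works
            Any number of processes can hold a shared lock simultaneously, but a shared lock cannot coexist with an exclusive lock.
    c.Lock compatibility matrix
        a.Requested vs. held
            Exclusive requested, exclusive held: incompatible
            Exclusive requested, shared held: incompatible
            Shared requested, exclusive held: incompatible
            Shared requested, shared held: compatible
        b.Priority policy
            Pending exclusive requests are usually given priority over new shared requests; otherwise a steady stream of readers can starve writers.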
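The compatibility matrix above can be expressed as a lookup table. A minimal sketch follows; the names `LockMode`, `COMPATIBLE`, and `can_grant` are hypothetical and serve only to illustrate the rule that a request is granted when it is compatible with every lock currently held.

```python
from enum import Enum
from typing import Iterable

class LockMode(Enum):
    SHARED = "shared"
    EXCLUSIVE = "exclusive"

# (requested, held) -> whether the two can coexist
COMPATIBLE = {
    (LockMode.EXCLUSIVE, LockMode.EXCLUSIVE): False,
    (LockMode.EXCLUSIVE, LockMode.SHARED): False,
    (LockMode.SHARED, LockMode.EXCLUSIVE): False,
    (LockMode.SHARED, LockMode.SHARED): True,
}

def can_grant(requested: LockMode, held_modes: Iterable[LockMode]) -> bool:
    """Return True if the request is compatible with every lock currently held."""
    return all(COMPATIBLE[(requested, held)] for held in held_modes)
```

With no locks held, `all()` over an empty sequence is True, so any request is granted.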

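02.Exclusive lock implementation
    a.Basic exclusive lock
        a.Mechanism
            Exclusive access to a resource is implemented with either a file lock or an in-memory lock.
        b.Acquisition modes
            Both blocking and non-blocking acquisition are supported.
        c.Code example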
            ---
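            # Basic exclusive lock implementation
            import os    # used at module level by release(), _is_lock_stale() and others
            import json
            import time
            import logging
            from datetime import datetime
            import threading
            from typing import Optional, Dict, Any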

            logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
            logger = logging.getLogger(__name__)

            class ExclusiveLock:
                def __init__(self, resource_name: str, timeout: float = 30.0):
                    self.resource_name = resource_name
                    self.timeout = timeout
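                    import os, tempfile  # local import keeps this method self-contained
                    # Use the platform temp directory; a hard-coded /tmp does not exist on Windows
                    self.lock_file = os.path.join(tempfile.gettempdir(), f".exclusive_lock_{resource_name}")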
                    self.lock_handle = None
                    self.owner = None
                    self.acquire_time = None

                def acquire(self, blocking: bool = True) -> bool:
                    """获取独占锁"""
                    start_time = time.time()

                    while True:
                        try:
                            # 尝试创建锁文件(原子操作)
                            import os
                            fd = os.open(self.lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)

                            # 写入锁信息
                            lock_info = {
                                'owner': f"Process-{os.getpid()}",
                                'thread_id': threading.current_thread().ident,
                                'acquire_time': datetime.now().isoformat()
                            }

                            with os.fdopen(fd, 'w') as f:
                                import json
                                json.dump(lock_info, f)

                            self.owner = lock_info['owner']
                            self.acquire_time = lock_info['acquire_time']
                            logger.info(f"独占锁获取成功: {self.resource_name} by {self.owner}")
                            return True

                        except OSError as e:
                            if e.errno == 17:  # 文件已存在
                                if not blocking:
                                    logger.warning(f"独占锁获取失败(非阻塞): {self.resource_name}")
                                    return False

                                # 检查超时
                                if time.time() - start_time > self.timeout:
                                    logger.warning(f"独占锁获取超时: {self.resource_name}")
                                    return False

                                # 检查锁是否过期
                                if self._is_lock_stale():
                                    self._cleanup_stale_lock()
                                    continue

                                # 等待后重试
                                time.sleep(0.1)
                            else:
                                logger.error(f"创建锁文件失败: {e}")
                                return False

                def try_acquire(self) -> bool:
                    """尝试获取独占锁(非阻塞)"""
                    return self.acquire(blocking=False)

                def release(self) -> bool:
                    """释放独占锁"""
                    try:
                        if os.path.exists(self.lock_file):
                            # 验证锁的所有者
                            with open(self.lock_file, 'r') as f:
                                import json
                                lock_info = json.load(f)

                            if lock_info['owner'] == self.owner:
                                os.remove(self.lock_file)
                                logger.info(f"独占锁释放成功: {self.resource_name} by {self.owner}")
                                return True
                            else:
                                logger.warning(f"尝试释放不属于自己的锁: {self.resource_name}")
                                return False
                        else:
                            logger.warning(f"锁文件不存在: {self.resource_name}")
                            return False
                    except Exception as e:
                        logger.error(f"释放锁失败: {e}")
                        return False

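                def _is_lock_stale(self, max_age_seconds: float = 60.0) -> bool:
                    """Check whether the lock file is older than max_age_seconds.

                    Caveat: age is measured from the file's mtime, so a holder that keeps
                    the lock longer than max_age_seconds will have it removed by a waiter.
                    """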
                    try:
                        stat = os.stat(self.lock_file)
                        age = time.time() - stat.st_mtime
                        return age > max_age_seconds
                    except OSError:
                        return True

                def _cleanup_stale_lock(self):
                    """清理过期锁"""
                    try:
                        os.remove(self.lock_file)
                        logger.info(f"清理过期锁: {self.resource_name}")
                    except OSError as e:
                        logger.error(f"清理过期锁失败: {e}")

                def get_lock_info(self) -> Optional[Dict[str, Any]]:
                    """获取锁信息"""
                    try:
                        if os.path.exists(self.lock_file):
                            with open(self.lock_file, 'r') as f:
                                import json
                                return json.load(f)
                        return None
                    except Exception as e:
                        logger.error(f"获取锁信息失败: {e}")
                        return None

                def __enter__(self):
                    if self.acquire():
                        return self
                    else:
                        raise RuntimeError(f"无法获取独占锁: {self.resource_name}")

                def __exit__(self, exc_type, exc_val, exc_tb):
                    self.release()
            ---
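ExclusiveLock above relies on `os.open` with `O_CREAT | O_EXCL`, which creates the file only if it does not already exist and fails otherwise, so creation itself acts as the lock. A minimal sketch of just that step follows; the function name `try_create_lock_file` is hypothetical. Catching `FileExistsError` is equivalent to checking `e.errno == 17` but does not depend on the numeric value.

```python
import os

def try_create_lock_file(path: str) -> bool:
    """Atomically create path as a lock file; return False if it already exists."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    # Record the owner so that stale locks can be diagnosed later
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True
```

Removing the file with `os.remove(path)` releases the lock.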
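    b.Advanced exclusive lock
        a.Timeout
            A configurable timeout prevents callers from waiting indefinitely.
        b.Lock upgrade
            Supports upgrading from a shared lock to an exclusive lock.
        c.Deadlock detection
            Detects deadlocks and recovers from them automatically.
        d.Code example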
            ---
            import threading
            import queue

            class AdvancedExclusiveLock:
                def __init__(self, resource_name: str, timeout: float = 30.0):
                    self.resource_name = resource_name
                    self.timeout = timeout
                    self.lock = threading.Lock()
                    self.owner = None
                    self.acquire_count = 0
                    self.waiting_queue = queue.Queue()
                    self.lock_info = {}

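                def acquire(self, requester_id: str, blocking: bool = True) -> bool:
                    """Acquire the exclusive lock."""
                    start_time = time.time()

                    with self.lock:
                        if self.owner == requester_id:
                            # Re-entrant acquisition by the current owner
                            self.acquire_count += 1
                            logger.info(f"Re-entered exclusive lock: {self.resource_name} by {requester_id}")
                            return True

                        if self.owner is None:
                            # The lock is free: take it directly
                            self.owner = requester_id
                            self.acquire_count = 1
                            self.lock_info = {
                                'owner': requester_id,
                                'acquire_time': datetime.now().isoformat(),
                                'thread_id': threading.current_thread().ident
                            }
                            logger.info(f"Exclusive lock acquired: {self.resource_name} by {requester_id}")
                            return True

                        # The lock is held by another requester
                        if not blocking:
                            return False

                        # Join the waiting queue while still holding self.lock, so that a
                        # concurrent release() cannot run between the check and the enqueue
                        self.waiting_queue.put(requester_id)

                    while time.time() - start_time < self.timeout:
                        with self.lock:
                            if self.owner == requester_id:
                                # release() handed the lock over and already set acquire_count to 1
                                logger.info(f"Exclusive lock acquired after waiting: {self.resource_name} by {requester_id}")
                                return True

                        time.sleep(0.1)

                    with self.lock:
                        if self.owner == requester_id:
                            # The handover happened just before the timeout check
                            return True
                        # Timed out: leave the waiting queue
                        self._remove_from_queue(requester_id)

                    logger.warning(f"Timed out acquiring exclusive lock: {self.resource_name} by {requester_id}")
                    return False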

                def try_acquire(self, requester_id: str) -> bool:
                    """尝试获取独占锁"""
                    return self.acquire(requester_id, blocking=False)

                def release(self, requester_id: str) -> bool:
                    """释放独占锁"""
                    with self.lock:
                        if self.owner != requester_id:
                            logger.warning(f"尝试释放不持有的锁: {self.resource_name} by {requester_id}")
                            return False

                        self.acquire_count -= 1

                        if self.acquire_count == 0:
                            # 完全释放锁
                            self.owner = None
                            self.lock_info.clear()

                            # 唤醒下一个等待者
                            if not self.waiting_queue.empty():
                                next_owner = self.waiting_queue.get()
                                self.owner = next_owner
                                self.acquire_count = 1
                                self.lock_info = {
                                    'owner': next_owner,
                                    'acquire_time': datetime.now().isoformat(),
                                    'thread_id': threading.current_thread().ident
                                }
                                logger.info(f"锁传递给下一个等待者: {self.resource_name} by {next_owner}")

                            logger.info(f"独占锁释放成功: {self.resource_name} by {requester_id}")
                        else:
                            logger.info(f"减少重入计数: {self.resource_name} by {requester_id} (count: {self.acquire_count})")

                        return True

                def _remove_from_queue(self, requester_id: str):
                    """从等待队列中移除请求者"""
                    temp_queue = queue.Queue()
                    while not self.waiting_queue.empty():
                        item = self.waiting_queue.get()
                        if item != requester_id:
                            temp_queue.put(item)
                    self.waiting_queue = temp_queue

                def get_status(self) -> Dict[str, Any]:
                    """获取锁状态"""
                    with self.lock:
                        waiting_list = []
                        temp_queue = queue.Queue()
                        while not self.waiting_queue.empty():
                            item = self.waiting_queue.get()
                            waiting_list.append(item)
                            temp_queue.put(item)
                        self.waiting_queue = temp_queue

                        return {
                            'resource': self.resource_name,
                            'owner': self.owner,
                            'acquire_count': self.acquire_count,
                            'waiting_count': len(waiting_list),
                            'waiting_list': waiting_list,
                            'lock_info': self.lock_info.copy()
                        }
            ---
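            The timeout mechanism can also be built without a polling loop. The sketch below is a minimal illustration, not the implementation above: `TimedOwnerLock` and its fields are hypothetical names, and it assumes a non-re-entrant lock whose waiters block on a `threading.Condition` until the owner releases or the timeout elapses.

```python
import threading
import time


class TimedOwnerLock:
    """Hypothetical owner-tracked lock with a timed, non-polling wait."""

    def __init__(self, timeout: float = 30.0):
        self.timeout = timeout
        self._cond = threading.Condition()
        self._owner = None

    def acquire(self, requester_id: str) -> bool:
        with self._cond:
            # wait_for re-checks the predicate after each wakeup and returns
            # its last value, so False means the timeout elapsed
            if not self._cond.wait_for(lambda: self._owner is None,
                                       timeout=self.timeout):
                return False
            self._owner = requester_id
            return True

    def release(self, requester_id: str) -> bool:
        with self._cond:
            if self._owner != requester_id:
                return False  # only the owner may release
            self._owner = None
            self._cond.notify()  # wake one waiter, which re-checks the predicate
            return True
```

            Compared with sleeping in 0.1 second steps, a waiter here wakes as soon as the lock is released, at the cost of giving up the explicit FIFO queue.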

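03.Shared lock implementation
    a.Basic shared lock
        a.Read-lock mechanism
            Multiple readers may hold the lock at the same time; writers are kept out while any reader is active.
        b.Reader counting
            The lock tracks the number of active readers and admits a writer only after the last reader has left.
        c.Code example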
            ---
            class SharedLock:
                def __init__(self, resource_name: str):
                    self.resource_name = resource_name
                    self.readers_count = 0
                    self.writer_lock = ExclusiveLock(f"{resource_name}_writer")
                    self.readers_lock = threading.Lock()

                def acquire_shared(self, requester_id: str, blocking: bool = True) -> bool:
                    """Acquire the shared (read) lock."""
                    with self.readers_lock:
                        # Only the first reader takes the writer lock, which keeps writers out.
                        # Later readers just increment the count; if every reader took the
                        # writer lock, readers would exclude each other.
                        if self.readers_count == 0:
                            if not self.writer_lock.acquire(blocking=blocking):
                                return False

                        self.readers_count += 1
                        logger.info(f"Shared lock acquired: {self.resource_name} by {requester_id} (readers: {self.readers_count})")
                        return True

                def try_acquire_shared(self, requester_id: str) -> bool:
                    """尝试获取共享锁(非阻塞)"""
                    return self.acquire_shared(requester_id, blocking=False)

                def release_shared(self, requester_id: str) -> bool:
                    """释放共享锁"""
                    with self.readers_lock:
                        if self.readers_count == 0:
                            logger.warning(f"尝试释放未持有的共享锁: {self.resource_name} by {requester_id}")
                            return False

                        self.readers_count -= 1
                        logger.info(f"共享锁释放: {self.resource_name} by {requester_id} (读者数: {self.readers_count})")

                        # 如果没有读者了,释放写者锁
                        if self.readers_count == 0:
                            self.writer_lock.release()
                            logger.info(f"所有读者离开,释放写者锁: {self.resource_name}")

                        return True

                def get_readers_count(self) -> int:
                    """获取当前读者数量"""
                    with self.readers_lock:
                        return self.readers_count

                def get_status(self) -> Dict[str, Any]:
                    """获取共享锁状态"""
                    with self.readers_lock:
                        return {
                            'resource': self.resource_name,
                            'readers_count': self.readers_count,
                            'writer_locked': self.writer_lock.get_lock_info() is not None
                        }
            ---
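    b.Combined read-write lock
        a.Reader-writer exclusion
            Readers and writers exclude each other: a writer runs only when no reader is active, and readers run only when no writer is active.
        b.Writer-priority policy
            New readers yield to waiting writers, which keeps writers from starving.
        c.Fairness
            The lock aims to allocate access fairly between readers and writers.
        d.Code example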
            ---
            class ReadWriteLock:
                def __init__(self, resource_name: str, writer_priority: bool = True):
                    self.resource_name = resource_name
                    self.writer_priority = writer_priority

                    # 锁状态
                    self.readers_count = 0
                    self.writers_waiting = 0
                    self.active_writer = None

                    # 同步原语
                    self.readers_lock = threading.Lock()
                    self.writers_lock = threading.Lock()
                    self.readers_can_enter = threading.Condition(self.readers_lock)
                    self.writer_can_enter = threading.Condition(self.writers_lock)

                def acquire_read(self, requester_id: str, blocking: bool = True) -> bool:
                    """Acquire the read lock."""
                    with self.writers_lock:
                        # Under writer priority, readers yield to waiting or active writers
                        while self.writer_priority and (self.writers_waiting > 0 or self.active_writer is not None):
                            if not blocking:
                                return False
                            # Wait on the Condition bound to writers_lock; a plain Lock has no wait()
                            self.writer_can_enter.wait()

                    with self.readers_lock:
                        # Wait until no writer is active
                        while self.active_writer is not None:
                            if not blocking:
                                # Nothing to undo here, and taking writers_lock while holding
                                # readers_lock would invert the lock order used by acquire_write
                                return False
                            self.readers_can_enter.wait()

                        self.readers_count += 1
                        logger.info(f"Read lock acquired: {self.resource_name} by {requester_id} (readers: {self.readers_count})")
                        return True

                def release_read(self, requester_id: str) -> bool:
                    """释放读锁"""
                    with self.readers_lock:
                        if self.readers_count == 0:
                            logger.warning(f"尝试释放未持有的读锁: {self.resource_name} by {requester_id}")
                            return False

                        self.readers_count -= 1
                        logger.info(f"读锁释放: {self.resource_name} by {requester_id} (读者数: {self.readers_count})")

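                        # Once the last reader leaves, wake every thread waiting on this condition;
                        # notify() could wake a waiting reader instead of the waiting writer
                        if self.readers_count == 0:
                            self.readers_can_enter.notify_all()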

                        return True

                def acquire_write(self, requester_id: str, blocking: bool = True) -> bool:
                    """获取写锁"""
                    with self.writers_lock:
                        self.writers_waiting += 1

                        try:
                            # 等待没有活跃的写者
                            while self.active_writer is not None:
                                if not blocking:
                                    return False
                                self.writer_can_enter.wait()

                            # 获取读者锁
                            with self.readers_lock:
                                # 等待所有读者离开
                                while self.readers_count > 0:
                                    if not blocking:
                                        return False
                                    self.readers_can_enter.wait()

                                # 成为活跃的写者
                                self.active_writer = requester_id
                                logger.info(f"写锁获取成功: {self.resource_name} by {requester_id}")
                                return True

                        finally:
                            self.writers_waiting -= 1

                def release_write(self, requester_id: str) -> bool:
                    """释放写锁"""
                    with self.writers_lock:
                        if self.active_writer != requester_id:
                            logger.warning(f"尝试释放不持有的写锁: {self.resource_name} by {requester_id}")
                            return False

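                        self.active_writer = None
                        # Wake every waiter on this condition, not just one
                        self.writer_can_enter.notify_all()

                        # Wake readers waiting for the writer to finish
                        # (notify_all replaces the deprecated notifyAll alias)
                        with self.readers_lock:
                            self.readers_can_enter.notify_all()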

                        logger.info(f"写锁释放成功: {self.resource_name} by {requester_id}")
                        return True

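                def get_status(self) -> Dict[str, Any]:
                    """Return the current state of the read-write lock."""
                    # Take writers_lock first, the same order acquire_write and release_write use
                    with self.writers_lock, self.readers_lock: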
                        return {
                            'resource': self.resource_name,
                            'readers_count': self.readers_count,
                            'writers_waiting': self.writers_waiting,
                            'active_writer': self.active_writer,
                            'writer_priority': self.writer_priority
                        }
            ---
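            The writer-priority policy can be shown more compactly with a single condition variable. This is a minimal sketch under stated assumptions, not a replacement for the class above: `SimpleRWLock` and its fields are hypothetical names, ownership is not tracked, and all state is guarded by one `threading.Condition`, which removes any lock-ordering concern.

```python
import threading


class SimpleRWLock:
    """Hypothetical writer-preferring read-write lock built on one Condition."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0          # number of active readers
        self._writer = False       # True while a writer holds the lock
        self._writers_waiting = 0  # writers currently blocked in acquire_write

    def acquire_read(self, timeout=None) -> bool:
        with self._cond:
            # Readers yield to both active and waiting writers
            ok = self._cond.wait_for(
                lambda: not self._writer and self._writers_waiting == 0, timeout)
            if ok:
                self._readers += 1
            return ok

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self, timeout=None) -> bool:
        with self._cond:
            self._writers_waiting += 1
            try:
                ok = self._cond.wait_for(
                    lambda: not self._writer and self._readers == 0, timeout)
                if ok:
                    self._writer = True
                return ok
            finally:
                self._writers_waiting -= 1
                # A writer that gave up may have been the only thing blocking readers
                self._cond.notify_all()

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

            Because every waiter re-checks its own predicate after `notify_all`, waking "the wrong kind" of waiter costs a context switch but never correctness.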

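04.Practical application scenarios
    a.Cache system
        a.Scenario
            Controlling concurrent reads and writes to a cache. The example uses the thread-based ReadWriteLock above, so it coordinates threads within one process; sharing a cache across processes would need a file-based or other inter-process lock.
        b.Approach
            A read-write lock lets many readers access the cache concurrently while writes stay exclusive.
        c.Code example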
            ---
            import threading
            from typing import Dict, Any, Optional
            import time

            class ConcurrentCache:
                def __init__(self, max_size: int = 1000, ttl_seconds: float = 300):
                    self.max_size = max_size
                    self.ttl_seconds = ttl_seconds
                    self.cache: Dict[str, Dict[str, Any]] = {}
                    self.access_order: Dict[str, float] = {}
                    self.rw_lock = ReadWriteLock("cache", writer_priority=True)

                def get(self, key: str) -> Optional[Any]:
                    """获取缓存值"""
                    if not self.rw_lock.acquire_read(f"get_{key}"):
                        return None

                    try:
                        if key in self.cache:
                            entry = self.cache[key]

                            # 检查TTL
                            if time.time() - entry['timestamp'] > self.ttl_seconds:
                                return None

                            # 更新访问时间
                            self.access_order[key] = time.time()
                            return entry['value']

                        return None
                    finally:
                        self.rw_lock.release_read(f"get_{key}")

                def put(self, key: str, value: Any) -> bool:
                    """设置缓存值"""
                    if not self.rw_lock.acquire_write(f"put_{key}"):
                        return False

                    try:
                        # 检查容量
                        if len(self.cache) >= self.max_size and key not in self.cache:
                            self._evict_lru()

                        # 添加或更新缓存
                        self.cache[key] = {
                            'value': value,
                            'timestamp': time.time()
                        }
                        self.access_order[key] = time.time()

                        return True
                    finally:
                        self.rw_lock.release_write(f"put_{key}")

                def delete(self, key: str) -> bool:
                    """删除缓存值"""
                    if not self.rw_lock.acquire_write(f"delete_{key}"):
                        return False

                    try:
                        if key in self.cache:
                            del self.cache[key]
                            del self.access_order[key]
                            return True
                        return False
                    finally:
                        self.rw_lock.release_write(f"delete_{key}")

                def clear(self) -> bool:
                    """清空缓存"""
                    if not self.rw_lock.acquire_write("clear"):
                        return False

                    try:
                        self.cache.clear()
                        self.access_order.clear()
                        return True
                    finally:
                        self.rw_lock.release_write("clear")

                def _evict_lru(self):
                    """淘汰最近最少使用的缓存"""
                    if not self.access_order:
                        return

                    # 找到最久未访问的键
                    lru_key = min(self.access_order.keys(), key=lambda k: self.access_order[k])

                    del self.cache[lru_key]
                    del self.access_order[lru_key]

                def get_stats(self) -> Dict[str, Any]:
                    """获取缓存统计信息"""
                    if not self.rw_lock.acquire_read("stats"):
                        return {}

                    try:
                        return {
                            'size': len(self.cache),
                            'max_size': self.max_size,
                            'hit_rate': getattr(self, '_hit_count', 0) / max(getattr(self, '_total_requests', 1), 1),
                            'lock_status': self.rw_lock.get_status()
                        }
                    finally:
                        self.rw_lock.release_read("stats")
            ---
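            Two details of the cache above are worth a separate sketch: a lookup that updates recency is itself a write, and `min()` over all keys makes each eviction O(n). The example below is a minimal alternative under stated assumptions, not the class above: `LruStore` is a hypothetical name, it uses a plain `threading.Lock` because every operation mutates state, and it relies on `collections.OrderedDict` for O(1) recency updates. It also counts hits and requests, which a hit-rate statistic needs.

```python
import threading
from collections import OrderedDict


class LruStore:
    """Hypothetical LRU store; oldest entries sit at the front of the OrderedDict."""

    def __init__(self, max_size: int = 3):
        self.max_size = max_size
        self._data = OrderedDict()
        self._lock = threading.Lock()
        self.hits = 0
        self.requests = 0

    def get(self, key):
        with self._lock:
            self.requests += 1
            if key not in self._data:
                return None
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]

    def put(self, key, value):
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)
            elif len(self._data) >= self.max_size:
                self._data.popitem(last=False)  # evict the least recently used entry
            self._data[key] = value

    def hit_rate(self) -> float:
        with self._lock:
            return self.hits / self.requests if self.requests else 0.0
```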
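    b.Database connection pool
        a.Scenario
            Concurrency control for a database connection pool shared by many concurrent workers.
        b.Approach
            An exclusive lock guards the pool's bookkeeping while connections are handed out and returned, and a per-connection lock guards each borrowed connection while it is in use.
        c.Code example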
            ---
            import threading
            from typing import List, Optional, Dict, Any
            import queue
            import time

            class DatabaseConnectionPool:
                def __init__(self, connection_factory, max_connections: int = 10):
                    self.connection_factory = connection_factory
                    self.max_connections = max_connections

                    # 连接管理
                    self.available_connections: queue.Queue = queue.Queue(maxsize=max_connections)
                    self.active_connections: Dict[str, Any] = {}

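                    # Lock management
                    # The pool lock is called below with a requester id plus try_acquire(),
                    # which is the AdvancedExclusiveLock interface
                    self.pool_lock = AdvancedExclusiveLock("connection_pool")
                    self.connection_locks: Dict[str, ExclusiveLock] = {}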

                    # 统计信息
                    self.stats = {
                        'total_requests': 0,
                        'active_connections': 0,
                        'pool_hits': 0,
                        'pool_misses': 0
                    }

                    # 初始化连接池
                    self._initialize_pool()

                def _initialize_pool(self):
                    """初始化连接池"""
                    for i in range(self.max_connections):
                        try:
                            conn = self.connection_factory()
                            connection_id = f"conn_{i}"
                            self.available_connections.put({
                                'id': connection_id,
                                'connection': conn,
                                'created_time': time.time(),
                                'last_used': time.time()
                            })
                        except Exception as e:
                            logger.error(f"创建连接失败: {e}")

                def get_connection(self, requester_id: str, timeout: float = 30.0) -> Optional[str]:
                    """Borrow a database connection."""
                    # If the pool lock is re-entrant per requester id (as AdvancedExclusiveLock is),
                    # a constant id would let concurrent callers through, so use a per-thread id
                    lock_owner = f"get_connection:{threading.get_ident()}"
                    if not self.pool_lock.acquire(lock_owner, blocking=True):
                        return None

                    try:
                        # Update statistics under the pool lock
                        self.stats['total_requests'] += 1

                        # Check whether a connection is available
                        if self.available_connections.empty():
                            self.stats['pool_misses'] += 1
                            logger.warning(f"Connection pool is empty; no connection for {requester_id}")
                            return None

                        # Take a connection from the pool
                        conn_info = self.available_connections.get(timeout=timeout)
                        connection_id = conn_info['id']

                        # Record who borrowed it and when
                        conn_info['last_used'] = time.time()
                        conn_info['borrower'] = requester_id
                        conn_info['borrow_time'] = time.time()

                        # Track the connection as active
                        self.active_connections[connection_id] = conn_info
                        self.stats['active_connections'] = len(self.active_connections)
                        self.stats['pool_hits'] += 1

                        # Create an exclusive lock for this connection
                        self.connection_locks[connection_id] = ExclusiveLock(f"conn_{connection_id}")

                        logger.info(f"Assigned connection {connection_id} to {requester_id}")
                        return connection_id

                    except queue.Empty:
                        self.stats['pool_misses'] += 1
                        logger.warning(f"Timed out getting a connection: {requester_id}")
                        return None
                    finally:
                        self.pool_lock.release(lock_owner)

                def release_connection(self, connection_id: str, requester_id: str) -> bool:
                    """Return a database connection to the pool."""
                    # Per-thread id, so concurrent callers are not treated as one re-entrant owner
                    lock_owner = f"release_connection:{threading.get_ident()}"
                    if not self.pool_lock.acquire(lock_owner):
                        return False

                    try:
                        if connection_id not in self.active_connections:
                            logger.warning(f"Attempt to release an unknown connection: {connection_id}")
                            return False

                        conn_info = self.active_connections[connection_id]

                        # Only the borrower may return the connection
                        if conn_info['borrower'] != requester_id:
                            logger.warning(f"Attempt to release a connection owned by someone else: {connection_id} by {requester_id}")
                            return False

                        # Discard the per-connection lock. It is not held at this point:
                        # execute_with_connection releases it before calling this method
                        self.connection_locks.pop(connection_id, None)

                        # Clear the borrow record
                        del conn_info['borrower']
                        del conn_info['borrow_time']
                        conn_info['last_used'] = time.time()

                        # Put the connection back into the pool
                        self.available_connections.put(conn_info)
                        del self.active_connections[connection_id]
                        self.stats['active_connections'] = len(self.active_connections)

                        logger.info(f"Reclaimed connection {connection_id} from {requester_id}")
                        return True

                    finally:
                        self.pool_lock.release(lock_owner)

                def execute_with_connection(self, requester_id: str, operation, timeout: float = 30.0):
                    """使用连接执行操作"""
                    connection_id = self.get_connection(requester_id, timeout)
                    if not connection_id:
                        raise RuntimeError(f"无法获取数据库连接: {requester_id}")

                    try:
                        # 获取连接的独占锁
                        if connection_id in self.connection_locks:
                            if not self.connection_locks[connection_id].acquire():
                                raise RuntimeError(f"无法锁定连接: {connection_id}")

                            try:
                                conn_info = self.active_connections[connection_id]
                                return operation(conn_info['connection'])
                            finally:
                                self.connection_locks[connection_id].release()
                        else:
                            conn_info = self.active_connections[connection_id]
                            return operation(conn_info['connection'])
                    finally:
                        self.release_connection(connection_id, requester_id)

                def get_pool_status(self) -> Dict[str, Any]:
                    """Return a snapshot of the pool state."""
                    # Per-thread id, so concurrent callers are not treated as one re-entrant owner
                    lock_owner = f"status:{threading.get_ident()}"
                    if not self.pool_lock.try_acquire(lock_owner):
                        return {'status': 'locked'}

                    try:
                        return {
                            'max_connections': self.max_connections,
                            'available_connections': self.available_connections.qsize(),
                            'active_connections': len(self.active_connections),
                            'stats': self.stats.copy(),
                            'connection_locks': len(self.connection_locks)
                        }
                    finally:
                        self.pool_lock.release(lock_owner)
            ---
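            Pairing every borrow with a guaranteed return is easier to get right with a context manager. The sketch below is a minimal illustration under stated assumptions: `TinyPool`, `borrow`, and `idle_count` are hypothetical names, and the pool relies on `queue.Queue` alone for thread safety instead of a separate pool lock.

```python
import queue
from contextlib import contextmanager


class TinyPool:
    """Hypothetical fixed-size pool; queue.Queue provides the synchronization."""

    def __init__(self, factory, size: int = 2):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def borrow(self, timeout: float = 30.0):
        # Queue.get blocks for up to `timeout` seconds, then raises queue.Empty
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Runs even if the caller's block raises, so the connection is never leaked
            self._idle.put(conn)

    def idle_count(self) -> int:
        return self._idle.qsize()
```

            A caller would write `with pool.borrow(timeout=5) as conn: ...` and handle `queue.Empty` when the pool stays exhausted for the whole timeout.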
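    c.Configuration file management
        a.Scenario
            Controlling concurrent reads and writes to a configuration file.
        b.Approach
            A read-write lock serializes updates to the file while allowing concurrent reads.
        c.Code example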
            ---
            import json
            import logging
            import os
            import threading
            import time
            from typing import Dict, Any, Optional

            logger = logging.getLogger(__name__)

            class ConfigFileManager:
                def __init__(self, config_file: str):
                    self.config_file = config_file
                    self.rw_lock = ReadWriteLock("config_file")
                    self.config_cache: Optional[Dict[str, Any]] = None
                    self.cache_timestamp: Optional[float] = None
                    self.cache_timeout: float = 60.0  # 缓存1分钟

                def read_config(self, requester_id: str, use_cache: bool = True) -> Optional[Dict[str, Any]]:
                    """读取配置文件"""
                    current_time = time.time()

                    # 检查缓存
                    if (use_cache and self.config_cache is not None and
                        self.cache_timestamp is not None and
                        current_time - self.cache_timestamp < self.cache_timeout):

                        logger.info(f"从缓存读取配置: {requester_id}")
                        return self.config_cache.copy()

                    # 获取读锁
                    if not self.rw_lock.acquire_read(f"read_{requester_id}"):
                        logger.error(f"无法获取配置文件读锁: {requester_id}")
                        return None

                    try:
                        # 再次检查缓存(双重检查)
                        if (use_cache and self.config_cache is not None and
                            self.cache_timestamp is not None and
                            current_time - self.cache_timestamp < self.cache_timeout):

                            return self.config_cache.copy()

                        # 从文件读取
                        if not os.path.exists(self.config_file):
                            logger.warning(f"配置文件不存在: {self.config_file}")
                            return {}

                        with open(self.config_file, 'r', encoding='utf-8') as f:
                            config = json.load(f)

                        # 更新缓存
                        self.config_cache = config.copy()
                        self.cache_timestamp = current_time

                        logger.info(f"从文件读取配置: {requester_id}, 项目数: {len(config)}")
                        return config.copy()

                    except json.JSONDecodeError as e:
                        logger.error(f"配置文件JSON解析失败: {e}")
                        return None
                    except Exception as e:
                        logger.error(f"读取配置文件失败: {e}")
                        return None
                    finally:
                        self.rw_lock.release_read(f"read_{requester_id}")

                def write_config(self, updates: Dict[str, Any], requester_id: str) -> bool:
                    """写入配置文件"""
                    # 获取写锁
                    if not self.rw_lock.acquire_write(f"write_{requester_id}"):
                        logger.error(f"无法获取配置文件写锁: {requester_id}")
                        return False

                    try:
                        # 读取当前配置
                        current_config = {}
                        if os.path.exists(self.config_file):
                            with open(self.config_file, 'r', encoding='utf-8') as f:
                                current_config = json.load(f)

                        # 备份当前配置
                        backup_file = f"{self.config_file}.backup.{int(time.time())}"
                        with open(backup_file, 'w', encoding='utf-8') as f:
                            json.dump(current_config, f, indent=2, ensure_ascii=False)

                        # 应用更新
                        current_config.update(updates)

                        # 写入新配置
                        with open(self.config_file, 'w', encoding='utf-8') as f:
                            json.dump(current_config, f, indent=2, ensure_ascii=False)

                        # 更新缓存
                        self.config_cache = current_config.copy()
                        self.cache_timestamp = time.time()

                        logger.info(f"配置文件更新成功: {requester_id}, 更新项数: {len(updates)}, 备份: {backup_file}")
                        return True

                    except Exception as e:
                        logger.error(f"写入配置文件失败: {e}")
                        return False
                    finally:
                        self.rw_lock.release_write(f"write_{requester_id}")

                def atomic_update(self, key: str, update_func, requester_id: str) -> bool:
                    """原子更新配置项"""
                    # 获取写锁
                    if not self.rw_lock.acquire_write(f"atomic_{requester_id}"):
                        return False

                    try:
                        # 读取当前配置
                        current_config = {}
                        if os.path.exists(self.config_file):
                            with open(self.config_file, 'r', encoding='utf-8') as f:
                                current_config = json.load(f)

                        # 应用更新函数
                        old_value = current_config.get(key)
                        new_value = update_func(old_value)
                        current_config[key] = new_value

                        # 写回文件
                        with open(self.config_file, 'w', encoding='utf-8') as f:
                            json.dump(current_config, f, indent=2, ensure_ascii=False)

                        # 更新缓存
                        self.config_cache = current_config.copy()
                        self.cache_timestamp = time.time()

                        logger.info(f"原子更新配置项成功: {key}, {old_value} -> {new_value} by {requester_id}")
                        return True

                    except Exception as e:
                        logger.error(f"原子更新配置项失败: {e}")
                        return False
                    finally:
                        self.rw_lock.release_write(f"atomic_{requester_id}")

                def get_config_status(self) -> Dict[str, Any]:
                    """获取配置管理状态"""
                    return {
                        'config_file': self.config_file,
                        'cache_valid': (
                            self.config_cache is not None and
                            self.cache_timestamp is not None and
                            time.time() - self.cache_timestamp < self.cache_timeout
                        ),
                        'cache_age': (
                            time.time() - self.cache_timestamp
                            if self.cache_timestamp else None
                        ),
                        'lock_status': self.rw_lock.get_status()
                    }
            ---
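            The write lock above serializes writers, but a crash in the middle of `json.dump` can still leave a truncated file. A common complement is to write to a temporary file and rename it into place. The sketch below is a minimal illustration; `write_json_atomically` is a hypothetical helper, and it assumes the temporary file lives in the same directory as the target so that `os.replace` stays on one filesystem.

```python
import json
import os
import tempfile


def write_json_atomically(path: str, data: dict) -> None:
    """Write `data` as JSON so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2, ensure_ascii=False)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # replaces the target in a single step
    except BaseException:
        # Clean up the temporary file if anything failed before the rename
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```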

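05.Best practices and performance optimization
    a.Choosing a lock
        a.Read-heavy workloads
            Prefer a read-write lock so that readers can proceed concurrently.
        b.Write-heavy workloads
            Use a plain exclusive lock; it is simpler, and a read-write lock adds little when most operations are writes.
        c.Balanced read/write workloads
            Choose the lock strategy that fits the specific requirements of the workload.
    b.Performance techniques
        a.Lock granularity
            Pick a granularity that covers only the data an operation touches; avoid locking more than necessary.
        b.Lock hold time
            Hold each lock for as short a time as possible.
        c.Batching
            Combine several small operations into a single locked section.
        d.Caching
            Use caching where appropriate to reduce lock contention.
    c.Debugging and monitoring
        a.Lock state monitoring
            Track lock acquisitions, waits, and releases.
        b.Performance analysis
            Analyze how lock contention affects system performance.
        c.Deadlock prevention
            Acquire locks in one consistent global order to avoid deadlock.
        d.Logging
            Log lock operations in enough detail to diagnose problems.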
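    A consistent acquisition order can be enforced in code instead of by convention. The sketch below is a minimal illustration: `acquire_in_order` is a hypothetical helper, it orders locks by `id()` (any stable total order would do), and it assumes the same lock is never passed twice.

```python
import threading
from contextlib import ExitStack, contextmanager


@contextmanager
def acquire_in_order(*locks):
    """Acquire several locks in one global order; release them in reverse."""
    ordered = sorted(locks, key=id)
    with ExitStack() as stack:
        for lock in ordered:
            stack.enter_context(lock)  # acquires the lock, registers its release
        yield
```

    Two threads that name the same pair of locks in opposite orders then still acquire them in the same sequence, so neither can hold one lock while waiting for the other.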

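8.4 File lock application scenarios

01.Multi-process data processing
    a.Scenario
        Several processes work through a large set of data files at the same time, which calls for concurrency control and protection of data integrity.
    b.Core challenges
        a.Data races
            Processes that modify the same data file at the same time can corrupt it.
        b.Process synchronization
            The processes must coordinate their processing order and progress.
        c.Resource management
            A limited set of file resources has to be managed and allocated efficiently.
    c.Solutions
        a.File-level locking
            An exclusive lock protects access to the whole file.
        b.Region-level locking
            Only a specific region of the file is locked, which allows more concurrency.
        c.Distributed coordination
            Lock files serve as the coordination channel between processes.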
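        Region-level locking can be sketched with POSIX record locks. This is a minimal illustration under stated assumptions: `write_region` is a hypothetical helper, it is Unix-only because it uses the `fcntl` module and `os.pwrite`, and the lock is advisory, so it only excludes other processes that also take the lock before touching the same byte range.

```python
import fcntl
import os


def write_region(path: str, offset: int, payload: bytes) -> None:
    """Overwrite len(payload) bytes at `offset` while holding a lock on that range."""
    fd = os.open(path, os.O_RDWR)
    try:
        # lockf(fd, cmd, len, start, whence): exclusive lock on just this byte range;
        # blocks until no other process holds a conflicting lock
        fcntl.lockf(fd, fcntl.LOCK_EX, len(payload), offset, os.SEEK_SET)
        try:
            os.pwrite(fd, payload, offset)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, len(payload), offset, os.SEEK_SET)
    finally:
        os.close(fd)
```

        Processes that write to disjoint ranges of the same file do not block each other, which is the concurrency gain over a whole-file lock.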
    d.Code example
        ---
        # 多进程数据处理文件锁应用示例
        import multiprocessing
        import os
        import time
        import logging
        import json
        from datetime import datetime
        from typing import Dict, Any, List, Optional

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class DataProcessingCoordinator:
            def __init__(self, data_file: str, num_processes: int = 4):
                self.data_file = data_file
                self.num_processes = num_processes
                self.progress_file = f"{data_file}.progress"
                self.lock_file = f"{data_file}.lock"
                self.status_file = f"{data_file}.status"

            def initialize_data_file(self, total_records: int):
                """Create the data file with every record still unprocessed."""
                data = {
                    'total_records': total_records,
                    'processed_records': 0,
                    'records': [f'record_{i}' for i in range(total_records)],
                    'results': {},
                    'created_time': datetime.now().isoformat()
                }

                with open(self.data_file, 'w') as f:
                    json.dump(data, f, indent=2)

                logger.info(f"Data file initialized: {total_records} records")

            def acquire_processing_lock(self, process_id: str, timeout: float = 30.0) -> bool:
                """Acquire the processing lock, retrying until the timeout expires."""
                start_time = time.time()

                while time.time() - start_time < timeout:
                    try:
                        # Try to create the lock file; O_EXCL makes this fail if it already exists
                        fd = os.open(self.lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)

                        # Record who holds the lock
                        lock_info = {
                            'process_id': process_id,
                            'acquire_time': datetime.now().isoformat(),
                            'lock_type': 'processing'
                        }

                        with os.fdopen(fd, 'w') as f:
                            json.dump(lock_info, f)

                        logger.info(f"Process {process_id} acquired the processing lock")
                        return True

                    except FileExistsError:
                        # Another process holds the lock; remove it if it has gone stale
                        if self._is_lock_stale():
                            self._cleanup_stale_lock()
                            continue

                        time.sleep(0.1)
                    except OSError as e:
                        logger.error(f"Failed to create the lock file: {e}")
                        return False

                logger.warning(f"Process {process_id} timed out waiting for the processing lock")
                return False

            def release_processing_lock(self, process_id: str) -> bool:
                """Release the processing lock if this process holds it."""
                try:
                    if os.path.exists(self.lock_file):
                        with open(self.lock_file, 'r') as f:
                            lock_info = json.load(f)

                        if lock_info['process_id'] == process_id:
                            os.remove(self.lock_file)
                            logger.info(f"Process {process_id} released the processing lock")
                            return True
                        else:
                            logger.warning(f"Process {process_id} tried to release a lock it does not hold")
                            return False
                    return True
                except Exception as e:
                    logger.error(f"Failed to release the processing lock: {e}")
                    return False

            def process_batch(self, process_id: str, batch_size: int = 10) -> Dict[str, Any]:
                """Process one batch of records while holding the processing lock."""
                if not self.acquire_processing_lock(process_id):
                    return {'success': False, 'error': 'could not acquire the processing lock'}

                try:
                    # Read the data file
                    with open(self.data_file, 'r') as f:
                        data = json.load(f)

                    # Collect the records that have no result yet
                    unprocessed_records = []
                    for record in data['records']:
                        if record not in data['results']:
                            unprocessed_records.append(record)

                    if not unprocessed_records:
                        return {'success': True, 'processed': 0, 'message': 'all records have been processed'}

                    # Process one batch of records
                    batch = unprocessed_records[:batch_size]
                    processed_count = 0

                    for record in batch:
                        # Simulate the data processing
                        result = self._process_single_record(record, process_id)
                        data['results'][record] = result
                        processed_count += 1

                        # Update the progress file
                        data['processed_records'] = len(data['results'])
                        self._update_progress(data)

                    # Write the data file back
                    with open(self.data_file, 'w') as f:
                        json.dump(data, f, indent=2)

                    logger.info(f"Process {process_id} processed {processed_count} records")
                    return {
                        'success': True,
                        'processed': processed_count,
                        'remaining': len(unprocessed_records) - processed_count
                    }

                finally:
                    self.release_processing_lock(process_id)

            def _process_single_record(self, record: str, process_id: str) -> Dict[str, Any]:
                """Process a single record."""
                # Simulate the processing time
                time.sleep(0.1)

                return {
                    'processed_by': process_id,
                    'process_time': datetime.now().isoformat(),
                    'result': f"processed_{record}",
                    'status': 'completed'
                }

            def _update_progress(self, data: Dict[str, Any]):
                """Write the current progress to the progress file."""
                progress = {
                    'total_records': data['total_records'],
                    'processed_records': data['processed_records'],
                    'completion_rate': data['processed_records'] / data['total_records'] * 100,
                    'last_update': datetime.now().isoformat()
                }

                with open(self.progress_file, 'w') as f:
                    json.dump(progress, f, indent=2)

            def _is_lock_stale(self, max_age_seconds: float = 60.0) -> bool:
                """Return True if the lock file is older than max_age_seconds."""
                try:
                    stat = os.stat(self.lock_file)
                    age = time.time() - stat.st_mtime
                    return age > max_age_seconds
                except OSError:
                    return True

            def _cleanup_stale_lock(self):
                """Remove a stale lock file."""
                try:
                    os.remove(self.lock_file)
                    logger.info("Removed stale processing lock")
                except FileNotFoundError:
                    # Another process already removed it
                    pass
                except OSError as e:
                    logger.error(f"Failed to remove stale lock: {e}")

            def get_processing_status(self) -> Dict[str, Any]:
                """Return the current processing status."""
                try:
                    with open(self.data_file, 'r') as f:
                        data = json.load(f)

                    return {
                        'total_records': data['total_records'],
                        'processed_records': data['processed_records'],
                        'completion_rate': data['processed_records'] / data['total_records'] * 100,
                        'remaining_records': data['total_records'] - data['processed_records'],
                        'lock_status': os.path.exists(self.lock_file)
                    }
                except Exception as e:
                    logger.error(f"Failed to read the processing status: {e}")
                    return {'error': str(e)}

        def data_processing_worker(process_id: str, coordinator: DataProcessingCoordinator, max_iterations: int = 50):
            """Worker process that handles batches until no records remain."""
            logger.info(f"Data processing worker {process_id} started")

            for i in range(max_iterations):
                result = coordinator.process_batch(process_id)

                if not result['success']:
                    logger.error(f"Process {process_id} failed: {result.get('error', 'unknown error')}")
                    break

                if result.get('processed', 0) == 0:
                    logger.info(f"Process {process_id} has finished all work")
                    break

                # Short pause so other workers can take the lock
                time.sleep(0.5)

            logger.info(f"Data processing worker {process_id} finished")

        def run_multi_process_data_processing():
            """Run the multi-process data processing example."""
            data_file = "processing_data.json"
            coordinator = DataProcessingCoordinator(data_file, num_processes=4)

            # Initialize the data file
            coordinator.initialize_data_file(100)

            # Start the worker processes
            processes = []
            for i in range(4):
                p = multiprocessing.Process(
                    target=data_processing_worker,
                    args=(f"processor_{i}", coordinator)
                )
                processes.append(p)
                p.start()

            # Wait for all processes to finish
            for p in processes:
                p.join()

            # Report the final status
            final_status = coordinator.get_processing_status()
            logger.info(f"Data processing finished: {final_status}")

            # Clean up the files
            for file_path in [data_file, coordinator.progress_file, coordinator.lock_file]:
                if os.path.exists(file_path):
                    os.remove(file_path)
        ---
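
The acquire/release pair above can also be wrapped in a context manager so the lock file is removed even when the protected code raises. The following is a minimal sketch, not the coordinator's own API: the name `lock_file`, its `timeout` and `poll_interval` parameters, and the demo path are assumptions introduced for illustration. It uses the same `os.O_CREAT | os.O_EXCL` technique as the example and leaves out stale-lock detection.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def lock_file(path, timeout=5.0, poll_interval=0.05):
    """Hold an exclusive lock file at `path`, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails with FileExistsError if the file exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {path}")
            time.sleep(poll_interval)
    try:
        # Record the owner's PID to help with debugging.
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        yield path
    finally:
        os.remove(path)


demo_path = os.path.join(tempfile.mkdtemp(), "demo.lock")
with lock_file(demo_path):
    held = os.path.exists(demo_path)
    try:
        # A second attempt while the lock is held times out.
        with lock_file(demo_path, timeout=0.2):
            second_acquired = True
    except TimeoutError:
        second_acquired = False
released = not os.path.exists(demo_path)
print(held, second_acquired, released)  # True False True
```

Because the timeout is raised before the `try`/`finally` that removes the file, a failed attempt never deletes a lock that another holder owns.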

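02. Distributed task queue
    a. Scenario
        Multiple worker processes take tasks from a shared task queue and execute them; no task may be processed more than once.
    b. Core requirements
        a. Task deduplication
            Ensure each task is handled by exactly one worker.
        b. Load balancing
            Distribute tasks sensibly across the worker processes.
        c. Failure recovery
            Handle worker processes that crash or time out.
    c. Implementation approach
        a. File-based task queue
            Use files for the task queue and for status tracking.
        b. Distributed lock mechanism
            Use file locks to give workers mutually exclusive access to tasks.
        c. Heartbeat detection
            Periodically check the health of the worker processes.
    d. Code example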
        ---
        import threading
        import queue
        import hashlib

        class DistributedTaskQueue:
            def __init__(self, queue_dir: str = "task_queue"):
                self.queue_dir = queue_dir
                self.pending_dir = os.path.join(queue_dir, "pending")
                self.processing_dir = os.path.join(queue_dir, "processing")
                self.completed_dir = os.path.join(queue_dir, "completed")
                self.failed_dir = os.path.join(queue_dir, "failed")
                self.workers_file = os.path.join(queue_dir, "workers.json")

                # Create the directory layout
                os.makedirs(self.pending_dir, exist_ok=True)
                os.makedirs(self.processing_dir, exist_ok=True)
                os.makedirs(self.completed_dir, exist_ok=True)
                os.makedirs(self.failed_dir, exist_ok=True)

                # Initialize the worker registry
                self._init_workers_registry()

            def _init_workers_registry(self):
                """Create an empty worker registry if none exists."""
                if not os.path.exists(self.workers_file):
                    with open(self.workers_file, 'w') as f:
                        json.dump({}, f)

            def register_worker(self, worker_id: str, worker_info: Dict[str, Any]) -> bool:
                """Register a worker in the shared registry."""
                if not self._acquire_workers_lock():
                    return False

                try:
                    with open(self.workers_file, 'r+') as f:
                        workers = json.load(f)

                        workers[worker_id] = {
                            **worker_info,
                            'register_time': datetime.now().isoformat(),
                            'last_heartbeat': datetime.now().isoformat(),
                            'status': 'active'
                        }

                        f.seek(0)
                        json.dump(workers, f, indent=2)
                        f.truncate()

                    logger.info(f"Worker {worker_id} registered")
                    return True
                finally:
                    self._release_workers_lock()

            def add_task(self, task_data: Dict[str, Any], priority: int = 0) -> str:
                """Add a task to the queue and return its ID."""
                # Derive the task ID from the task content, so identical tasks share one ID
                task_id = hashlib.md5(
                    json.dumps(task_data, sort_keys=True).encode()
                ).hexdigest()

                task_file = os.path.join(self.pending_dir, f"{task_id}.json")

                task_info = {
                    'task_id': task_id,
                    'data': task_data,
                    'priority': priority,
                    'created_time': datetime.now().isoformat(),
                    'status': 'pending'
                }

                with open(task_file, 'w') as f:
                    json.dump(task_info, f, indent=2)

                logger.info(f"Task {task_id} added to the queue")
                return task_id

            def get_next_task(self, worker_id: str, timeout: float = 30.0) -> Optional[Dict[str, Any]]:
                """Claim the next pending task, or return None on timeout."""
                start_time = time.time()

                while time.time() - start_time < timeout:
                    # Refresh this worker's heartbeat
                    self._update_worker_heartbeat(worker_id)

                    # List the pending tasks
                    pending_tasks = self._get_pending_tasks()

                    if not pending_tasks:
                        time.sleep(1)
                        continue

                    # Sort by priority, highest first; each entry is a (task_file, task_info) tuple
                    pending_tasks.sort(key=lambda x: x[1]['priority'], reverse=True)

                    for task_file, task_info in pending_tasks:
                        # Try to take the task lock
                        if self._acquire_task_lock(task_info['task_id'], worker_id):
                            # Move the task into the processing directory
                            processing_file = os.path.join(
                                self.processing_dir,
                                f"{task_info['task_id']}.json"
                            )

                            try:
                                # Update the task status
                                task_info['status'] = 'processing'
                                task_info['assigned_worker'] = worker_id
                                task_info['start_time'] = datetime.now().isoformat()

                                with open(processing_file, 'w') as f:
                                    json.dump(task_info, f, indent=2)

                                # Delete the pending file
                                os.remove(task_file)

                                logger.info(f"Worker {worker_id} claimed task {task_info['task_id']}")
                                return task_info

                            except Exception as e:
                                logger.error(f"Failed to move the task file: {e}")
                                self._release_task_lock(task_info['task_id'])

                    time.sleep(0.5)

                logger.warning(f"Worker {worker_id} timed out waiting for a task")
                return None

            def complete_task(self, task_id: str, worker_id: str, result: Dict[str, Any]) -> bool:
                """Mark a task as completed and store its result."""
                task_file = os.path.join(self.processing_dir, f"{task_id}.json")

                if not os.path.exists(task_file):
                    logger.warning(f"Task file does not exist: {task_id}")
                    return False

                try:
                    # Read the task information
                    with open(task_file, 'r') as f:
                        task_info = json.load(f)

                    # Check that the task belongs to this worker
                    if task_info.get('assigned_worker') != worker_id:
                        logger.warning(f"Worker {worker_id} tried to complete a task it does not own: {task_id}")
                        return False

                    # Update the task status
                    task_info['status'] = 'completed'
                    task_info['completion_time'] = datetime.now().isoformat()
                    task_info['result'] = result

                    # Move the task into the completed directory
                    completed_file = os.path.join(self.completed_dir, f"{task_id}.json")
                    with open(completed_file, 'w') as f:
                        json.dump(task_info, f, indent=2)

                    # Delete the file in the processing directory
                    os.remove(task_file)

                    # Release the task lock
                    self._release_task_lock(task_id)

                    logger.info(f"Task {task_id} completed by {worker_id}")
                    return True

                except Exception as e:
                    logger.error(f"Failed to complete the task: {e}")
                    return False

            def fail_task(self, task_id: str, worker_id: str, error: str) -> bool:
                """Mark a task as failed; requeue it while retries remain."""
                task_file = os.path.join(self.processing_dir, f"{task_id}.json")

                if not os.path.exists(task_file):
                    logger.warning(f"Task file does not exist: {task_id}")
                    return False

                try:
                    with open(task_file, 'r') as f:
                        task_info = json.load(f)

                    if task_info.get('assigned_worker') != worker_id:
                        logger.warning(f"Worker {worker_id} tried to fail a task it does not own: {task_id}")
                        return False

                    task_info['status'] = 'failed'
                    task_info['failure_time'] = datetime.now().isoformat()
                    task_info['error'] = error
                    task_info['retry_count'] = task_info.get('retry_count', 0) + 1

                    # Put the task back in the queue while it is under the retry limit
                    if task_info['retry_count'] < 3:
                        pending_file = os.path.join(self.pending_dir, f"{task_id}.json")
                        task_info['status'] = 'pending'
                        del task_info['assigned_worker'], task_info['start_time']

                        with open(pending_file, 'w') as f:
                            json.dump(task_info, f, indent=2)

                        logger.info(f"Task {task_id} requeued, retry count: {task_info['retry_count']}")
                    else:
                        # Move the task into the failed directory
                        failed_file = os.path.join(self.failed_dir, f"{task_id}.json")
                        with open(failed_file, 'w') as f:
                            json.dump(task_info, f, indent=2)

                        logger.error(f"Task {task_id} failed permanently, retry limit reached")

                    os.remove(task_file)
                    self._release_task_lock(task_id)
                    return True

                except Exception as e:
                    logger.error(f"Error while marking the task as failed: {e}")
                    return False

            def _get_pending_tasks(self) -> List[tuple]:
                """Return the pending tasks as (task_file, task_info) tuples."""
                pending_tasks = []
                for filename in os.listdir(self.pending_dir):
                    if filename.endswith('.json'):
                        task_file = os.path.join(self.pending_dir, filename)
                        try:
                            with open(task_file, 'r') as f:
                                task_info = json.load(f)
                            pending_tasks.append((task_file, task_info))
                        except Exception as e:
                            logger.error(f"Failed to read task file {filename}: {e}")

                return pending_tasks

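            def _acquire_workers_lock(self, timeout: float = 5.0) -> bool:
                """Acquire the worker-registry lock, retrying briefly if it is held."""
                lock_file = f"{self.workers_file}.lock"
                deadline = time.time() + timeout
                while True:
                    try:
                        fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                        os.close(fd)
                        return True
                    except FileExistsError:
                        # Held by another worker; retry so that workers starting
                        # at the same time do not fail to register
                        if time.time() >= deadline:
                            return False
                        time.sleep(0.05)
                    except OSError:
                        return False

            def _release_workers_lock(self):
                """Release the worker-registry lock."""
                lock_file = f"{self.workers_file}.lock"
                try:
                    os.remove(lock_file)
                except OSError:
                    pass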

            def _acquire_task_lock(self, task_id: str, worker_id: str) -> bool:
                """Try once to acquire the lock for a task."""
                lock_file = os.path.join(self.processing_dir, f"{task_id}.lock")
                try:
                    fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)

                    lock_info = {
                        'task_id': task_id,
                        'worker_id': worker_id,
                        'acquire_time': datetime.now().isoformat()
                    }

                    with os.fdopen(fd, 'w') as f:
                        json.dump(lock_info, f)

                    return True
                except OSError:
                    return False

            def _release_task_lock(self, task_id: str):
                """Release the lock for a task."""
                lock_file = os.path.join(self.processing_dir, f"{task_id}.lock")
                try:
                    os.remove(lock_file)
                except OSError:
                    pass

            def _update_worker_heartbeat(self, worker_id: str):
                """Update the worker's last-heartbeat timestamp."""
                if self._acquire_workers_lock():
                    try:
                        with open(self.workers_file, 'r+') as f:
                            workers = json.load(f)

                            if worker_id in workers:
                                workers[worker_id]['last_heartbeat'] = datetime.now().isoformat()

                                f.seek(0)
                                json.dump(workers, f, indent=2)
                                f.truncate()
                    finally:
                        self._release_workers_lock()

            def get_queue_status(self) -> Dict[str, Any]:
                """Return the task counts per state plus worker information."""
                status = {
                    'pending_tasks': len([f for f in os.listdir(self.pending_dir) if f.endswith('.json')]),
                    'processing_tasks': len([f for f in os.listdir(self.processing_dir) if f.endswith('.json')]),
                    'completed_tasks': len([f for f in os.listdir(self.completed_dir) if f.endswith('.json')]),
                    'failed_tasks': len([f for f in os.listdir(self.failed_dir) if f.endswith('.json')])
                }

                # Add the worker information
                if self._acquire_workers_lock():
                    try:
                        with open(self.workers_file, 'r') as f:
                            workers = json.load(f)
                        status['active_workers'] = len([w for w in workers.values() if w.get('status') == 'active'])
                        status['workers'] = workers
                    finally:
                        self._release_workers_lock()

                return status

        def task_worker(worker_id: str, task_queue: DistributedTaskQueue, max_tasks: int = 10):
            """Worker that claims and processes tasks from the queue."""
            logger.info(f"Worker {worker_id} started")

            # Register the worker
            worker_info = {
                'pid': os.getpid(),
                'hostname': os.uname()[1] if hasattr(os, 'uname') else 'unknown',
                'max_tasks': max_tasks
            }

            if not task_queue.register_worker(worker_id, worker_info):
                logger.error(f"Worker {worker_id} failed to register")
                return

            processed_tasks = 0

            while processed_tasks < max_tasks:
                # Claim a task
                task = task_queue.get_next_task(worker_id, timeout=10)

                if task is None:
                    logger.info(f"Worker {worker_id} found no more tasks")
                    break

                try:
                    # Process the task
                    logger.info(f"Worker {worker_id} started task {task['task_id']}")
                    result = process_task(task['data'])

                    # Mark the task as completed
                    if task_queue.complete_task(task['task_id'], worker_id, result):
                        processed_tasks += 1
                        logger.info(f"Worker {worker_id} completed task {task['task_id']}")
                    else:
                        logger.error(f"Worker {worker_id} could not complete task {task['task_id']}")

                except Exception as e:
                    # The task failed
                    error_msg = f"Task processing failed: {str(e)}"
                    if task_queue.fail_task(task['task_id'], worker_id, error_msg):
                        logger.warning(f"Task {task['task_id']} marked as failed: {error_msg}")
                    else:
                        logger.error(f"Could not mark task {task['task_id']} as failed")

                time.sleep(0.1)  # Short pause

            logger.info(f"Worker {worker_id} finished after processing {processed_tasks} tasks")

        def process_task(task_data: Dict[str, Any]) -> Dict[str, Any]:
            """Process a single task."""
            # Simulate the task's work
            time.sleep(task_data.get('duration', 1))

            return {
                'status': 'success',
                'result': f"processed_{task_data.get('type', 'unknown')}",
                'processing_time': time.time()
            }

        def run_distributed_task_queue():
            """Run the distributed task queue example."""
            # Create the task queue
            task_queue = DistributedTaskQueue()

            # Add some tasks
            tasks = [
                {'type': 'data_analysis', 'duration': 2, 'priority': 3},
                {'type': 'file_processing', 'duration': 1, 'priority': 2},
                {'type': 'image_processing', 'duration': 3, 'priority': 1},
                {'type': 'report_generation', 'duration': 1.5, 'priority': 2},
                {'type': 'email_sending', 'duration': 0.5, 'priority': 1}
            ]

            for task in tasks:
                task_queue.add_task(task, priority=task['priority'])

            # Start several workers
            workers = []
            for i in range(3):
                p = multiprocessing.Process(
                    target=task_worker,
                    args=(f"worker_{i}", task_queue, 5)
                )
                workers.append(p)
                p.start()

            # Wait for all workers to finish
            for p in workers:
                p.join()

            # Report the queue status
            final_status = task_queue.get_queue_status()
            logger.info(f"Final task queue status: {final_status}")
        ---
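
As an alternative to a separate `.lock` file per task, a worker can claim a task by renaming its file from the pending directory into the processing directory. The sketch below is an assumption-laden illustration rather than part of `DistributedTaskQueue`: the function name `claim_task` and the demo directories are made up here, and it assumes both directories sit on the same filesystem, where a rename is a single atomic step, so only one worker's rename can succeed.

```python
import json
import os
import tempfile


def claim_task(pending_dir, processing_dir, task_id):
    """Move a task file from pending to processing; return the new path or None."""
    src = os.path.join(pending_dir, f"{task_id}.json")
    dst = os.path.join(processing_dir, f"{task_id}.json")
    try:
        # Only one caller can rename the source; later callers find it gone.
        os.rename(src, dst)
    except FileNotFoundError:
        return None
    return dst


# Demo: two claims on the same task, as two competing workers would make.
root = tempfile.mkdtemp()
pending = os.path.join(root, "pending")
processing = os.path.join(root, "processing")
os.makedirs(pending)
os.makedirs(processing)
with open(os.path.join(pending, "t1.json"), "w") as f:
    json.dump({"task_id": "t1"}, f)

first = claim_task(pending, processing, "t1")
second = claim_task(pending, processing, "t1")
print(first is not None, second)  # True None
```

With this approach the claim and the move are one operation, so there is no window in which a task is locked but still listed as pending.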

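03. Shared resource access control
    a. Scenario
        Multiple processes need access to a limited shared resource (database connections, network ports, hardware devices, and so on).
    b. Management strategies
        a. Resource pooling
            Create a pool of reusable resources to improve utilization.
        b. Access quotas
            Set resource access quotas for the different processes.
        c. Priority scheduling
            Allocate resources according to priority.
    c. Technical implementation
        a. Lock file mechanism
            Use a lock file to control access to the resource.
        b. Counter management
            Maintain a usage counter to enforce the concurrency limit.
        c. Timeout reclamation
            Automatically reclaim resources that have gone unreleased for too long.
    d. Code example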
        ---
        import random  # used by resource_user below; the other names come from the first example's imports

        class SharedResourceManager:
            def __init__(self, resource_name: str, max_concurrent: int = 10):
                self.resource_name = resource_name
                self.max_concurrent = max_concurrent
                self.lock_file = f"/tmp/.resource_lock_{resource_name}"
                self.counter_file = f"/tmp/.resource_counter_{resource_name}"
                self.allocations_file = f"/tmp/.resource_allocations_{resource_name}"

            def acquire_resource(self, requester_id: str, timeout: float = 30.0) -> Optional[int]:
                """Acquire a resource slot; return its ID, or None on timeout."""
                start_time = time.time()

                # Always try at least once so that timeout=0 is a real non-blocking attempt
                while True:
                    if self._acquire_management_lock():
                        try:
                            # Read the current count and allocations
                            current_count = self._get_current_count()
                            allocations = self._get_allocations()

                            # IDs that are not allocated right now
                            free_ids = [
                                i for i in range(1, self.max_concurrent + 1)
                                if str(i) not in allocations
                            ]

                            # Check whether a resource is still available
                            if current_count < self.max_concurrent and free_ids:
                                # Use the lowest free ID so a released ID is never handed out twice
                                resource_id = free_ids[0]

                                # Update the counter
                                self._update_counter(current_count + 1)

                                # Record the allocation
                                allocations[str(resource_id)] = {
                                    'requester_id': requester_id,
                                    'acquire_time': datetime.now().isoformat(),
                                    'resource_id': resource_id
                                }
                                self._update_allocations(allocations)

                                logger.info(f"Resource {resource_id} allocated to {requester_id}")
                                return resource_id
                            else:
                                # No resource free; reclaim allocations that have timed out
                                self._cleanup_expired_allocations()

                        finally:
                            self._release_management_lock()

                    if time.time() - start_time >= timeout:
                        break
                    time.sleep(0.1)

                logger.warning(f"{requester_id} timed out waiting for a resource")
                return None

            def release_resource(self, resource_id: int, requester_id: str) -> bool:
                """Release a resource held by requester_id."""
                if not self._acquire_management_lock():
                    return False

                try:
                    allocations = self._get_allocations()
                    resource_key = str(resource_id)

                    if resource_key in allocations:
                        allocation = allocations[resource_key]

                        if allocation['requester_id'] == requester_id:
                            # Remove the allocation record
                            del allocations[resource_key]
                            self._update_allocations(allocations)

                            # Decrement the counter so the slot can be reused
                            self._update_counter(max(0, self._get_current_count() - 1))

                            logger.info(f"Resource {resource_id} released by {requester_id}")
                            return True
                        else:
                            logger.warning(f"{requester_id} tried to release resource {resource_id}, which it does not own")
                            return False
                    else:
                        logger.warning(f"No allocation record for resource {resource_id}")
                        return False

                finally:
                    self._release_management_lock()

            def try_acquire_resource(self, requester_id: str) -> Optional[int]:
                """Try to acquire a resource without blocking."""
                return self.acquire_resource(requester_id, timeout=0)

            def get_resource_status(self) -> Dict[str, Any]:
                """Return the resource usage status."""
                if not self._acquire_management_lock():
                    return {'status': 'locked'}

                try:
                    current_count = self._get_current_count()
                    allocations = self._get_allocations()

                    return {
                        'resource_name': self.resource_name,
                        'max_concurrent': self.max_concurrent,
                        'current_count': current_count,
                        'available_count': self.max_concurrent - current_count,
                        'allocations': allocations,
                        'utilization_rate': current_count / self.max_concurrent * 100
                    }
                finally:
                    self._release_management_lock()

            def _acquire_management_lock(self) -> bool:
                """Try once to acquire the management lock."""
                try:
                    fd = os.open(self.lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                    os.close(fd)
                    return True
                except OSError:
                    return False

            def _release_management_lock(self):
                """Release the management lock."""
                try:
                    os.remove(self.lock_file)
                except OSError:
                    pass

            def _get_current_count(self) -> int:
                """Read the current usage count (0 if missing or unreadable)."""
                try:
                    with open(self.counter_file, 'r') as f:
                        return int(f.read().strip())
                except (FileNotFoundError, ValueError):
                    return 0

            def _update_counter(self, new_count: int):
                """Write the usage count."""
                with open(self.counter_file, 'w') as f:
                    f.write(str(new_count))

            def _get_allocations(self) -> Dict[str, Any]:
                """Read the allocation records (empty if missing or unreadable)."""
                try:
                    with open(self.allocations_file, 'r') as f:
                        return json.load(f)
                except (FileNotFoundError, json.JSONDecodeError):
                    return {}

            def _update_allocations(self, allocations: Dict[str, Any]):
                """Write the allocation records."""
                with open(self.allocations_file, 'w') as f:
                    json.dump(allocations, f, indent=2)

            def _cleanup_expired_allocations(self, max_age_seconds: float = 300):
                """Remove allocations older than max_age_seconds."""
                allocations = self._get_allocations()
                current_time = time.time()
                cleanup_count = 0

                expired_keys = []
                for key, allocation in allocations.items():
                    acquire_time = datetime.fromisoformat(allocation['acquire_time'])
                    age_seconds = current_time - acquire_time.timestamp()

                    if age_seconds > max_age_seconds:
                        expired_keys.append(key)

                for key in expired_keys:
                    del allocations[key]
                    cleanup_count += 1
                    logger.warning(f"清理过期资源分配: {key}")

                if cleanup_count > 0:
                    self._update_allocations(allocations)

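                    # Update the counter, clamping at zero so a mismatch between the
                    # counter file and the allocations file cannot leave a stale count
                    new_count = self._get_current_count() - cleanup_count
                    self._update_counter(max(0, new_count))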

        def resource_user(user_id: str, resource_manager: SharedResourceManager, num_requests: int = 5):
            """资源使用进程"""
            logger.info(f"资源用户 {user_id} 启动")

            acquired_resources = []

            for i in range(num_requests):
                # 尝试获取资源
                resource_id = resource_manager.acquire_resource(user_id, timeout=5)

                if resource_id is not None:
                    acquired_resources.append(resource_id)
                    logger.info(f"{user_id} 获取资源 {resource_id}")

                    # 模拟使用资源
                    use_time = random.uniform(1, 3)
                    time.sleep(use_time)

                    # 释放资源
                    if resource_manager.release_resource(resource_id, user_id):
                        acquired_resources.remove(resource_id)
                        logger.info(f"{user_id} 释放资源 {resource_id}")
                    else:
                        logger.error(f"{user_id} 释放资源 {resource_id} 失败")
                else:
                    logger.warning(f"{user_id} 获取资源失败")

                time.sleep(random.uniform(0.5, 1.5))

            # 确保所有资源都被释放
            for resource_id in acquired_resources.copy():
                resource_manager.release_resource(resource_id, user_id)
                acquired_resources.remove(resource_id)

            logger.info(f"资源用户 {user_id} 结束")

        def run_shared_resource_management():
            """运行共享资源管理示例"""
            resource_manager = SharedResourceManager("database_connection", max_concurrent=3)

            # 启动多个资源用户
            users = []
            for i in range(5):
                p = multiprocessing.Process(
                    target=resource_user,
                    args=(f"user_{i}", resource_manager, 8)
                )
                users.append(p)
                p.start()

            # 定期输出资源状态
            for i in range(10):
                time.sleep(2)
                status = resource_manager.get_resource_status()
                logger.info(f"资源状态检查 {i+1}: {status}")

            # 等待所有用户完成
            for p in users:
                p.join()

            # 最终状态
            final_status = resource_manager.get_resource_status()
            logger.info(f"最终资源状态: {final_status}")
        ---

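04.Concurrent log file management
    a.Scenario
        Multiple processes write to the same log file at once, so each log entry must be written intact and the entries must remain orderable.
    b.Technical challenges
        a.Concurrent write conflicts
            Unsynchronized writes from several processes can interleave and corrupt log entries.
        b.File synchronization
            Entries must be recorded so that their chronological order can be recovered.
        c.Performance
            Write throughput should be improved without sacrificing log integrity.
    c.Solution
        a.File locking
            A lock file serializes write access to the log file.
        b.Buffered writes
            Entries are buffered in memory and written in batches to reduce disk I/O.
        c.Log rotation
            Log file size is capped, and full files are archived automatically.
    d.Code example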
        ---
        class ConcurrentLogger:
            def __init__(self, log_file: str, buffer_size: int = 100, max_file_size: int = 10*1024*1024):
                self.log_file = log_file
                self.buffer_size = buffer_size
                self.max_file_size = max_file_size
                self.lock_file = f"{log_file}.lock"
                self.buffer = []
                self.sequence_file = f"{log_file}.sequence"

                # 初始化序列号文件
                self._init_sequence_file()

            def _init_sequence_file(self):
                """初始化序列号文件"""
                if not os.path.exists(self.sequence_file):
                    with open(self.sequence_file, 'w') as f:
                        f.write('0')

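            def _get_next_sequence(self) -> int:
                """Return the next sequence number, updating the shared counter under the lock."""
                if not self._acquire_log_lock():
                    raise RuntimeError("could not acquire the sequence lock")

                try:
                    with open(self.sequence_file, 'r+') as f:
                        # Treat an empty file as 0 (it may have just been created)
                        sequence = int(f.read().strip() or 0) + 1
                        f.seek(0)
                        f.write(str(sequence))
                        # Drop any leftover bytes from a previous, longer value
                        f.truncate()
                        return sequence
                finally:
                    self._release_log_lock()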

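            def _acquire_log_lock(self, timeout: float = 10.0) -> bool:
                """Acquire the log lock, retrying until `timeout` seconds have elapsed."""
                # A monotonic clock keeps the timeout correct if the wall clock is adjusted
                deadline = time.monotonic() + timeout

                while time.monotonic() < deadline:
                    try:
                        # O_CREAT | O_EXCL creates the file atomically and fails if it already exists
                        fd = os.open(self.lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)

                        # Record the holder so a stale lock can be diagnosed
                        lock_info = {
                            'pid': os.getpid(),
                            'acquire_time': datetime.now().isoformat()
                        }

                        with os.fdopen(fd, 'w') as f:
                            json.dump(lock_info, f)

                        return True
                    except OSError:
                        # The lock is held by another process; retry shortly
                        time.sleep(0.1)

                return False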

            def _release_log_lock(self):
                """释放日志锁"""
                try:
                    os.remove(self.lock_file)
                except OSError:
                    pass

            def log(self, level: str, message: str, process_id: str = None) -> bool:
                """记录日志"""
                try:
                    # 获取序列号
                    sequence = self._get_next_sequence()

                    # 格式化日志条目
                    timestamp = datetime.now().isoformat()
                    process_info = process_id or f"PID-{os.getpid()}"
                    log_entry = {
                        'sequence': sequence,
                        'timestamp': timestamp,
                        'level': level,
                        'process': process_info,
                        'message': message
                    }

                    # 添加到缓冲区
                    self.buffer.append(log_entry)

                    # 检查是否需要刷新缓冲区
                    if len(self.buffer) >= self.buffer_size:
                        return self._flush_buffer()

                    return True
                except Exception as e:
                    logger.error(f"记录日志失败: {e}")
                    return False

            def _flush_buffer(self) -> bool:
                """刷新缓冲区到文件"""
                if not self.buffer:
                    return True

                if not self._acquire_log_lock():
                    logger.warning("无法获取日志锁,缓冲区刷新失败")
                    return False

                try:
                    # 检查文件大小,必要时轮转
                    if self._should_rotate_log():
                        self._rotate_log()

                    # 写入缓冲区内容
                    with open(self.log_file, 'a', encoding='utf-8') as f:
                        for entry in self.buffer:
                            log_line = json.dumps(entry, ensure_ascii=False)
                            f.write(log_line + '\n')

                    flushed_count = len(self.buffer)
                    self.buffer.clear()

                    logger.info(f"成功刷新 {flushed_count} 条日志到文件")
                    return True

                except Exception as e:
                    logger.error(f"刷新缓冲区失败: {e}")
                    return False
                finally:
                    self._release_log_lock()

            def _should_rotate_log(self) -> bool:
                """检查是否需要轮转日志"""
                try:
                    return os.path.getsize(self.log_file) >= self.max_file_size
                except OSError:
                    return False

            def _rotate_log(self):
                """轮转日志文件"""
                timestamp = int(time.time())
                rotated_file = f"{self.log_file}.{timestamp}"

                try:
                    if os.path.exists(self.log_file):
                        os.rename(self.log_file, rotated_file)
                        logger.info(f"日志文件已轮转到: {rotated_file}")
                except Exception as e:
                    logger.error(f"日志轮转失败: {e}")

            def force_flush(self) -> bool:
                """强制刷新缓冲区"""
                return self._flush_buffer()

            def get_log_info(self) -> Dict[str, Any]:
                """获取日志信息"""
                info = {
                    'log_file': self.log_file,
                    'buffer_size': len(self.buffer),
                    'max_buffer_size': self.buffer_size,
                    'file_exists': os.path.exists(self.log_file)
                }

                if info['file_exists']:
                    try:
                        info['file_size'] = os.path.getsize(self.log_file)
                        info['last_modified'] = datetime.fromtimestamp(
                            os.path.getmtime(self.log_file)
                        ).isoformat()
                    except OSError:
                        pass

                return info

        def logging_worker(worker_id: str, logger_instance: ConcurrentLogger, num_logs: int = 50):
            """日志工作进程"""
            import random

            levels = ['INFO', 'WARNING', 'ERROR', 'DEBUG']

            for i in range(num_logs):
                level = random.choice(levels)
                message = f"Worker {worker_id} log message {i+1}"

                if logger_instance.log(level, message, worker_id):
                    if (i + 1) % 10 == 0:
                        logger.info(f"Worker {worker_id} 已记录 {i+1} 条日志")
                else:
                    logger.error(f"Worker {worker_id} 记录日志失败: {message}")

                # 随机间隔
                time.sleep(random.uniform(0.1, 0.5))

            # 强制刷新剩余日志
            logger_instance.force_flush()
            logger.info(f"Worker {worker_id} 日志记录完成")

        def run_concurrent_logging():
            """运行并发日志示例"""
            log_file = "concurrent_app.log"
            concurrent_logger = ConcurrentLogger(log_file, buffer_size=20, max_file_size=1024*1024)  # 1MB

            # 启动多个日志工作进程
            workers = []
            for i in range(4):
                p = multiprocessing.Process(
                    target=logging_worker,
                    args=(f"worker_{i}", concurrent_logger, 30)
                )
                workers.append(p)
                p.start()

            # 定期输出日志信息
            for i in range(8):
                time.sleep(2)
                log_info = concurrent_logger.get_log_info()
                logger.info(f"日志状态检查 {i+1}: {log_info}")

            # 等待所有工作进程完成
            for p in workers:
                p.join()

            # 最终刷新并输出状态
            concurrent_logger.force_flush()
            final_info = concurrent_logger.get_log_info()
            logger.info(f"最终日志状态: {final_info}")

            # 统计日志条目数
            if os.path.exists(log_file):
                with open(log_file, 'r', encoding='utf-8') as f:
                    log_count = sum(1 for _ in f)
                logger.info(f"总共记录了 {log_count} 条日志")
        ---
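
        Because each process buffers its entries and flushes them at different times, the lines in the log file are not necessarily in sequence order. The sketch below shows one way a reader could restore the order and spot missing entries, assuming the JSON-lines format written by `_flush_buffer`. The helper names `load_entries_in_order` and `find_sequence_gaps` are hypothetical and are not part of the example above.

```python
import json
from typing import Any, Dict, Iterable, List


def load_entries_in_order(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSON-lines log text and sort the entries by their 'sequence' field."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupted lines
        if isinstance(entry, dict) and "sequence" in entry:
            entries.append(entry)
    return sorted(entries, key=lambda e: e["sequence"])


def find_sequence_gaps(entries: List[Dict[str, Any]]) -> List[int]:
    """Return the sequence numbers missing between the smallest and largest seen."""
    seen = {e["sequence"] for e in entries}
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


# Sample lines as they might appear after two processes flushed out of order
sample = [
    '{"sequence": 3, "message": "c"}',
    '{"sequence": 1, "message": "a"}',
    'not json',
    '{"sequence": 4, "message": "d"}',
]
ordered = load_entries_in_order(sample)
print([e["sequence"] for e in ordered])  # [1, 3, 4]
print(find_sequence_gaps(ordered))       # [2]
```

        A gap can mean that an entry is still sitting in some process's buffer, or that a process exited before flushing.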

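05.Best practices and caveats
    a.Design principles
        a.Minimal lock granularity
            Keep the locked region as small as possible to limit the impact on concurrency.
        b.Fail fast
            When a lock cannot be acquired, fail quickly rather than waiting for a long time.
        c.Resource cleanup
            Make sure locks and resources are released even when an exception occurs.
    b.Performance optimization
        a.Batching
            Combine several small operations into a single locked operation.
        b.Asynchronous processing
            Use asynchronous patterns to reduce the time spent waiting for locks.
        c.Caching
            Cache data where appropriate to reduce accesses to the lock file.
    c.Reliability
        a.Deadlock detection
            Implement deadlock detection and automatic recovery.
        b.Timeouts
            Set a reasonable timeout on every lock operation.
        c.Failure recovery
            Clean up locks left behind when a process exits abnormally.
    d.Monitoring and debugging
        a.Lock state monitoring
            Monitor lock acquisition and release in real time.
        b.Performance analysis
            Measure how lock contention affects system performance.
        c.Detailed logging
            Log lock operations in detail to simplify troubleshooting.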
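
    The timeout and failure-recovery points can be sketched with a lock file that gives up after a deadline and reclaims a lock whose file has not been touched for a while. This is a minimal sketch under stated assumptions: the function names and the `stale_after` parameter are hypothetical, staleness is judged only by the file's modification time, and reclaiming a stale lock is itself not race-free when several processes attempt it at once.

```python
import os
import tempfile
import time


def acquire_lock_file(path: str, timeout: float = 5.0, stale_after: float = 30.0,
                      poll_interval: float = 0.05) -> bool:
    """Create `path` atomically as a lock; return False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return True
        except FileExistsError:
            pass

        # Failure recovery: remove a lock file older than `stale_after` seconds
        try:
            if time.time() - os.path.getmtime(path) > stale_after:
                os.remove(path)
                continue
        except FileNotFoundError:
            continue  # the holder released the lock in the meantime

        # Timeout: stop waiting once the deadline has passed
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


def release_lock_file(path: str) -> None:
    """Remove the lock file; a missing file is not an error."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


lock_path = os.path.join(tempfile.mkdtemp(), "demo.lock")
first = acquire_lock_file(lock_path)                    # lock is free
second = acquire_lock_file(lock_path, timeout=0.2)      # held and not stale
recovered = acquire_lock_file(lock_path, timeout=0.2,   # old enough to reclaim
                              stale_after=0.1)
release_lock_file(lock_path)
print(first, second, recovered)  # True False True
```

    In practice `stale_after` should be comfortably longer than the longest expected critical section; otherwise a slow but healthy holder loses its lock.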

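9. Distributed locks

9.1 Redis distributed locks

01.Overview of distributed locks
    a.Definition and purpose
        A distributed lock coordinates access to a shared resource across the nodes of a distributed system, ensuring that at most one node accesses a given resource at any moment.
    b.Core properties
        a.Mutual exclusion
            Only one client can hold the lock at any given time.
        b.Reentrancy
            The same client can acquire the same lock multiple times.
        c.Deadlock prevention
            An expiry (timeout) mechanism keeps a crashed client from holding the lock forever.
        d.High availability
            The lock service itself is highly available and avoids a single point of failure.
    c.Use cases
        a.Distributed task scheduling
            Ensure that a scheduled task runs on only one node in the cluster.
        b.Cache updates
            Prevent multiple nodes from updating the cache at the same time.
        c.Resource access control
            Omitted.
        d.Primary/replica failover coordination
            Coordinate node roles during a database primary/replica switchover.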

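02.How Redis locks work
    a.Basic mechanisms
        a.SET with NX and EX
            The Redis SET command with the NX and EX options acquires the lock atomically: the key is set only if it does not exist, and it receives an expiry in the same step.
        b.Lua scripts
            Lua scripts make multi-step acquire and release operations atomic, for example checking the owner before deleting the key.
        c.Expiry
            Redis key expiry (TTL) makes the lock expire automatically.
    b.Lock implementation approaches
        a.Simple lock
            A single Redis key serves as the lock, combined with an expiry.
        b.Reentrant lock
            A counter lets the same client acquire the lock multiple times.
        c.Redlock algorithm
            The lock is created on multiple Redis instances to improve reliability.
    c.Code example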
        ---
        # Redis分布式锁基本实现示例
        import redis
        import time
        import uuid
        import logging
        from datetime import datetime
        from typing import Optional, Dict, Any

        logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
        logger = logging.getLogger(__name__)

        class RedisLock:
            def __init__(self, redis_client, lock_name: str, timeout: int = 30):
                self.redis_client = redis_client
                self.lock_name = lock_name
                self.timeout = timeout
                self.identifier = str(uuid.uuid4())
                self.acquired = False

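            def acquire(self, blocking: bool = True, blocking_timeout: int = 30) -> bool:
                """Acquire the distributed lock."""
                logger.info(f"Trying to acquire lock: {self.lock_name}, identifier: {self.identifier}")

                if not blocking:
                    # Single attempt; record success so that release() works afterwards
                    if self._try_acquire():
                        self.acquired = True
                        logger.info(f"Acquired lock: {self.lock_name}")
                        return True
                    return False

                start_time = time.time()

                while time.time() - start_time < blocking_timeout:
                    if self._try_acquire():
                        self.acquired = True
                        logger.info(f"Acquired lock: {self.lock_name}")
                        return True
                    time.sleep(0.1)

                logger.warning(f"Timed out acquiring lock: {self.lock_name}")
                return False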

            def _try_acquire(self) -> bool:
                """尝试获取锁"""
                try:
                    # 使用SET NX EX命令原子性获取锁
                    result = self.redis_client.set(
                        self.lock_name,
                        self.identifier,
                        nx=True,
                        ex=self.timeout
                    )
                    return bool(result)
                except Exception as e:
                    logger.error(f"获取Redis锁失败: {e}")
                    return False

            def release(self) -> bool:
                """释放分布式锁"""
                if not self.acquired:
                    logger.warning(f"锁未获取: {self.lock_name}")
                    return False

                logger.info(f"尝试释放锁: {self.lock_name}, 标识符: {self.identifier}")

                # 使用Lua脚本确保原子性释放
                lua_script = """
                    if redis.call("GET", KEYS[1]) == ARGV[1] then
                        return redis.call("DEL", KEYS[1])
                    else
                        return 0
                    end
                """

                try:
                    result = self.redis_client.eval(
                        lua_script,
                        1,  # keys数量
                        self.lock_name,  # key
                        self.identifier  # value
                    )

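                    if result:
                        self.acquired = False
                        logger.info(f"Released lock: {self.lock_name}")
                        return True
                    else:
                        # The key expired or now belongs to another client, so this
                        # instance no longer holds the lock
                        self.acquired = False
                        logger.warning(f"Lock expired or not owned by this client: {self.lock_name}")
                        return False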

                except Exception as e:
                    logger.error(f"释放Redis锁失败: {e}")
                    return False

            def __enter__(self):
                if self.acquire():
                    return self
                else:
                    raise RuntimeError(f"无法获取锁: {self.lock_name}")

            def __exit__(self, exc_type, exc_val, exc_tb):
                self.release()

        class RedisLockManager:
            def __init__(self, redis_host='localhost', redis_port=6379, redis_db=0):
                self.redis_client = redis.Redis(
                    host=redis_host,
                    port=redis_port,
                    db=redis_db,
                    decode_responses=True
                )
                self.locks = {}

            def get_lock(self, lock_name: str, timeout: int = 30) -> RedisLock:
                """获取分布式锁实例"""
                if lock_name not in self.locks:
                    self.locks[lock_name] = RedisLock(self.redis_client, lock_name, timeout)
                return self.locks[lock_name]

            def cleanup_all_locks(self):
                """清理所有锁"""
                for lock_name, lock in self.locks.items():
                    if lock.acquired:
                        lock.release()
                self.locks.clear()

        def demo_basic_usage():
            """基本使用演示"""
            # 连接Redis
            lock_manager = RedisLockManager()

            try:
                # 获取锁
                with lock_manager.get_lock("resource_1", timeout=10) as lock:
                    logger.info("获得锁,开始执行临界区代码")

                    # 模拟工作
                    for i in range(5):
                        logger.info(f"执行工作 {i+1}/5")
                        time.sleep(1)

                    logger.info("临界区代码执行完成")

            except Exception as e:
                logger.error(f"执行失败: {e}")
            finally:
                lock_manager.cleanup_all_locks()
        ---
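
        The reason `release` compares the stored identifier before deleting the key can be shown without a Redis server. The sketch below is an in-memory simulation, not Redis itself: `FakeRedis`, its method names, and the manual `now` clock are hypothetical stand-ins that model only SET NX EX, GET, DEL, and the compare-and-delete script.

```python
from typing import Dict, Optional, Tuple


class FakeRedis:
    """In-memory stand-in for a few Redis commands, driven by a manual clock."""

    def __init__(self) -> None:
        self.now = 0.0
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None or item[1] <= self.now:
            self._data.pop(key, None)  # expired keys disappear
            return None
        return item[0]

    def set_nx_ex(self, key: str, value: str, ttl: float) -> bool:
        """Models SET key value NX EX ttl: succeed only if the key is absent."""
        if self.get(key) is not None:
            return False
        self._data[key] = (value, self.now + ttl)
        return True

    def delete(self, key: str) -> int:
        """Models a plain DEL."""
        return 1 if self._data.pop(key, None) is not None else 0

    def delete_if_equal(self, key: str, value: str) -> int:
        """Models the GET-compare-DEL Lua script, which Redis runs atomically."""
        if self.get(key) == value:
            return self.delete(key)
        return 0


store = FakeRedis()
assert store.set_nx_ex("lock", "client-A", ttl=10)   # A acquires the lock
store.now = 11                                        # A stalls past the expiry
assert store.set_nx_ex("lock", "client-B", ttl=10)   # B acquires the expired lock

safe = store.delete_if_equal("lock", "client-A")     # A's late release is rejected
holder_after_safe = store.get("lock")

unsafe = store.delete("lock")                        # a plain DEL removes B's lock
holder_after_unsafe = store.get("lock")

print(safe, holder_after_safe, unsafe, holder_after_unsafe)  # 0 client-B 1 None
```

        With a plain DEL, client A's late release would silently free a lock that client B still relies on.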

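03.Advanced Redis lock implementations
    a.Reentrant distributed lock
        a.Counter mechanism
            A Redis hash stores the lock's reentry count, so the same client can acquire the lock multiple times.
        b.Renewal mechanism
            The lock's expiry is extended automatically so that it does not expire during a long-running task.
        c.Code example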
            ---
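            # threading is needed for the renewal thread; time, uuid and logger
            # come from the basic example above
            import threading

            class RedisReentrantLock: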
                def __init__(self, redis_client, lock_name: str, timeout: int = 30,
                           auto_renewal: bool = True, renewal_interval: int = 10):
                    self.redis_client = redis_client
                    self.lock_name = lock_name
                    self.timeout = timeout
                    self.identifier = str(uuid.uuid4())
                    self.acquired = False
                    self.auto_renewal = auto_renewal
                    self.renewal_interval = renewal_interval
                    self.renewal_thread = None
                    self.stop_renewal = threading.Event()

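                def acquire(self, blocking: bool = True, blocking_timeout: int = 30) -> bool:
                    """Acquire the reentrant lock."""
                    start_time = time.time()

                    while True:
                        if self._try_acquire():
                            self.acquired = True

                            # Start auto-renewal once; a reentrant acquire reuses the running thread
                            renewal_running = self.renewal_thread is not None and self.renewal_thread.is_alive()
                            if self.auto_renewal and not renewal_running:
                                self._start_renewal_thread()

                            return True

                        # A non-blocking call makes a single attempt
                        if not blocking or time.time() - start_time >= blocking_timeout:
                            return False

                        time.sleep(0.1)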

                def _try_acquire(self) -> bool:
                    """尝试获取锁"""
                    lua_script = """
                    local lock_key = KEYS[1]
                    local identifier = ARGV[1]
                    local timeout = tonumber(ARGV[2])
                    local current_time = tonumber(ARGV[3])

                    local count_key = lock_key .. ":count"
                    local expire_key = lock_key .. ":expire"

                    -- 检查锁是否存在
                    local existing_identifier = redis.call("GET", lock_key)

                    if existing_identifier == identifier then
                        -- 同一客户端,增加重入计数
                        redis.call("HINCRBY", count_key, identifier, 1)
                        redis.call("EXPIRE", lock_key, timeout)
                        redis.call("EXPIRE", count_key, timeout)
                        return 1
                    elseif not existing_identifier then
                        -- 锁不存在,创建新锁
                        redis.call("SET", lock_key, identifier, "EX", timeout)
                        redis.call("HSET", count_key, identifier, 1)
                        redis.call("EXPIRE", count_key, timeout)
                        redis.call("SET", expire_key, current_time + timeout, "EX", timeout)
                        return 1
                    else
                        -- 锁被其他客户端持有
                        return 0
                    end
                    """

                    try:
                        current_time = int(time.time())
                        result = self.redis_client.eval(
                            lua_script,
                            1,  # keys数量
                            self.lock_name,  # key
                            self.identifier,  # identifier
                            self.timeout,  # timeout
                            current_time   # current_time
                        )

                        return bool(result)
                    except Exception as e:
                        logger.error(f"获取可重入锁失败: {e}")
                        return False

                def _start_renewal_thread(self):
                    """启动自动续期线程"""
                    self.stop_renewal.clear()
                    self.renewal_thread = threading.Thread(target=self._renewal_worker)
                    self.renewal_thread.daemon = True
                    self.renewal_thread.start()

                def _renewal_worker(self):
                    """续期工作线程"""
                    while not self.stop_renewal.wait(self.renewal_interval):
                        try:
                            lua_script = """
                            local lock_key = KEYS[1]
                            local identifier = ARGV[1]
                            local timeout = tonumber(ARGV[2])
                            local current_time = tonumber(ARGV[3])

                            local expire_key = lock_key .. ":expire"

                            -- Renew only if the lock still belongs to this client
                            if redis.call("GET", lock_key) ~= identifier then
                                return 0
                            end

                            -- Renew when less than half of the timeout remains
                            local expire_time = tonumber(redis.call("GET", expire_key))
                            if expire_time and expire_time - current_time < timeout / 2 then
                                redis.call("EXPIRE", lock_key, timeout)
                                redis.call("EXPIRE", lock_key .. ":count", timeout)
                                redis.call("SET", expire_key, current_time + timeout, "EX", timeout)
                                return 1
                            end
                            return 0
                            """

                            current_time = int(time.time())
                            self.redis_client.eval(
                                lua_script,
                                1,
                                self.lock_name,
                                self.identifier,
                                self.timeout,
                                current_time
                            )

                            logger.debug(f"检查并续期锁: {self.lock_name}")

                        except Exception as e:
                            logger.error(f"续期锁失败: {e}")

                def release(self) -> bool:
                    """Release one level of the reentrant lock."""
                    if not self.acquired:
                        return False

                    # The script returns 2 when only the reentry count was decremented,
                    # 1 when the lock was fully released, and 0 when it was not held
                    lua_script = """
                    local lock_key = KEYS[1]
                    local identifier = ARGV[1]
                    local count_key = lock_key .. ":count"

                    local current_count = tonumber(redis.call("HGET", count_key, identifier))

                    if not current_count then
                        return 0
                    end

                    if current_count > 1 then
                        -- Decrement the reentry count; the lock stays held
                        redis.call("HINCRBY", count_key, identifier, -1)
                        return 2
                    end

                    -- Fully release the lock
                    redis.call("DEL", lock_key)
                    redis.call("DEL", count_key)
                    redis.call("DEL", lock_key .. ":expire")
                    return 1
                    """

                    try:
                        result = self.redis_client.eval(lua_script, 1, self.lock_name, self.identifier)
                    except Exception as e:
                        logger.error(f"Failed to release reentrant lock: {e}")
                        return False

                    if result == 2:
                        # Still held by an outer acquire, so keep the renewal thread running
                        return True

                    # Fully released or already gone: stop renewing in either case
                    self.acquired = False
                    if self.auto_renewal and self.renewal_thread:
                        self.stop_renewal.set()
                        self.renewal_thread.join(timeout=2)
                        self.renewal_thread = None

                    if result:
                        logger.info(f"Released reentrant lock: {self.lock_name}")
                        return True

                    logger.warning(f"Lock no longer exists or its count is inconsistent: {self.lock_name}")
                    return False

                def __enter__(self):
                    if self.acquire():
                        return self
                    else:
                        raise RuntimeError(f"无法获取可重入锁: {self.lock_name}")

                def __exit__(self, exc_type, exc_val, exc_tb):
                    self.release()
            ---
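
            The counting rule behind reentrancy can be seen in isolation with a small in-memory model. This is a sketch of the bookkeeping only, with no Redis, expiry, or renewal involved; `ReentrantLockModel` is a hypothetical name used for illustration.

```python
from typing import Optional


class ReentrantLockModel:
    """Tracks one owner and a hold count, as a reentrant lock does."""

    def __init__(self) -> None:
        self.owner: Optional[str] = None
        self.count = 0

    def acquire(self, identifier: str) -> bool:
        if self.owner is None:
            # Free lock: the caller becomes the owner with a count of 1
            self.owner, self.count = identifier, 1
            return True
        if self.owner == identifier:
            # Same owner again: only the count grows
            self.count += 1
            return True
        return False  # held by someone else

    def release(self, identifier: str) -> bool:
        if self.owner != identifier:
            return False  # not the owner, nothing to release
        self.count -= 1
        if self.count == 0:
            self.owner = None  # last release frees the lock
        return True


model = ReentrantLockModel()
steps = [
    model.acquire("A"),   # True, count 1
    model.acquire("A"),   # True, count 2 (reentry)
    model.acquire("B"),   # False, A holds the lock
    model.release("A"),   # True, count 1, still held
    model.acquire("B"),   # False, A still holds the lock
    model.release("A"),   # True, count 0, lock is free
    model.acquire("B"),   # True
]
print(steps)  # [True, True, False, True, False, True, True]
```

            The lock becomes available to other clients only after the owner has released it as many times as it acquired it.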
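    b.Redlock algorithm implementation
        a.Principle
            The lock is created on multiple independent Redis master nodes, so it remains safe even if some of them fail.
        b.Steps
            a.Record the current timestamp
            b.Try to create the lock on every Redis instance
            c.Check how many instances succeeded and how long acquisition took
            d.Treat the lock as valid only if a majority succeeded and the elapsed time is shorter than the lock's expiry
        c.Code example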
            ---
            class RedLock:
                def __init__(self, redis_clients: list, lock_name: str, timeout: int = 30,
                           retry_delay: float = 0.1):
                    self.redis_clients = redis_clients
                    self.lock_name = lock_name
                    self.timeout = timeout
                    self.retry_delay = retry_delay
                    self.identifier = str(uuid.uuid4())
                    self.acquired = False

                def acquire(self, retry_times: int = 3) -> bool:
                    """获取红锁"""
                    logger.info(f"尝试获取红锁: {self.lock_name}")

                    for attempt in range(retry_times):
                        start_time = time.time()

                        # 在所有Redis实例上尝试创建锁
                        successful_locks = 0
                        failed_locks = []

                        for i, redis_client in enumerate(self.redis_clients):
                            if self._lock_instance(redis_client, i + 1):
                                successful_locks += 1
                            else:
                                failed_locks.append(i + 1)

                        # Time spent acquiring counts against the lock's validity
                        elapsed = time.time() - start_time
                        validity = self.timeout - elapsed

                        # Success requires a majority of instances and remaining validity time
                        quorum = len(self.redis_clients) // 2 + 1
                        if successful_locks >= quorum and validity > 0:
                            self.acquired = True
                            logger.info(f"Redlock acquired: {self.lock_name}, instances locked: {successful_locks}/{len(self.redis_clients)}")
                            return True

                        # Acquisition failed: release whatever was locked
                        self._release_all_locks()

                        # Wait before the next attempt
                        if attempt < retry_times - 1:
                            time.sleep(self.retry_delay)

                    logger.error(f"红锁获取失败: {self.lock_name}")
                    return False

                def _lock_instance(self, redis_client, instance_id: int) -> bool:
                    """Try to create the lock on a single Redis instance."""
                    try:
                        # SET key value NX PX ttl: atomic set-if-absent with a millisecond expiry
                        result = redis_client.set(
                            self.lock_name,
                            self.identifier,
                            nx=True,
                            px=int(self.timeout * 1000)
                        )
                        return bool(result)
                    except Exception as e:
                        logger.error(f"Failed to lock Redis instance {instance_id}: {e}")
                        return False

                def _release_all_locks(self):
                    """释放所有Redis实例上的锁"""
                    for redis_client in self.redis_clients:
                        try:
                            lua_script = """
                            local lock_key = KEYS[1]
                            local identifier = ARGV[1]

                            if redis.call("GET", lock_key) == identifier then
                                return redis.call("DEL", lock_key)
                            else
                                return 0
                            end
                            """

                            redis_client.eval(lua_script, 1, self.lock_name, self.identifier)
                        except Exception as e:
                            logger.error(f"释放Redis实例锁失败: {e}")

                def release(self) -> bool:
                    """释放红锁"""
                    if not self.acquired:
                        return False

                    self._release_all_locks()
                    self.acquired = False
                    logger.info(f"红锁释放成功: {self.lock_name}")
                    return True

                def __enter__(self):
                    if self.acquire():
                        return self
                    else:
                        raise RuntimeError(f"无法获取红锁: {self.lock_name}")

                def __exit__(self, exc_type, exc_val, exc_tb):
                    self.release()
            ---
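
            The success condition for a Redlock attempt is plain arithmetic and can be checked without any Redis instance. The sketch below is illustrative only: the function names are hypothetical, and the drift allowance (1% of the expiry plus 2 ms) is an assumption borrowed from common reference implementations rather than something defined in the example above.

```python
def redlock_quorum(instance_count: int) -> int:
    """Smallest number of instances that forms a majority."""
    return instance_count // 2 + 1


def redlock_validity_ms(ttl_ms: int, elapsed_ms: int,
                        clock_drift_factor: float = 0.01) -> int:
    """Time the lock can still be trusted after acquisition took `elapsed_ms`."""
    # Allowance for clock drift between instances (assumed: 1% of the TTL + 2 ms)
    drift_ms = int(ttl_ms * clock_drift_factor) + 2
    return ttl_ms - elapsed_ms - drift_ms


def redlock_acquired(successes: int, instance_count: int,
                     ttl_ms: int, elapsed_ms: int) -> bool:
    """A lock counts as acquired only with a majority and positive validity."""
    return (successes >= redlock_quorum(instance_count)
            and redlock_validity_ms(ttl_ms, elapsed_ms) > 0)


print(redlock_quorum(5))                        # 3
print(redlock_validity_ms(30000, 120))          # 29578
print(redlock_acquired(3, 5, 30000, 120))       # True: majority, time left
print(redlock_acquired(2, 5, 30000, 120))       # False: no majority
print(redlock_acquired(5, 5, 30000, 29800))     # False: acquisition took too long
```

            The critical section should finish within the computed validity time, not within the full expiry, because part of the expiry has already been spent acquiring the lock.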

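04.Practical application scenarios
    a.Distributed task scheduling
        a.Scenario
            Ensure that a scheduled task runs on only one node in a distributed system.
        b.Approach
            Use a Redis lock to decide which node may execute the task.
        c.Code example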
            ---
            class DistributedTaskScheduler:
                def __init__(self, lock_manager: RedisLockManager):
                    self.lock_manager = lock_manager
                    self.task_registry = {}
                    self.execution_log = []

                def schedule_task(self, task_name: str, task_func, interval: int = 60,
                               max_executions: int = None):
                    """调度分布式任务"""
                    lock_name = f"task_lock:{task_name}"

                    def task_worker():
                        while True:
                            # 获取任务执行锁
                            with self.lock_manager.get_lock(lock_name, timeout=5):
                                logger.info(f"开始执行任务: {task_name}")

                                try:
                                    # 检查执行次数限制
                                    if max_executions and self._get_execution_count(task_name) >= max_executions:
                                        logger.info(f"任务 {task_name} 已达到最大执行次数限制")
                                        break

                                    # 执行任务
                                    start_time = time.time()
                                    result = task_func()
                                    execution_time = time.time() - start_time

                                    # 记录执行结果
                                    self._record_execution(task_name, result, execution_time)

                                    logger.info(f"任务 {task_name} 执行完成,耗时: {execution_time:.2f}s")

                                except Exception as e:
                                    logger.error(f"任务 {task_name} 执行失败: {e}")
                                    self._record_execution(task_name, f"error: {str(e)}", 0)

                            # 等待下次执行
                            time.sleep(interval)

                    # 启动任务工作线程
                    import threading
                    task_thread = threading.Thread(target=task_worker, daemon=True)
                    task_thread.start()

                    self.task_registry[task_name] = {
                        'thread': task_thread,
                        'interval': interval,
                        'max_executions': max_executions,
                        'created_time': datetime.now().isoformat()
                    }

                    logger.info(f"任务 {task_name} 已调度,执行间隔: {interval}s")

                def _get_execution_count(self, task_name: str) -> int:
                    """获取任务执行次数"""
                    try:
                        count_key = f"task_count:{task_name}"
                        count = self.lock_manager.redis_client.get(count_key)
                        return int(count) if count else 0
                    except Exception:
                        # Treat a missing or unreadable counter as zero executions
                        return 0

                def _record_execution(self, task_name: str, result, execution_time: float):
                    """记录任务执行"""
                    try:
                        # 增加执行计数
                        count_key = f"task_count:{task_name}"
                        self.lock_manager.redis_client.incr(count_key)

                        # 记录执行日志
                        log_entry = {
                            'task_name': task_name,
                            'result': result,
                            'execution_time': execution_time,
                            'timestamp': datetime.now().isoformat()
                        }

                        log_key = f"task_log:{task_name}"
                        self.lock_manager.redis_client.lpush(log_key, json.dumps(log_entry))
                        self.lock_manager.redis_client.ltrim(log_key, 0, 99)  # keep the 100 most recent entries (indices 0..99)

                    except Exception as e:
                        logger.error(f"记录任务执行失败: {e}")

                def get_task_status(self, task_name: str) -> Dict[str, Any]:
                    """获取任务状态"""
                    try:
                        count = self._get_execution_count(task_name)
                        recent_logs = []

                        # 获取最近的执行日志
                        log_key = f"task_log:{task_name}"
                        logs = self.lock_manager.redis_client.lrange(log_key, 0, 10)
                        for log in logs:
                            recent_logs.append(json.loads(log))

                        return {
                            'task_name': task_name,
                            'execution_count': count,
                            'recent_executions': recent_logs,
                            'is_registered': task_name in self.task_registry
                        }
                    except Exception as e:
                        return {'error': str(e)}

            # 使用示例
            def sample_task():
                """示例任务函数"""
                import random
                processing_time = random.uniform(1, 3)
                time.sleep(processing_time)
                return f"任务处理完成,耗时: {processing_time:.2f}s"

            def demo_task_scheduling():
                """任务调度演示"""
                # 创建Redis锁管理器
                lock_manager = RedisLockManager()
                scheduler = DistributedTaskScheduler(lock_manager)

                # 调度几个示例任务
                tasks = [
                    ('cleanup_task', lambda: time.sleep(2) or "清理完成", 30, 5),
                    ('data_sync_task', lambda: time.sleep(1.5) or "同步完成", 45, None),
                    ('monitoring_task', lambda: time.sleep(0.5) or "监控完成", 15, None)
                ]

                for task_name, task_func, interval, max_exec in tasks:
                    scheduler.schedule_task(task_name, task_func, interval, max_exec)
                    logger.info(f"任务 {task_name} 已调度")

                # 监控任务状态
                for _ in range(10):
                    time.sleep(5)

                    for task_name, _, _, _ in tasks:
                        status = scheduler.get_task_status(task_name)
                        if 'error' not in status:
                            logger.info(f"任务 {task_name} 状态: 执行次数={status['execution_count']}")
            ---
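    b.Cache consistency control
        a.Scenario
            In a distributed system, keep cache updates consistent and guard against cache breakdown (many concurrent requests rebuilding the same expired hot key) and cache avalanche.
        b.Approach
            Use a Redis lock to control which caller reads from the data source and updates the cache.
        c.Code example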
            ---
            class CacheConsistencyController:
                def __init__(self, lock_manager: RedisLockManager):
                    self.lock_manager = lock_manager
                    self.cache_stats = {
                        'hits': 0,
                        'misses': 0,
                        'updates': 0,
                        'lock_contentions': 0
                    }

                def get_with_cache_fallback(self, key: str, data_fetcher, cache_ttl: int = 300):
                    """带缓存回退的数据获取"""
                    cache_key = f"cache:{key}"
                    lock_key = f"lock:{key}"

                    # 尝试从缓存获取数据
                    try:
                        cached_data = self.lock_manager.redis_client.get(cache_key)
                        if cached_data:
                            self.cache_stats['hits'] += 1
                            logger.debug(f"缓存命中: {key}")
                            return json.loads(cached_data)
                    except Exception as e:
                        logger.error(f"读取缓存失败: {e}")

                    self.cache_stats['misses'] += 1
                    logger.info(f"缓存未命中: {key}")

                    # 获取锁并更新缓存
                    lock_acquired = False
                    try:
                        # 非阻塞获取锁
                        lock = self.lock_manager.get_lock(lock_key, timeout=5)
                        if lock.acquire(blocking=False):
                            lock_acquired = True

                            # 再次检查缓存(双重检查)
                            try:
                                cached_data = self.lock_manager.redis_client.get(cache_key)
                                if cached_data:
                                    self.cache_stats['hits'] += 1
                                    return json.loads(cached_data)
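                            except Exception as e:
                                # A failed re-check is not fatal; fall through to the data source
                                logger.error(f"Cache re-check failed: {e}")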

                            # 从数据源获取数据
                            logger.info(f"从数据源获取数据: {key}")
                            data = data_fetcher()

                            # 更新缓存
                            try:
                                self.lock_manager.redis_client.setex(
                                    cache_key,
                                    cache_ttl,
                                    json.dumps(data)
                                )
                                self.cache_stats['updates'] += 1
                                logger.info(f"缓存更新完成: {key}")
                            except Exception as e:
                                logger.error(f"更新缓存失败: {e}")

                            return data
                        else:
                            # 锁被占用,直接返回数据源数据
                            self.cache_stats['lock_contentions'] += 1
                            logger.warning(f"缓存锁竞争,直接获取数据: {key}")
                            return data_fetcher()

                    except Exception as e:
                        logger.error(f"缓存控制异常: {e}")
                        return data_fetcher()
                    finally:
                        if lock_acquired:
                            try:
                                lock.release()
                            except Exception as e:
                                logger.error(f"Failed to release cache lock: {e}")

                def invalidate_cache(self, key: str):
                    """缓存失效"""
                    cache_key = f"cache:{key}"
                    lock_key = f"lock:{key}"

                    with self.lock_manager.get_lock(lock_key, timeout=5):
                        try:
                            deleted = self.lock_manager.redis_client.delete(cache_key)
                            if deleted:
                                logger.info(f"缓存已失效: {key}")
                            else:
                                logger.warning(f"缓存不存在: {key}")
                        except Exception as e:
                            logger.error(f"缓存失效失败: {e}")

                def get_cache_statistics(self) -> Dict[str, int]:
                    """获取缓存统计信息"""
                    return self.cache_stats.copy()

            # 使用示例
            def demo_cache_consistency():
                """缓存一致性演示"""
                lock_manager = RedisLockManager()
                cache_controller = CacheConsistencyController(lock_manager)

                def database_fetcher(key):
                    """模拟数据库数据获取"""
                    # 模拟数据库查询延迟
                    time.sleep(0.2)
                    return f"db_data_for_{key}_timestamp_{int(time.time())}"

                # 测试缓存一致性
                test_keys = ['user_123', 'product_456', 'order_789']

                for key in test_keys:
                    # 第一次获取(缓存未命中)
                    data1 = cache_controller.get_with_cache_fallback(
                        key,
                        lambda: database_fetcher(key),  # data_fetcher is called with no arguments
                        cache_ttl=60
                    )
                    logger.info(f"第一次获取 {key}: {data1}")

                    # 第二次获取(缓存命中)
                    data2 = cache_controller.get_with_cache_fallback(
                        key,
                        lambda: database_fetcher(key),  # data_fetcher is called with no arguments
                        cache_ttl=60
                    )
                    logger.info(f"第二次获取 {key}: {data2}")

                # 输出缓存统计
                stats = cache_controller.get_cache_statistics()
                logger.info(f"缓存统计: {stats}")
            ---

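05.Best practices and caveats
    a.Design principles
        a.Lock granularity
            Choose an appropriate lock granularity and avoid locking more than necessary.
        b.Timeouts
            Set the lock timeout according to how long the protected operation is expected to take.
        c.Error handling
            Handle errors thoroughly and retry failed acquisitions.
    b.Performance optimization
        a.Reducing contention
            Design algorithms so that fewer callers compete for the same lock.
        b.Batching
            Group several operations under a single lock acquisition.
        c.Asynchronous processing
            Use asynchronous patterns where possible.
    c.Monitoring and debugging
        a.Lock monitoring
            Track lock acquisitions, releases, and contention.
        b.Performance analysis
            Measure the impact of locking on system performance.
        c.Troubleshooting
            Log enough detail to diagnose problems.
    d.High availability
        a.Redis Cluster
            Use Redis Cluster to improve availability.
        b.Primary-replica replication
            Configure replication to improve data reliability.
        c.Failover
            Set up automatic failover.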
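
The error-handling and monitoring points above can be sketched as follows. This is a minimal illustration rather than a recommended implementation: `acquire_with_backoff`, `lock_metrics`, and the default parameter values are hypothetical names and numbers chosen for the example, and `try_acquire` is assumed to be any zero-argument callable that returns True when the lock was obtained.

```python
import random
import threading
import time
from collections import Counter

# Hypothetical in-process counters; a real system would export these
# to whatever monitoring backend it already uses.
lock_metrics = Counter()


def acquire_with_backoff(try_acquire, max_attempts=5, base_delay=0.05, max_delay=1.0):
    """Call try_acquire() until it returns True or the attempts run out.

    try_acquire is assumed to be a zero-argument callable returning bool,
    for example: lambda: lock.acquire(blocking=False)
    """
    for attempt in range(max_attempts):
        if try_acquire():
            lock_metrics["acquired"] += 1
            return True
        lock_metrics["contended"] += 1
        if attempt < max_attempts - 1:
            # Exponential backoff capped at max_delay, with random jitter so
            # that competing clients do not all retry at the same moment.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
    lock_metrics["gave_up"] += 1
    return False
```

A caller could wrap any non-blocking acquire in this helper, for example `acquire_with_backoff(lambda: lock.acquire(blocking=False))`, and periodically log `dict(lock_metrics)` to see how often the lock is contended.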

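9.2 Database locks

01.Basic concepts
    a.Definition and purpose
        A database lock is a distributed locking mechanism built on a database system. It relies on the database's atomic operations and transactions to provide mutual exclusion across processes.
    b.Core characteristics
        a.Persistent storage
            Lock records are stored in the database, so they survive process restarts and can be recovered after a failure.
        b.Transaction support
            Database transactions make lock operations atomic and consistent.
        c.Multiple database support
            Works with mainstream relational databases such as MySQL, PostgreSQL, and Oracle.
        d.ACID guarantees
            Lock operations inherit the database's ACID properties, which makes them reliable.
    c.Comparison with other distributed locks
        a.Performance
            Typically slower than Redis-based locks, since each lock operation is a database write, but more reliable.
        b.Deployment complexity
            Depends on a database system, which adds database maintenance cost.
        c.Suitable scenarios
            Suited to distributed systems with very high reliability requirements.

02.Basic usage
    a.Lock table design
        a.Table layout
            The table holds the lock name, the holder, the acquisition time, the expiry time, and similar fields.
        b.Index optimization
            Create a unique index on the lock name so that each lock has at most one record.
        c.Cleanup
            Periodically delete expired lock records to keep the table from growing without bound.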
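
The table design above can be sketched with SQLite from the Python standard library. This is an illustration under assumptions: the column names and the secondary index are choices made for this example, timestamps are stored as Unix seconds, and a production system would use its own database and DDL dialect.

```python
import sqlite3
import time

# Illustrative schema; column names are assumptions of this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS distributed_locks (
    lock_name  TEXT PRIMARY KEY,   -- unique: at most one row (holder) per lock
    identifier TEXT NOT NULL,      -- who currently holds the lock
    created_at REAL NOT NULL,      -- acquisition time (Unix seconds)
    expires_at REAL NOT NULL       -- expiry time (Unix seconds)
);
CREATE INDEX IF NOT EXISTS idx_locks_expires_at
    ON distributed_locks (expires_at);
"""


def create_lock_table(conn):
    """Create the lock table and its index if they do not exist yet."""
    conn.executescript(SCHEMA)


def cleanup_expired(conn, now=None):
    """Delete expired lock rows and return how many were removed."""
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM distributed_locks WHERE expires_at < ?", (now,)
        )
    return cur.rowcount
```

The primary key on `lock_name` is what makes a second INSERT for the same lock fail, and the index on `expires_at` keeps the periodic cleanup from scanning the whole table.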
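    b.Acquiring and releasing locks
        a.Acquire
            Acquire the lock atomically with an INSERT or UPDATE statement.
        b.Release
            Release the lock with a DELETE or UPDATE statement.
        c.Renewal
            Extend the lock's validity by updating its expiry time.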
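
As a rough sketch of these three operations against a real SQL engine, the following uses SQLite from the standard library. The function names (`init`, `try_acquire`, `release`, `extend`), the reduced column set, and the `now` parameter (added so the behavior can be checked deterministically) are assumptions of this example.

```python
import sqlite3
import time


def init(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS distributed_locks ("
        "lock_name TEXT PRIMARY KEY, "
        "identifier TEXT NOT NULL, "
        "expires_at REAL NOT NULL)"
    )


def try_acquire(conn, lock_name, identifier, ttl, now=None):
    """Acquire via INSERT; the primary key rejects a second live holder."""
    now = time.time() if now is None else now
    with conn:
        # Drop an expired holder first so the INSERT below can succeed.
        conn.execute(
            "DELETE FROM distributed_locks WHERE lock_name = ? AND expires_at < ?",
            (lock_name, now),
        )
        try:
            conn.execute(
                "INSERT INTO distributed_locks (lock_name, identifier, expires_at) "
                "VALUES (?, ?, ?)",
                (lock_name, identifier, now + ttl),
            )
        except sqlite3.IntegrityError:
            return False  # someone else holds a lock that has not expired
    return True


def release(conn, lock_name, identifier):
    """Release via DELETE, but only if the caller is the current holder."""
    with conn:
        cur = conn.execute(
            "DELETE FROM distributed_locks WHERE lock_name = ? AND identifier = ?",
            (lock_name, identifier),
        )
    return cur.rowcount == 1


def extend(conn, lock_name, identifier, ttl, now=None):
    """Renew via UPDATE; fails if the lock is not held or has already expired."""
    now = time.time() if now is None else now
    with conn:
        cur = conn.execute(
            "UPDATE distributed_locks SET expires_at = ? "
            "WHERE lock_name = ? AND identifier = ? AND expires_at >= ?",
            (now + ttl, lock_name, identifier, now),
        )
    return cur.rowcount == 1
```

Matching on `identifier` in `release` and `extend` is the design choice that keeps one client from releasing or renewing a lock that has since been taken over by another.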
    c.Code example
        ---
        # 数据库分布式锁基本实现
        import time
        import threading
        import logging
        from datetime import datetime, timedelta
        from typing import Optional, Dict, Any
        import uuid
        from contextlib import contextmanager

        # 模拟数据库连接(实际项目中使用真实的数据库连接)
        class MockDatabaseConnection:
            def __init__(self):
                self.locks_table = {}  # 模拟锁表
                self._lock = threading.Lock()  # 保护内部数据结构

            def execute_insert(self, table: str, data: Dict[str, Any]) -> bool:
                """执行插入操作"""
                with self._lock:
                    if table in self.locks_table:
                        # 检查锁名是否已存在(模拟唯一索引约束)
                        existing_locks = [lock for lock in self.locks_table[table]
                                        if lock['lock_name'] == data['lock_name']]
                        if existing_locks:
                            return False  # 锁已存在,插入失败

                    if table not in self.locks_table:
                        self.locks_table[table] = []

                    data['created_at'] = datetime.now()
                    self.locks_table[table].append(data.copy())
                    return True

            def execute_update(self, table: str, condition: Dict[str, Any],
                            update_data: Dict[str, Any]) -> int:
                """执行更新操作"""
                with self._lock:
                    if table not in self.locks_table:
                        return 0

                    updated_count = 0
                    for record in self.locks_table[table]:
                        match = all(record.get(k) == v for k, v in condition.items())
                        if match:
                            record.update(update_data)
                            updated_count += 1

                    return updated_count

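            def execute_delete(self, table: str, condition: Dict[str, Any]) -> int:
                """Delete matching records; a condition value is either a plain value (equality) or a ["<", value] pair"""
                def matches(record: Dict[str, Any]) -> bool:
                    for key, expected in condition.items():
                        actual = record.get(key)
                        if isinstance(expected, list) and len(expected) == 2 and expected[0] == "<":
                            # Range condition such as {"expires_at": ["<", now]}, used by the cleanup code
                            if actual is None or not actual < expected[1]:
                                return False
                        elif actual != expected:
                            return False
                    return True

                with self._lock:
                    if table not in self.locks_table:
                        return 0

                    original_count = len(self.locks_table[table])
                    self.locks_table[table] = [
                        record for record in self.locks_table[table]
                        if not matches(record)
                    ]

                    return original_count - len(self.locks_table[table])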

            def query_one(self, table: str, condition: Dict[str, Any]) -> Optional[Dict[str, Any]]:
                """查询单条记录"""
                with self._lock:
                    if table not in self.locks_table:
                        return None

                    for record in self.locks_table[table]:
                        if all(record.get(k) == v for k, v in condition.items()):
                            return record.copy()

                    return None

        class DatabaseLock:
            """数据库分布式锁实现"""

            def __init__(self, db_connection, lock_name: str, timeout: int = 30):
                """
                初始化数据库锁

                Args:
                    db_connection: 数据库连接对象
                    lock_name: 锁名称
                    timeout: 锁超时时间(秒)
                """
                self.db_connection = db_connection
                self.lock_name = lock_name
                self.timeout = timeout
                self.identifier = str(uuid.uuid4())
                self.acquired = False

                # 锁表名
                self.lock_table = "distributed_locks"

                # 创建锁表(如果不存在)
                self._ensure_lock_table_exists()

                # 启动清理线程
                self._start_cleanup_thread()

            def _ensure_lock_table_exists(self):
                """确保锁表存在"""
                # 在实际应用中,这里会执行CREATE TABLE IF NOT EXISTS语句
                if self.lock_table not in self.db_connection.locks_table:
                    self.db_connection.locks_table[self.lock_table] = []

            def _start_cleanup_thread(self):
                """启动过期锁清理线程"""
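                def cleanup_expired_locks():
                    while True:
                        try:
                            self._cleanup_expired_locks()
                        except Exception as e:
                            logging.error(f"Error while cleaning up expired locks: {e}")
                        # Sleep outside the try block so a failing cleanup does not spin in a tight loop
                        time.sleep(60)  # run once a minute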

                cleanup_thread = threading.Thread(target=cleanup_expired_locks, daemon=True)
                cleanup_thread.start()

            def _cleanup_expired_locks(self):
                """Remove expired locks"""
                current_time = datetime.now()
                # Range condition: expires_at earlier than now. The connection's
                # execute_delete must understand this [operator, value] form.
                condition = {"expires_at": ["<", current_time]}

                deleted_count = self.db_connection.execute_delete(
                    self.lock_table,
                    condition
                )

                if deleted_count > 0:
                    logging.info(f"Removed {deleted_count} expired lock(s)")

            def acquire(self, blocking: bool = True, timeout: Optional[float] = None) -> bool:
                """
                获取锁

                Args:
                    blocking: 是否阻塞等待
                    timeout: 阻塞等待超时时间

                Returns:
                    bool: 是否成功获取锁
                """
                start_time = time.time()

                while True:
                    try:
                        # 尝试获取锁
                        if self._try_acquire():
                            self.acquired = True
                            logging.info(f"成功获取锁: {self.lock_name}")
                            return True

                        if not blocking:
                            return False

                        # Check for timeout (compare against None so that timeout=0 is not treated as "wait forever")
                        if timeout is not None and (time.time() - start_time) >= timeout:
                            logging.warning(f"获取锁超时: {self.lock_name}")
                            return False

                        # 短暂等待后重试
                        time.sleep(0.1)

                    except Exception as e:
                        logging.error(f"获取锁时发生错误: {e}")
                        return False

            def _try_acquire(self) -> bool:
                """尝试获取锁(非阻塞)"""
                expires_at = datetime.now() + timedelta(seconds=self.timeout)

                lock_data = {
                    "lock_name": self.lock_name,
                    "identifier": self.identifier,
                    "expires_at": expires_at,
                    "created_at": datetime.now()
                }

                # 尝试插入锁记录(利用数据库的唯一约束)
                success = self.db_connection.execute_insert(self.lock_table, lock_data)

                if success:
                    return True

                # 插入失败,检查锁是否过期
                existing_lock = self.db_connection.query_one(
                    self.lock_table,
                    {"lock_name": self.lock_name}
                )

                if existing_lock and existing_lock["expires_at"] < datetime.now():
                    # 锁已过期,尝试删除并重新获取
                    delete_condition = {
                        "lock_name": self.lock_name,
                        "identifier": existing_lock["identifier"]
                    }

                    deleted = self.db_connection.execute_delete(self.lock_table, delete_condition)

                    if deleted > 0:
                        # 再次尝试获取锁
                        return self.db_connection.execute_insert(self.lock_table, lock_data)

                return False

            def release(self) -> bool:
                """
                释放锁

                Returns:
                    bool: 是否成功释放锁
                """
                if not self.acquired:
                    logging.warning(f"尝试释放未获取的锁: {self.lock_name}")
                    return False

                try:
                    delete_condition = {
                        "lock_name": self.lock_name,
                        "identifier": self.identifier
                    }

                    deleted_count = self.db_connection.execute_delete(
                        self.lock_table,
                        delete_condition
                    )

                    if deleted_count > 0:
                        self.acquired = False
                        logging.info(f"成功释放锁: {self.lock_name}")
                        return True
                    else:
                        logging.warning(f"释放锁失败,锁可能已过期: {self.lock_name}")
                        self.acquired = False
                        return False

                except Exception as e:
                    logging.error(f"释放锁时发生错误: {e}")
                    return False

            def extend(self, additional_time: int = 30) -> bool:
                """
                延长锁的有效期

                Args:
                    additional_time: 延长的时间(秒)

                Returns:
                    bool: 是否成功延长
                """
                if not self.acquired:
                    logging.warning(f"尝试延长未获取的锁: {self.lock_name}")
                    return False

                try:
                    new_expires_at = datetime.now() + timedelta(seconds=additional_time)

                    update_data = {"expires_at": new_expires_at}
                    condition = {
                        "lock_name": self.lock_name,
                        "identifier": self.identifier
                    }

                    updated_count = self.db_connection.execute_update(
                        self.lock_table,
                        condition,
                        update_data
                    )

                    if updated_count > 0:
                        logging.info(f"成功延长锁有效期: {self.lock_name}")
                        return True
                    else:
                        logging.warning(f"延长锁有效期失败: {self.lock_name}")
                        return False

                except Exception as e:
                    logging.error(f"延长锁有效期时发生错误: {e}")
                    return False

            def is_locked(self) -> bool:
                """检查锁是否存在且未过期"""
                try:
                    existing_lock = self.db_connection.query_one(
                        self.lock_table,
                        {"lock_name": self.lock_name}
                    )

                    if existing_lock:
                        return existing_lock["expires_at"] > datetime.now()

                    return False

                except Exception as e:
                    logging.error(f"检查锁状态时发生错误: {e}")
                    return False

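            def __enter__(self):
                """Context manager entry; raises if the lock cannot be acquired"""
                if not self.acquire():
                    raise RuntimeError(f"Failed to acquire lock: {self.lock_name}")
                return self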

            def __exit__(self, exc_type, exc_val, exc_tb):
                """上下文管理器出口"""
                self.release()

            def __del__(self):
                """析构函数,确保锁被释放"""
                if hasattr(self, 'acquired') and self.acquired:
                    self.release()

        # 使用示例
        def database_lock_example():
            """数据库锁使用示例"""
            logging.basicConfig(level=logging.INFO,
                            format='%(asctime)s - %(levelname)s - %(message)s')

            # 创建数据库连接
            db_connection = MockDatabaseConnection()

            # 创建锁实例
            lock = DatabaseLock(db_connection, "resource_lock", timeout=10)

            try:
                # 获取锁
                if lock.acquire(blocking=True, timeout=5):
                    print("成功获取锁,开始执行关键操作...")

                    # 模拟关键操作
                    time.sleep(2)

                    # 检查锁状态
                    print(f"锁状态: {'已锁定' if lock.is_locked() else '未锁定'}")

                    # 延长锁有效期
                    lock.extend(5)
                    print("已延长锁有效期")

                else:
                    print("获取锁失败")

            finally:
                # 释放锁
                if lock.acquired:
                    lock.release()
                    print("锁已释放")

        if __name__ == "__main__":
            database_lock_example()
        ---

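03.Advanced features and applications
    a.Reentrant database lock
        a.Principle
            Record an acquisition count in the lock table so that the same thread can acquire the lock more than once.
        b.Counting
            Maintain an acquisition counter; the lock is actually released only when the counter drops to zero.
        c.Implementation
            Increment and decrement the count column to implement reentrancy.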
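
The counting scheme above might look like this in SQL, again sketched with SQLite. The `acquire_count` column matches the field name used in the lock records of this section; the function names and the `now` parameter are assumptions of the example, and `identifier` is assumed to identify the holder uniquely (for instance a per-thread UUID).

```python
import sqlite3
import time


def init(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS distributed_locks ("
        "lock_name TEXT PRIMARY KEY, "
        "identifier TEXT NOT NULL, "
        "acquire_count INTEGER NOT NULL, "
        "expires_at REAL NOT NULL)"
    )


def acquire(conn, lock_name, identifier, ttl, now=None):
    now = time.time() if now is None else now
    with conn:
        # Re-entry: the current holder bumps the counter and refreshes the expiry.
        cur = conn.execute(
            "UPDATE distributed_locks "
            "SET acquire_count = acquire_count + 1, expires_at = ? "
            "WHERE lock_name = ? AND identifier = ? AND expires_at >= ?",
            (now + ttl, lock_name, identifier, now),
        )
        if cur.rowcount == 1:
            return True
        # First acquisition: clear an expired row, then insert with count 1.
        conn.execute(
            "DELETE FROM distributed_locks WHERE lock_name = ? AND expires_at < ?",
            (lock_name, now),
        )
        try:
            conn.execute(
                "INSERT INTO distributed_locks VALUES (?, ?, 1, ?)",
                (lock_name, identifier, now + ttl),
            )
        except sqlite3.IntegrityError:
            return False  # held by a different identifier
    return True


def release(conn, lock_name, identifier):
    """Return the remaining count (0 means fully released), or -1 if not held."""
    key = (lock_name, identifier)
    with conn:
        cur = conn.execute(
            "UPDATE distributed_locks SET acquire_count = acquire_count - 1 "
            "WHERE lock_name = ? AND identifier = ?",
            key,
        )
        if cur.rowcount == 0:
            return -1
        # The row is removed only when the counter has dropped to zero.
        conn.execute(
            "DELETE FROM distributed_locks "
            "WHERE lock_name = ? AND identifier = ? AND acquire_count <= 0",
            key,
        )
        row = conn.execute(
            "SELECT acquire_count FROM distributed_locks "
            "WHERE lock_name = ? AND identifier = ?",
            key,
        ).fetchone()
    return row[0] if row else 0
```

Keeping the counter in the table, rather than only in process memory, means the count stays consistent with what the database believes about the lock even if the holder's local state is lost.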
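    b.Fair lock implementation
        a.Wait queue
            Keep the queue of waiting requests in a database table.
        b.FIFO scheduling
            Grant the lock in the order in which requests arrived.
        c.Starvation prevention
            Ensure that no request waits forever for the lock.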
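
One way to picture the wait queue and FIFO rule is a ticket table whose auto-incrementing id records arrival order. The sketch below uses SQLite and is only an illustration: the function names are hypothetical, and expiry of locks and of abandoned queue entries (needed in practice so that a crashed waiter at the head does not block everyone behind it) is left out to keep it short.

```python
import sqlite3


def init(conn):
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS distributed_locks (
        lock_name  TEXT PRIMARY KEY,
        identifier TEXT NOT NULL
    );
    CREATE TABLE IF NOT EXISTS lock_wait_queue (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,  -- arrival order
        lock_name  TEXT NOT NULL,
        identifier TEXT NOT NULL,
        UNIQUE (lock_name, identifier)
    );
    """)


def enqueue(conn, lock_name, identifier):
    """Join the queue; enqueuing twice keeps the original position."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO lock_wait_queue (lock_name, identifier) "
            "VALUES (?, ?)",
            (lock_name, identifier),
        )


def try_acquire_fair(conn, lock_name, identifier):
    """Take the lock only if the caller is at the head of the queue."""
    with conn:
        head = conn.execute(
            "SELECT identifier FROM lock_wait_queue "
            "WHERE lock_name = ? ORDER BY id LIMIT 1",
            (lock_name,),
        ).fetchone()
        if head is None or head[0] != identifier:
            return False  # not this waiter's turn
        try:
            conn.execute(
                "INSERT INTO distributed_locks (lock_name, identifier) VALUES (?, ?)",
                (lock_name, identifier),
            )
        except sqlite3.IntegrityError:
            return False  # lock is still held; stay at the head and retry later
        conn.execute(
            "DELETE FROM lock_wait_queue WHERE lock_name = ? AND identifier = ?",
            (lock_name, identifier),
        )
    return True


def release(conn, lock_name, identifier):
    with conn:
        cur = conn.execute(
            "DELETE FROM distributed_locks WHERE lock_name = ? AND identifier = ?",
            (lock_name, identifier),
        )
    return cur.rowcount == 1
```

Because a waiter may take the lock only while it is first in the queue, later arrivals cannot overtake earlier ones, which is the FIFO property described above.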
    c.Code example
        ---
        # 高级数据库锁实现
        import threading
        import time
        import logging
        from datetime import datetime, timedelta
        from typing import Optional, Dict, Any, List
        import uuid
        from queue import Queue
        import heapq

        class AdvancedDatabaseLock:
            """高级数据库锁实现(支持可重入和公平性)"""

            def __init__(self, db_connection, lock_name: str, timeout: int = 30,
                        fair: bool = True, reentrant: bool = True):
                """
                初始化高级数据库锁

                Args:
                    db_connection: 数据库连接
                    lock_name: 锁名称
                    timeout: 默认超时时间
                    fair: 是否使用公平锁
                    reentrant: 是否支持可重入
                """
                self.db_connection = db_connection
                self.lock_name = lock_name
                self.timeout = timeout
                self.fair = fair
                self.reentrant = reentrant
                self.identifier = str(uuid.uuid4())
                self.thread_id = threading.get_ident()
                self.acquired = False
                self.acquire_count = 0
                self.lock_table = "distributed_locks"
                self.wait_queue_table = "lock_wait_queue"

                # 确保表存在
                self._ensure_tables_exist()

                # 启动维护线程
                self._start_maintenance_threads()

            def _ensure_tables_exist(self):
                """确保必要的表存在"""
                # 锁表
                if self.lock_table not in self.db_connection.locks_table:
                    self.db_connection.locks_table[self.lock_table] = []

                # 等待队列表
                if self.wait_queue_table not in self.db_connection.locks_table:
                    self.db_connection.locks_table[self.wait_queue_table] = []

            def _start_maintenance_threads(self):
                """启动维护线程"""
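                def maintenance_worker():
                    while True:
                        try:
                            self._cleanup_expired_entries()
                            self._process_wait_queue()
                        except Exception as e:
                            logging.error(f"Maintenance thread error: {e}")
                        # Sleep outside the try block so repeated failures do not spin in a tight loop
                        time.sleep(5)  # run maintenance every 5 seconds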

                maintenance_thread = threading.Thread(target=maintenance_worker, daemon=True)
                maintenance_thread.start()

            def _cleanup_expired_entries(self):
                """清理过期的锁和等待队列条目"""
                current_time = datetime.now()

                # 清理过期锁
                self.db_connection.execute_delete(
                    self.lock_table,
                    {"expires_at": ["<", current_time]}
                )

                # 清理过期等待队列条目
                self.db_connection.execute_delete(
                    self.wait_queue_table,
                    {"expires_at": ["<", current_time]}
                )

            def _process_wait_queue(self):
                """处理等待队列"""
                if not self.fair:
                    return

                # 获取当前锁持有者
                current_lock = self.db_connection.query_one(
                    self.lock_table,
                    {"lock_name": self.lock_name}
                )

                if current_lock:
                    return  # 锁仍被持有

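                # Fetch the waiting requests for this lock, oldest first.
                # MockDatabaseConnection defines no query_all(), so read the mock table
                # directly and sort by request time (a real database would use ORDER BY created_at).
                wait_requests = sorted(
                    (
                        request
                        for request in list(self.db_connection.locks_table.get(self.wait_queue_table, []))
                        if request.get("lock_name") == self.lock_name
                    ),
                    key=lambda request: request["created_at"]
                )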

                if wait_requests:
                    first_request = wait_requests[0]
                    self._grant_lock_to_waiter(first_request)

            def _grant_lock_to_waiter(self, request):
                """将锁授予等待者"""
                expires_at = datetime.now() + timedelta(seconds=request.get("timeout", self.timeout))

                lock_data = {
                    "lock_name": self.lock_name,
                    "identifier": request["identifier"],
                    "thread_id": request["thread_id"],
                    "acquire_count": 1,
                    "expires_at": expires_at,
                    "created_at": datetime.now()
                }

                # 创建锁记录
                if self.db_connection.execute_insert(self.lock_table, lock_data):
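                    # Remove the waiter from the queue. Queue rows carry no "id" field,
                    # so match on lock name and identifier instead.
                    self.db_connection.execute_delete(
                        self.wait_queue_table,
                        {"lock_name": self.lock_name, "identifier": request["identifier"]}
                    )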

            def acquire(self, blocking: bool = True, timeout: Optional[float] = None) -> bool:
                """
                获取锁

                Args:
                    blocking: 是否阻塞等待
                    timeout: 阻塞等待超时时间

                Returns:
                    bool: 是否成功获取锁
                """
                if self.reentrant and self.acquired and self.identifier == self._get_current_lock_identifier():
                    # 可重入锁,同一线程可多次获取
                    self._increment_acquire_count()
                    return True

                start_time = time.time()
                effective_timeout = timeout if timeout is not None else self.timeout

                # 如果使用公平锁,先加入等待队列
                if self.fair and blocking:
                    self._add_to_wait_queue(effective_timeout)

                while True:
                    try:
                        if self._try_acquire():
                            self.acquired = True
                            self.acquire_count = 1

                            # 如果使用公平锁,从等待队列中移除
                            if self.fair:
                                self._remove_from_wait_queue()

                            logging.info(f"成功获取锁: {self.lock_name}")
                            return True

                        if not blocking:
                            return False

                        # 检查超时
                        if (time.time() - start_time) >= effective_timeout:
                            if self.fair:
                                self._remove_from_wait_queue()
                            logging.warning(f"获取锁超时: {self.lock_name}")
                            return False

                        time.sleep(0.1)

                    except Exception as e:
                        if self.fair:
                            self._remove_from_wait_queue()
                        logging.error(f"获取锁时发生错误: {e}")
                        return False

            def _add_to_wait_queue(self, timeout: float):
                """添加到等待队列"""
                wait_data = {
                    "lock_name": self.lock_name,
                    "identifier": self.identifier,
                    "thread_id": self.thread_id,
                    "created_at": datetime.now(),
                    "expires_at": datetime.now() + timedelta(seconds=timeout)
                }

                self.db_connection.execute_insert(self.wait_queue_table, wait_data)

            def _remove_from_wait_queue(self):
                """从等待队列中移除"""
                condition = {
                    "lock_name": self.lock_name,
                    "identifier": self.identifier
                }
                self.db_connection.execute_delete(self.wait_queue_table, condition)

            def _get_current_lock_identifier(self):
                """获取当前锁的标识符"""
                current_lock = self.db_connection.query_one(
                    self.lock_table,
                    {"lock_name": self.lock_name}
                )
                return current_lock["identifier"] if current_lock else None

            def _try_acquire(self) -> bool:
                """尝试获取锁"""
                expires_at = datetime.now() + timedelta(seconds=self.timeout)

                if self.reentrant:
                    lock_data = {
                        "lock_name": self.lock_name,
                        "identifier": self.identifier,
                        "thread_id": self.thread_id,
                        "acquire_count": 1,
                        "expires_at": expires_at
                    }
                else:
                    lock_data = {
                        "lock_name": self.lock_name,
                        "identifier": self.identifier,
                        "expires_at": expires_at
                    }

                # 尝试创建锁记录
                if self.db_connection.execute_insert(self.lock_table, lock_data):
                    return True

                # 创建失败,检查现有锁
                existing_lock = self.db_connection.query_one(
                    self.lock_table,
                    {"lock_name": self.lock_name}
                )

                if not existing_lock:
                    return False

                # 检查锁是否过期
                if existing_lock["expires_at"] < datetime.now():
                    # 锁已过期,尝试删除并重新获取
                    delete_condition = {
                        "lock_name": self.lock_name,
                        "identifier": existing_lock["identifier"]
                    }

                    deleted = self.db_connection.execute_delete(self.lock_table, delete_condition)

                    if deleted > 0:
                        return self.db_connection.execute_insert(self.lock_table, lock_data)

                # 检查是否为可重入锁的同一线程
                if (self.reentrant and
                    existing_lock["thread_id"] == self.thread_id and
                    existing_lock["identifier"] == self.identifier):
                    # 增加获取计数
                    new_count = existing_lock["acquire_count"] + 1
                    update_data = {
                        "acquire_count": new_count,
                        "expires_at": expires_at
                    }

                    condition = {
                        "lock_name": self.lock_name,
                        "identifier": self.identifier
                    }

                    updated = self.db_connection.execute_update(self.lock_table, condition, update_data)
                    return updated > 0

                return False

            def _increment_acquire_count(self):
                """增加获取计数"""
                if not self.acquired:
                    return

                expires_at = datetime.now() + timedelta(seconds=self.timeout)
                update_data = {
                    "acquire_count": self.acquire_count + 1,
                    "expires_at": expires_at
                }

                condition = {
                    "lock_name": self.lock_name,
                    "identifier": self.identifier
                }

                self.db_connection.execute_update(self.lock_table, condition, update_data)
                self.acquire_count += 1

            def release(self) -> bool:
                """释放锁"""
                if not self.acquired:
                    logging.warning(f"尝试释放未获取的锁: {self.lock_name}")
                    return False

                try:
                    if self.reentrant and self.acquire_count > 1:
                        # 可重入锁,减少计数
                        self.acquire_count -= 1
                        expires_at = datetime.now() + timedelta(seconds=self.timeout)

                        update_data = {
                            "acquire_count": self.acquire_count,
                            "expires_at": expires_at
                        }

                        condition = {
                            "lock_name": self.lock_name,
                            "identifier": self.identifier
                        }

                        updated = self.db_connection.execute_update(self.lock_table, condition, update_data)
                        if updated > 0:
                            logging.info(f"减少锁获取计数: {self.lock_name} (count: {self.acquire_count})")
                            return True

                    # 完全释放锁
                    delete_condition = {
                        "lock_name": self.lock_name,
                        "identifier": self.identifier
                    }

                    deleted = self.db_connection.execute_delete(self.lock_table, delete_condition)

                    if deleted > 0:
                        self.acquired = False
                        self.acquire_count = 0
                        logging.info(f"成功释放锁: {self.lock_name}")
                        return True
                    else:
                        logging.warning(f"释放锁失败: {self.lock_name}")
                        self.acquired = False
                        self.acquire_count = 0
                        return False

                except Exception as e:
                    logging.error(f"释放锁时发生错误: {e}")
                    return False

        # 使用示例
        def advanced_database_lock_example():
            """高级数据库锁使用示例"""
            logging.basicConfig(level=logging.INFO,
                            format='%(asctime)s - %(levelname)s - %(message)s')

            db_connection = MockDatabaseConnection()

            def worker_with_reentrant_lock(worker_id: int):
                """使用可重入锁的工作线程"""
                lock = AdvancedDatabaseLock(
                    db_connection,
                    f"resource_{worker_id}",
                    timeout=10,
                    fair=False,  # 不使用公平锁以提高性能
                    reentrant=True  # 启用可重入
                )

                def nested_operation():
                    """嵌套操作"""
                    with lock:
                        print(f"Worker-{worker_id}: 执行嵌套操作")
                        time.sleep(1)

                with lock:
                    print(f"Worker-{worker_id}: 获取锁,开始主要操作")
                    nested_operation()  # 嵌套获取锁
                    print(f"Worker-{worker_id}: 完成所有操作")

            def worker_with_fair_lock(worker_id: int):
                """使用公平锁的工作线程"""
                lock = AdvancedDatabaseLock(
                    db_connection,
                    "shared_resource",
                    timeout=5,
                    fair=True,   # 使用公平锁
                    reentrant=False  # 不需要可重入
                )

                with lock:
                    print(f"Worker-{worker_id}: 获取公平锁,执行操作")
                    time.sleep(2)

            # 创建并启动线程
            threads = []

            # 可重入锁测试
            for i in range(3):
                thread = threading.Thread(target=worker_with_reentrant_lock, args=(i,))
                threads.append(thread)
                thread.start()

            # 公平锁测试
            for i in range(3, 6):
                thread = threading.Thread(target=worker_with_fair_lock, args=(i,))
                threads.append(thread)
                thread.start()

            # 等待所有线程完成
            for thread in threads:
                thread.join()

        if __name__ == "__main__":
            advanced_database_lock_example()
        ---
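The `acquire()` loop above retries at a fixed 0.1-second interval. When many workers wait on the same lock, a fixed interval makes them query the database in lockstep. The following is a minimal sketch of an alternative, assuming exponential backoff with random jitter suits the workload. `acquire_with_backoff` and all of its parameters are hypothetical names for illustration and are not part of `AdvancedDatabaseLock`.

```python
import random
import time
from typing import Callable


def acquire_with_backoff(try_acquire: Callable[[], bool],
                         timeout: float,
                         base_delay: float = 0.05,
                         max_delay: float = 1.0,
                         sleep: Callable[[float], None] = time.sleep,
                         clock: Callable[[], float] = time.monotonic) -> bool:
    """Call try_acquire() until it succeeds or `timeout` seconds have passed."""
    deadline = clock() + timeout
    delay = base_delay
    while True:
        if try_acquire():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Sleep a random time in [0, delay], never past the deadline,
        # so that competing waiters do not retry at the same instant
        sleep(min(remaining, random.uniform(0, delay)))
        # Double the upper bound for the next attempt, capped at max_delay
        delay = min(max_delay, delay * 2)
```

Inside the class, a call such as `acquire_with_backoff(self._try_acquire, effective_timeout)` could stand in for the `while True` loop. The wait-queue bookkeeping of the fair lock would still have to be handled around it.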

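04.Practical application scenarios
    a.Distributed task scheduling
        a.Scenario
            Several nodes in a distributed deployment pick tasks from the same table, so without coordination the same task can be executed more than once.
        b.Solution
            Take a database lock named after the task before running it, so that at most one node executes a given task at any time.
        c.Code example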
            ---
            class DistributedTaskScheduler:
                """基于数据库锁的分布式任务调度器"""

                def __init__(self, db_connection):
                    self.db_connection = db_connection
                    self.tasks_table = "scheduled_tasks"
                    self.task_locks_table = "task_execution_locks"
                    self._ensure_tables_exist()

                def schedule_task(self, task_name: str, task_data: Dict[str, Any],
                                execute_after: Optional[datetime] = None) -> bool:
                    """调度任务"""
                    task_record = {
                        "task_name": task_name,
                        "task_data": task_data,
                        "status": "scheduled",
                        "created_at": datetime.now(),
                        "execute_after": execute_after or datetime.now(),
                        "retry_count": 0,
                        "max_retries": 3
                    }

                    return self.db_connection.execute_insert(self.tasks_table, task_record)

                def execute_task(self, task_name: str) -> bool:
                    """执行任务(分布式安全)"""
                    # 创建任务执行锁
                    task_lock = DatabaseLock(
                        self.db_connection,
                        f"task_{task_name}",
                        timeout=300  # 5分钟超时
                    )

                    if not task_lock.acquire(blocking=False):
                        logging.info(f"任务 {task_name} 正在被其他节点执行")
                        return False

                    try:
                        # 查找待执行的任务
                        task = self._get_next_pending_task(task_name)
                        if not task:
                            logging.info(f"没有找到待执行的任务 {task_name}")
                            return True

                        # 更新任务状态
                        self._update_task_status(task["id"], "running")

                        # 执行任务
                        success = self._run_task(task)

                        # 更新任务结果
                        if success:
                            self._update_task_status(task["id"], "completed")
                        else:
                            self._handle_task_failure(task)

                        return success

                    finally:
                        task_lock.release()

                def _get_next_pending_task(self, task_name: str) -> Optional[Dict[str, Any]]:
                    """获取下一个待执行的任务"""
                    current_time = datetime.now()
                    condition = {
                        "task_name": task_name,
                        "status": "scheduled",
                        "execute_after": ["<=", current_time]
                    }

                    return self.db_connection.query_one(
                        self.tasks_table,
                        condition,
                        order_by="created_at"
                    )

                def _update_task_status(self, task_id: str, status: str):
                    """更新任务状态"""
                    update_data = {
                        "status": status,
                        "updated_at": datetime.now()
                    }

                    if status == "running":
                        update_data["started_at"] = datetime.now()
                    elif status in ["completed", "failed"]:
                        update_data["finished_at"] = datetime.now()

                    condition = {"id": task_id}
                    self.db_connection.execute_update(self.tasks_table, condition, update_data)

                def _run_task(self, task: Dict[str, Any]) -> bool:
                    """执行具体任务"""
                    task_name = task["task_name"]
                    task_data = task["task_data"]

                    logging.info(f"开始执行任务: {task_name}")

                    try:
                        # 这里根据任务类型执行具体逻辑
                        if task_name == "data_backup":
                            return self._execute_data_backup(task_data)
                        elif task_name == "report_generation":
                            return self._execute_report_generation(task_data)
                        elif task_name == "cache_refresh":
                            return self._execute_cache_refresh(task_data)
                        else:
                            logging.warning(f"未知的任务类型: {task_name}")
                            return False

                    except Exception as e:
                        logging.error(f"任务执行失败: {e}")
                        return False

                def _execute_data_backup(self, task_data: Dict[str, Any]) -> bool:
                    """执行数据备份任务"""
                    logging.info("执行数据备份...")
                    # 模拟备份操作
                    time.sleep(10)
                    logging.info("数据备份完成")
                    return True

                def _execute_report_generation(self, task_data: Dict[str, Any]) -> bool:
                    """执行报告生成任务"""
                    logging.info("生成报告...")
                    # 模拟报告生成
                    time.sleep(5)
                    logging.info("报告生成完成")
                    return True

                def _execute_cache_refresh(self, task_data: Dict[str, Any]) -> bool:
                    """执行缓存刷新任务"""
                    logging.info("刷新缓存...")
                    # 模拟缓存刷新
                    time.sleep(2)
                    logging.info("缓存刷新完成")
                    return True
            ---
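`execute_task` calls `_handle_task_failure`, which is not shown in the listing above. Its decision logic could look like the sketch below, written as a pure function. It assumes that a failed task is rescheduled with an exponentially growing delay until `max_retries` is reached. `plan_after_failure` and `base_delay_seconds` are hypothetical names, not part of the original scheduler.

```python
from datetime import datetime, timedelta
from typing import Any, Dict


def plan_after_failure(task: Dict[str, Any], now: datetime,
                       base_delay_seconds: int = 60) -> Dict[str, Any]:
    """Return the column updates to apply to a task whose run just failed."""
    retries_used = task["retry_count"]
    if retries_used < task["max_retries"]:
        # Reschedule with a delay that doubles after every failure
        delay = timedelta(seconds=base_delay_seconds * 2 ** retries_used)
        return {
            "status": "scheduled",
            "retry_count": retries_used + 1,
            "execute_after": now + delay,
            "updated_at": now,
        }
    # Retries exhausted: mark the task as permanently failed
    return {"status": "failed", "finished_at": now, "updated_at": now}
```

A `_handle_task_failure(task)` method could then pass the returned dictionary to `self.db_connection.execute_update(self.tasks_table, {"id": task["id"]}, ...)`, following the same call shape as `_update_task_status`.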
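    b.Database migration management
        a.Scenario
            In a distributed system, several nodes may try to run schema migrations at startup; each migration must be applied exactly once and by only one node at a time.
        b.Solution
            Use a database lock to serialize the migration runs of all nodes.
        c.Code example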
            ---
            class DatabaseMigrationManager:
                """数据库迁移管理器"""

                def __init__(self, db_connection):
                    self.db_connection = db_connection
                    self.migrations_table = "schema_migrations"
                    self.migration_lock_table = "migration_locks"
                    self._ensure_tables_exist()

                def run_migrations(self, migration_scripts: List[str]) -> bool:
                    """运行数据库迁移"""
                    # 获取迁移锁
                    migration_lock = DatabaseLock(
                        self.db_connection,
                        "database_migration",
                        timeout=3600  # 1小时超时
                    )

                    if not migration_lock.acquire(blocking=True, timeout=60):
                        logging.error("无法获取迁移锁,可能有其他节点正在执行迁移")
                        return False

                    try:
                        logging.info("开始数据库迁移...")

                        for script in migration_scripts:
                            if not self._run_migration(script):
                                logging.error(f"迁移脚本 {script} 执行失败")
                                return False

                        logging.info("数据库迁移完成")
                        return True

                    finally:
                        migration_lock.release()

                def _run_migration(self, script_name: str) -> bool:
                    """运行单个迁移脚本"""
                    # 检查是否已执行
                    if self._is_migration_applied(script_name):
                        logging.info(f"迁移 {script_name} 已应用,跳过")
                        return True

                    try:
                        # 记录迁移开始
                        self._record_migration_start(script_name)

                        # 执行迁移SQL
                        migration_sql = self._get_migration_sql(script_name)
                        self.db_connection.execute_sql(migration_sql)

                        # 记录迁移完成
                        self._record_migration_success(script_name)

                        logging.info(f"迁移 {script_name} 执行成功")
                        return True

                    except Exception as e:
                        self._record_migration_failure(script_name, str(e))
                        logging.error(f"迁移 {script_name} 执行失败: {e}")
                        return False

                def _is_migration_applied(self, script_name: str) -> bool:
                    """检查迁移是否已应用"""
                    migration = self.db_connection.query_one(
                        self.migrations_table,
                        {"script_name": script_name, "status": "success"}
                    )
                    return migration is not None
            ---
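    c.Distributed cache consistency
        a.Scenario
            Keep cached data consistent across nodes and prevent concurrent writers from polluting the cache.
        b.Solution
            Use a per-key database lock to serialize cache update operations.
        c.Code example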
            ---
            class DistributedCacheManager:
                """分布式缓存管理器"""

                def __init__(self, db_connection):
                    self.db_connection = db_connection
                    self.cache_table = "cache_entries"
                    self.cache_lock_table = "cache_update_locks"
                    self._ensure_tables_exist()

                def update_cache(self, key: str, value: Any, ttl: int = 3600) -> bool:
                    """更新缓存(分布式安全)"""
                    # 获取缓存更新锁
                    cache_lock = DatabaseLock(
                        self.db_connection,
                        f"cache_update_{key}",
                        timeout=30
                    )

                    if not cache_lock.acquire(blocking=True, timeout=5):
                        logging.warning(f"无法获取缓存更新锁: {key}")
                        return False

                    try:
                        # 更新缓存记录
                        cache_data = {
                            "cache_key": key,
                            "cache_value": value,
                            "expires_at": datetime.now() + timedelta(seconds=ttl),
                            "updated_at": datetime.now(),
                            "updated_by": threading.get_ident()
                        }

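                        # Emulate an upsert (query, then update or insert); this is
                        # race-free only because cache_lock is held at this point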
                        condition = {"cache_key": key}
                        existing = self.db_connection.query_one(self.cache_table, condition)

                        if existing:
                            self.db_connection.execute_update(self.cache_table, condition, cache_data)
                        else:
                            cache_data["created_at"] = datetime.now()
                            self.db_connection.execute_insert(self.cache_table, cache_data)

                        logging.info(f"缓存更新成功: {key}")
                        return True

                    finally:
                        cache_lock.release()

                def get_cache(self, key: str) -> Optional[Any]:
                    """获取缓存值"""
                    condition = {"cache_key": key}
                    cache_entry = self.db_connection.query_one(self.cache_table, condition)

                    if cache_entry and cache_entry["expires_at"] > datetime.now():
                        return cache_entry["cache_value"]

                    return None

                def invalidate_cache(self, key: str) -> bool:
                    """失效缓存"""
                    # Use the same lock name as update_cache() so that updating and
                    # invalidating one key are mutually exclusive
                    cache_lock = DatabaseLock(
                        self.db_connection,
                        f"cache_update_{key}",
                        timeout=10
                    )

                    if not cache_lock.acquire(blocking=True, timeout=5):
                        logging.warning(f"无法获取缓存失效锁: {key}")
                        return False

                    try:
                        condition = {"cache_key": key}
                        deleted = self.db_connection.execute_delete(self.cache_table, condition)

                        if deleted > 0:
                            logging.info(f"缓存失效成功: {key}")
                            return True
                        else:
                            logging.info(f"缓存不存在或已失效: {key}")
                            return True

                    finally:
                        cache_lock.release()
            ---

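05.Best practices and performance optimization
    a.Design principles
        a.Minimal lock granularity
            Lock only the resource that needs protection (for example one lock name per task or per cache key) to reduce contention.
        b.Short hold times
            Hold the lock only for the critical section; shorter hold times allow more concurrency.
        c.Exception safety
            Release the lock in a finally block or a context manager so that it is released even when the protected code raises.
        d.Timeouts
            Set both an acquisition timeout and a lock expiry time, so that a waiter cannot block forever and a crashed holder cannot keep the lock indefinitely.
    b.Performance optimization strategies
        a.Index optimization
            Create suitable indexes on the lock table to speed up lookups, in particular on the lock_name column that every query filters on.
        b.Connection pooling
            Use a database connection pool to avoid opening a connection for every lock operation.
        c.Batching
            Combine several lock operations into one round trip where possible to reduce database traffic.
        d.Asynchronous processing
            Handle non-critical operations asynchronously.
    c.Monitoring and debugging
        a.Lock state monitoring
            Monitor lock acquisitions, releases, and waits.
        b.Performance metrics
            Track the latency and throughput of lock operations.
        c.Alerting
            Set up alerts for lock-related failures.
        d.Logging
            Log lock operations in detail to make problems easier to diagnose.
    d.Common problems and solutions
        a.Deadlock prevention
            Acquire locks in a fixed global order so that a circular wait cannot form.
        b.Lock leaks
            Implement automatic cleanup for locks that were not released properly.
        c.Performance bottleneck analysis
            Identify and optimize the bottlenecks in lock operations.
        d.High availability
            Design failover and recovery mechanisms.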
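The deadlock-prevention advice (a fixed acquisition order) can be sketched as a small helper. It assumes every lock object exposes `lock_name`, `acquire(blocking, timeout)` and `release()`, as the database locks in this chapter do. `acquire_in_order` and `InMemoryLock` are hypothetical names; the in-memory class exists only so the sketch runs without a database.

```python
import threading
from contextlib import contextmanager
from typing import Iterable, Iterator, List, Optional, Tuple


class InMemoryLock:
    """Stand-in for a named lock; records its operations in a shared log."""

    def __init__(self, lock_name: str, log: List[Tuple[str, str]]):
        self.lock_name = lock_name
        self._lock = threading.Lock()
        self._log = log

    def acquire(self, blocking: bool = True, timeout: Optional[float] = None) -> bool:
        if not blocking:
            ok = self._lock.acquire(False)
        else:
            # threading.Lock uses -1 to mean "wait without a time limit"
            ok = self._lock.acquire(True, -1 if timeout is None else timeout)
        if ok:
            self._log.append(("acquire", self.lock_name))
        return ok

    def release(self) -> bool:
        self._lock.release()
        self._log.append(("release", self.lock_name))
        return True


@contextmanager
def acquire_in_order(locks: Iterable, timeout: float = 5.0) -> Iterator[list]:
    """Acquire all locks sorted by name; release them in reverse order."""
    held = []
    try:
        for lock in sorted(locks, key=lambda item: item.lock_name):
            if not lock.acquire(blocking=True, timeout=timeout):
                raise TimeoutError(f"Could not acquire lock: {lock.lock_name}")
            held.append(lock)
        yield held
    finally:
        # Runs on success and on failure, so a partially acquired
        # set of locks is always released
        for lock in reversed(held):
            lock.release()
```

Because every caller sorts by the same key, two workers that need locks "a" and "b" both request "a" first, so neither can hold "b" while waiting for "a".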

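9.3 ZooKeeper Lock

01.Basic concepts
    a.Definition and purpose
        A ZooKeeper distributed lock is built on the Apache ZooKeeper coordination service. It relies on ZooKeeper's strong consistency, ephemeral nodes, and watches to implement a reliable distributed lock.
    b.Core features
        a.Strong consistency
            ZooKeeper guarantees sequential consistency and atomic updates, which makes lock operations reliable.
        b.Ephemeral nodes
            An ephemeral node is deleted automatically when the session that created it ends, so a crashed holder cannot leave the lock held forever.
        c.Watches
            The watch mechanism notifies waiting clients as soon as a lock node is removed.
        d.Ordering
            Sequential nodes make the lock fair: it is granted in the order in which the requests arrived.
    c.Comparison with other distributed locks
        a.Reliability
            Offers stronger reliability guarantees than Redis-based or database-based locks.
        b.Performance
            Throughput is comparatively low, but ZooKeeper provides a complete set of distributed coordination primitives.
        c.When to use
            Suited to distributed systems with very high reliability requirements.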

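02.Basic usage
    a.Basic ZooKeeper operations
        a.Connecting
            Create a ZooKeeper client connection and manage its session.
        b.Node operations
            Create, delete, and read ZooKeeper nodes.
        c.Watches
            Register watches on nodes to be notified of changes.
    b.How the lock works
        a.Ephemeral sequential nodes
            Each lock request creates an ephemeral sequential node under the lock path.
        b.Smallest-node check
            The client whose node has the smallest sequence number holds the lock.
        c.Watching the predecessor
            Every other client watches the node immediately before its own and re-checks when that node is deleted.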
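The smallest-node check and the choice of which node to watch reduce to one decision per client, which can be sketched without a ZooKeeper connection. `predecessor_to_watch` is a hypothetical helper name. The sketch assumes lock nodes are named `lock-` followed by the sequence number that ZooKeeper appends.

```python
from typing import List, Optional


def predecessor_to_watch(children: List[str], my_node: str,
                         prefix: str = "lock-") -> Optional[str]:
    """Return None if my_node holds the lock, else the node name to watch."""
    # Keep only lock nodes and order them by their numeric sequence suffix
    candidates = sorted(
        (name for name in children if name.startswith(prefix)),
        key=lambda name: int(name[len(prefix):]),
    )
    # Raises ValueError if our own node is gone (for example after a
    # session expiry), which the caller should treat as a lost request
    index = candidates.index(my_node)
    if index == 0:
        return None  # smallest sequence number: we hold the lock
    return candidates[index - 1]  # watch only the immediate predecessor
```

Watching only the immediate predecessor means that a release wakes a single waiter rather than every client queued on the lock.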
    c.Code example
        ---
        # ZooKeeper分布式锁基本实现
        import time
        import threading
        import logging
        from typing import Optional, Callable, Any
        import uuid
        from datetime import datetime
        from kazoo.client import KazooClient
        from kazoo.exceptions import KazooException, NoNodeError, NodeExistsError
        from kazoo.recipe.lock import Lock as ZookeeperLock

        class ZookeeperDistributedLock:
            """基于ZooKeeper的分布式锁实现"""

            def __init__(self, hosts: str, lock_path: str, timeout: int = 30):
                """
                初始化ZooKeeper锁

                Args:
                    hosts: ZooKeeper服务器地址列表
                    lock_path: 锁的ZooKeeper路径
                    timeout: 锁超时时间(秒)
                """
                self.hosts = hosts
                self.lock_path = lock_path
                self.timeout = timeout
                self.identifier = str(uuid.uuid4())
                self.acquired = False

                # 确保锁路径以/结尾
                if not self.lock_path.endswith('/'):
                    self.lock_path += '/'

                # 初始化ZooKeeper客户端
                self.zk = None
                self.lock_node = None
                self.lock = None

                # 连接状态
                self.connected = False

                # 启动连接
                self._connect()

            def _connect(self):
                """连接到ZooKeeper"""
                try:
                    self.zk = KazooClient(
                        hosts=self.hosts,
                        timeout=10,
                        connection_retry=dict(max_delay=5, max_tries=10)
                    )

                    self.zk.start()

                    # 等待连接建立
                    if self.zk.connected:
                        self.connected = True
                        logging.info(f"成功连接到ZooKeeper: {self.hosts}")

                        # 确保锁路径存在
                        self.zk.ensure_path(self.lock_path)

                    else:
                        raise ConnectionError("无法连接到ZooKeeper")

                except Exception as e:
                    logging.error(f"连接ZooKeeper失败: {e}")
                    raise

            def acquire(self, blocking: bool = True, timeout: Optional[float] = None) -> bool:
                """
                获取锁

                Args:
                    blocking: 是否阻塞等待
                    timeout: 阻塞等待超时时间

                Returns:
                    bool: 是否成功获取锁
                """
                if not self.connected:
                    logging.error("ZooKeeper连接未建立,无法获取锁")
                    return False

                if self.acquired:
                    logging.warning("锁已被获取,重复获取")
                    return True

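                try:
                    # Use kazoo's Lock recipe. Its constructor takes the client, the
                    # path and an optional identifier; it has no lease-time argument.
                    lock_name = self.lock_path.rstrip('/') + "_lock"
                    self.lock = ZookeeperLock(
                        self.zk,
                        lock_name,
                        identifier=self.identifier
                    )

                    # Acquire the lock, honouring `blocking`. The explicit None check
                    # keeps a timeout of 0 from falling back to the default. On a
                    # blocking timeout kazoo raises LockTimeout, handled below.
                    wait_timeout = timeout if timeout is not None else self.timeout
                    success = self.lock.acquire(
                        blocking=blocking,
                        timeout=wait_timeout if blocking else None
                    )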

                    if success:
                        self.acquired = True
                        logging.info(f"成功获取ZooKeeper锁: {self.lock_path}")
                        return True
                    else:
                        if blocking:
                            logging.warning(f"获取ZooKeeper锁超时: {self.lock_path}")
                        else:
                            logging.info(f"非阻塞模式获取ZooKeeper锁失败: {self.lock_path}")
                        return False

                except Exception as e:
                    logging.error(f"获取ZooKeeper锁时发生错误: {e}")
                    return False

            def release(self) -> bool:
                """
                释放锁

                Returns:
                    bool: 是否成功释放锁
                """
                if not self.acquired:
                    logging.warning(f"尝试释放未获取的锁: {self.lock_path}")
                    return False

                try:
                    if self.lock:
                        self.lock.release()
                        self.lock = None

                    self.acquired = False
                    logging.info(f"成功释放ZooKeeper锁: {self.lock_path}")
                    return True

                except Exception as e:
                    logging.error(f"释放ZooKeeper锁时发生错误: {e}")
                    return False

            def is_locked(self) -> bool:
                """检查锁是否存在"""
                if not self.connected or not self.acquired:
                    return False

                try:
                    lock_name = self.lock_path.rstrip('/') + "_lock"
                    return self.zk.exists(lock_name) is not None

                except Exception as e:
                    logging.error(f"检查锁状态时发生错误: {e}")
                    return False

            def close(self):
                """关闭连接"""
                try:
                    if self.acquired:
                        self.release()

                    if self.zk:
                        self.zk.stop()
                        self.zk.close()
                        self.zk = None

                    self.connected = False
                    logging.info("ZooKeeper连接已关闭")

                except Exception as e:
                    logging.error(f"关闭ZooKeeper连接时发生错误: {e}")

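            def __enter__(self):
                """Context manager entry; raises if the lock cannot be acquired."""
                if not self.acquire():
                    raise TimeoutError(f"Could not acquire ZooKeeper lock: {self.lock_path}")
                return self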

            def __exit__(self, exc_type, exc_val, exc_tb):
                """上下文管理器出口"""
                self.release()

            def __del__(self):
                """析构函数,确保资源清理"""
                self.close()

        # 手动实现ZooKeeper锁(不使用内置配方)
        class ManualZookeeperLock:
            """手动实现的ZooKeeper分布式锁"""

            def __init__(self, hosts: str, lock_path: str, timeout: int = 30):
                """
                初始化手动ZooKeeper锁

                Args:
                    hosts: ZooKeeper服务器地址
                    lock_path: 锁路径
                    timeout: 超时时间
                """
                self.hosts = hosts
                self.lock_path = lock_path
                self.timeout = timeout
                self.identifier = str(uuid.uuid4())
                self.acquired = False

                # 确保路径以/结尾
                if not self.lock_path.endswith('/'):
                    self.lock_path += '/'

                # ZooKeeper客户端
                self.zk = None
                self.lock_node_path = None
                self.connected = False

                # 连接ZooKeeper
                self._connect()

            def _connect(self):
                """连接到ZooKeeper"""
                try:
                    self.zk = KazooClient(
                        hosts=self.hosts,
                        timeout=10,
                        connection_retry=dict(max_delay=5, max_tries=10)
                    )
                    self.zk.start()

                    if self.zk.connected:
                        self.connected = True
                        logging.info(f"成功连接到ZooKeeper: {self.hosts}")

                        # 确保锁路径存在
                        self.zk.ensure_path(self.lock_path)
                    else:
                        raise ConnectionError("无法连接到ZooKeeper")

                except Exception as e:
                    logging.error(f"连接ZooKeeper失败: {e}")
                    raise

            def acquire(self, blocking: bool = True, timeout: Optional[float] = None) -> bool:
                """
                获取锁

                Args:
                    blocking: 是否阻塞等待
                    timeout: 阻塞等待超时时间

                Returns:
                    bool: 是否成功获取锁
                """
                if not self.connected:
                    logging.error("ZooKeeper连接未建立")
                    return False

                if self.acquired:
                    return True

                start_time = time.time()
                effective_timeout = timeout if timeout is not None else self.timeout

                try:
                    # 创建临时顺序节点
                    self.lock_node_path = self.zk.create(
                        self.lock_path + "lock-",
                        value=self.identifier.encode('utf-8'),
                        ephemeral=True,
                        sequence=True
                    )

                    logging.info(f"创建锁节点: {self.lock_node_path}")

                    while True:
                        try:
                            # 获取所有锁节点
                            children = self.zk.get_children(self.lock_path)
                            lock_nodes = sorted(child for child in children if child.startswith("lock-"))

                            if not lock_nodes:
                                logging.warning("未找到锁节点")
                                return False

                            # 检查当前节点是否为最小节点
                            current_node_name = self.lock_node_path.split('/')[-1]
                            is_smallest = current_node_name == lock_nodes[0]

                            if is_smallest:
                                # 获取锁成功
                                self.acquired = True
                                logging.info(f"成功获取锁: {self.lock_path}")
                                return True

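                            else:
                                if not blocking:
                                    # Non-blocking mode: withdraw our request and return
                                    self._delete_lock_node()
                                    return False

                                # Check the timeout
                                remaining = effective_timeout - (time.time() - start_time)
                                if remaining <= 0:
                                    logging.warning(f"Timed out acquiring lock: {self.lock_path}")
                                    self._delete_lock_node()
                                    return False

                                # Watch only the immediate predecessor. exists() registers a
                                # one-shot watch, so watchers do not pile up across iterations.
                                current_index = lock_nodes.index(current_node_name)
                                prev_node_path = self.lock_path + lock_nodes[current_index - 1]
                                released = threading.Event()
                                if self.zk.exists(prev_node_path, watch=lambda event: released.set()):
                                    # Sleep until the predecessor changes or the timeout expires
                                    released.wait(remaining)
                                # Loop again to re-check whether our node is now the smallest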

                        except Exception as e:
                            logging.error(f"获取锁过程中发生错误: {e}")
                            self._delete_lock_node()
                            return False

                except Exception as e:
                    logging.error(f"获取锁时发生错误: {e}")
                    self._delete_lock_node()
                    return False

            def _watch_previous_node(self, node_path: str):
                """监听前一个节点"""
                try:
                    @self.zk.DataWatch(node_path)
                    def watch_node(data, stat):
                        if data is None or stat is None:
                            # 前一个节点被删除,可以尝试获取锁
                            logging.info(f"前驱节点被删除: {node_path}")
                            return False

                except Exception as e:
                    logging.error(f"设置节点监听器失败: {e}")

            def _delete_lock_node(self):
                """删除锁节点"""
                if self.lock_node_path:
                    try:
                        self.zk.delete(self.lock_node_path)
                        logging.info(f"删除锁节点: {self.lock_node_path}")
                    except NoNodeError:
                        pass  # 节点已被删除
                    except Exception as e:
                        logging.error(f"删除锁节点失败: {e}")
                    finally:
                        self.lock_node_path = None

            def release(self) -> bool:
                """释放锁"""
                if not self.acquired:
                    logging.warning(f"尝试释放未获取的锁: {self.lock_path}")
                    return False

                try:
                    self._delete_lock_node()
                    self.acquired = False
                    logging.info(f"成功释放锁: {self.lock_path}")
                    return True

                except Exception as e:
                    logging.error(f"释放锁时发生错误: {e}")
                    return False

            def close(self):
                """关闭连接"""
                try:
                    if self.acquired:
                        self.release()

                    if self.zk:
                        self.zk.stop()
                        self.zk.close()
                        self.zk = None

                    self.connected = False
                    logging.info("ZooKeeper连接已关闭")

                except Exception as e:
                    logging.error(f"关闭ZooKeeper连接时发生错误: {e}")

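            def __enter__(self):
                """Context manager entry: acquire the lock or raise"""
                if not self.acquire():
                    # Never run the protected block without holding the lock
                    raise RuntimeError(f"Failed to acquire lock: {self.lock_path}")
                return self

            def __exit__(self, exc_type, exc_val, exc_tb):
                """Context manager exit: release the lock"""
                self.release()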

            def __del__(self):
                """析构函数"""
                self.close()

        # 使用示例
        def zookeeper_lock_example():
            """ZooKeeper锁使用示例"""
            logging.basicConfig(level=logging.INFO,
                            format='%(asctime)s - %(levelname)s - %(message)s')

            # 配置ZooKeeper服务器地址
            zk_hosts = "localhost:2181"  # 实际项目中替换为真实的ZooKeeper地址

            try:
                # 使用内置配方的锁
                lock = ZookeeperDistributedLock(
                    hosts=zk_hosts,
                    lock_path="/distributed_locks/resource1",
                    timeout=10
                )

                with lock:
                    print("成功获取ZooKeeper锁,开始执行关键操作...")
                    time.sleep(5)
                    print("关键操作完成")

                # 使用手动实现的锁
                manual_lock = ManualZookeeperLock(
                    hosts=zk_hosts,
                    lock_path="/distributed_locks/resource2",
                    timeout=15
                )

                if manual_lock.acquire(blocking=True, timeout=10):
                    try:
                        print("成功获取手动ZooKeeper锁...")
                        time.sleep(3)
                    finally:
                        manual_lock.release()

            except Exception as e:
                print(f"ZooKeeper锁示例执行失败: {e}")

        if __name__ == "__main__":
            zookeeper_lock_example()
        ---
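
        The queueing rule behind the manual lock (the smallest sequence number holds the lock, every other client waits on the node directly ahead of it) can be checked without a ZooKeeper server. The following is a minimal sketch under stated assumptions: `predecessor_to_watch` is a hypothetical helper that is not part of the classes above, and the node names are invented examples of what ephemeral sequential nodes might look like.

```python
from typing import List, Optional


def predecessor_to_watch(own_node: str, children: List[str],
                         prefix: str = "lock-") -> Optional[str]:
    """Return None if own_node holds the lock, else the node it should watch.

    Assumes every lock node is named prefix + zero-padded sequence number,
    so lexicographic order equals creation order.
    """
    lock_nodes = sorted(c for c in children if c.startswith(prefix))
    if own_node not in lock_nodes:
        raise ValueError(f"{own_node} is not among the lock nodes")
    index = lock_nodes.index(own_node)
    # The smallest sequence number owns the lock; everyone else waits on the
    # node directly ahead of it, so a release wakes exactly one waiter.
    return None if index == 0 else lock_nodes[index - 1]


# Hypothetical child list, including one unrelated node that is ignored
children = ["lock-0000000007", "lock-0000000005", "lock-0000000006", "other"]
print(predecessor_to_watch("lock-0000000005", children))  # None
print(predecessor_to_watch("lock-0000000007", children))  # lock-0000000006
```

        Watching only the immediate predecessor, rather than the whole lock directory, is what keeps a release from waking every waiting client at once.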

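03.Advanced features and applications
    a.Read-write lock implementation
        a.How it works
            Read locks and write locks are represented by different ZooKeeper nodes, so a client can tell which kinds of lock are currently held.
        b.Read lock rule
            Any number of read locks may be held at the same time, but a reader must wait until every write lock has been released.
        c.Write lock rule
            A write lock is exclusive: a writer must wait until all read locks and all other write locks have been released.
        d.Priority management
            The implementation must decide who wins when readers and writers compete, for example whether a waiting writer blocks newly arriving readers so that it is not starved.
    b.Reentrant lock implementation
        a.Lock counting
            Record how many times the lock has been acquired, either in the node data or in client-side state (the example code uses client-side state).
        b.Owner identification
            Use the client identifier to decide whether a request comes from the client that already holds the lock.
        c.Repeated acquisition
            The same client may acquire the same lock several times; the lock is released only when the count returns to zero.
    c.Lock lease management
        a.Automatic lease renewal
            Periodically write a fresh timestamp into the node data as a record that the holder is still active. ZooKeeper itself keeps an ephemeral node alive for as long as the client session lasts, so this write does not extend the node's lifetime.
        b.Lease expiry detection
            Monitor the client's connection state and handle abnormal conditions such as a suspended or lost session.
        c.Failure recovery
            When a client fails, its session expires and ZooKeeper deletes its ephemeral nodes, which releases the lock automatically.
    d.Code example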
        ---
        # 高级ZooKeeper锁实现
        import time
        import threading
        import logging
        from typing import Optional, Dict, Any, Set
        import uuid
        from datetime import datetime, timedelta
        from kazoo.client import KazooClient
        from kazoo.exceptions import KazooException, NoNodeError
        from enum import Enum

        class LockType(Enum):
            """锁类型枚举"""
            READ = "read"
            WRITE = "write"

        class ZookeeperReadWriteLock:
            """基于ZooKeeper的读写锁实现"""

            def __init__(self, hosts: str, lock_path: str, timeout: int = 30):
                """
                初始化读写锁

                Args:
                    hosts: ZooKeeper服务器地址
                    lock_path: 锁路径
                    timeout: 超时时间
                """
                self.hosts = hosts
                self.lock_path = lock_path.rstrip('/') + '/'
                self.timeout = timeout
                self.identifier = str(uuid.uuid4())
                self.thread_id = threading.get_ident()

                # 锁状态
                self.acquired_locks = {}  # 记录已获取的锁
                self.connected = False

                # ZooKeeper客户端
                self.zk = None

                # 连接ZooKeeper
                self._connect()

            def _connect(self):
                """连接到ZooKeeper"""
                try:
                    self.zk = KazooClient(
                        hosts=self.hosts,
                        timeout=10,
                        connection_retry=dict(max_delay=5, max_tries=10)
                    )
                    self.zk.start()

                    if self.zk.connected:
                        self.connected = True
                        logging.info(f"成功连接到ZooKeeper: {self.hosts}")

                        # 创建必要的路径
                        self.zk.ensure_path(self.lock_path + "read_locks/")
                        self.zk.ensure_path(self.lock_path + "write_locks/")
                    else:
                        raise ConnectionError("无法连接到ZooKeeper")

                except Exception as e:
                    logging.error(f"连接ZooKeeper失败: {e}")
                    raise

            def acquire_read_lock(self, blocking: bool = True, timeout: Optional[float] = None) -> bool:
                """获取读锁"""
                return self._acquire_lock(LockType.READ, blocking, timeout)

            def acquire_write_lock(self, blocking: bool = True, timeout: Optional[float] = None) -> bool:
                """获取写锁"""
                return self._acquire_lock(LockType.WRITE, blocking, timeout)

            def _acquire_lock(self, lock_type: LockType, blocking: bool = True,
                            timeout: Optional[float] = None) -> bool:
                """获取锁的通用方法"""
                if not self.connected:
                    logging.error("ZooKeeper连接未建立")
                    return False

                # 检查是否已持有相同类型的锁(可重入)
                lock_key = f"{lock_type.value}_{self.identifier}_{self.thread_id}"
                if lock_key in self.acquired_locks:
                    # 增加重入计数
                    self.acquired_locks[lock_key] += 1
                    logging.info(f"重入获取{lock_type.value}锁: {self.lock_path}")
                    return True

                start_time = time.time()
                effective_timeout = timeout if timeout is not None else self.timeout

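                try:
                    while True:
                        # Cheap pre-check: do not create a node while a conflicting lock is visible
                        if self._can_acquire_lock(lock_type):
                            lock_node_path = self._create_lock_node(lock_type)
                            if lock_node_path:
                                # Verify after creating the node. Two clients can pass the
                                # pre-check at the same moment, so each one confirms that no
                                # conflicting node exists besides its own before proceeding.
                                own_name = lock_node_path.split('/')[-1]
                                writers = self.zk.get_children(self.lock_path + "write_locks/")
                                if lock_type == LockType.WRITE:
                                    readers = self.zk.get_children(self.lock_path + "read_locks/")
                                    conflict = bool(readers) or any(n != own_name for n in writers)
                                else:
                                    conflict = bool(writers)

                                if not conflict:
                                    self.acquired_locks[lock_key] = 1
                                    logging.info(f"Acquired {lock_type.value} lock: {lock_node_path}")
                                    return True

                                # Lost the race: remove our node, then retry below
                                self.zk.delete(lock_node_path)

                        if not blocking:
                            return False

                        # Check the timeout
                        if (time.time() - start_time) >= effective_timeout:
                            logging.warning(f"Timed out acquiring {lock_type.value} lock: {self.lock_path}")
                            return False

                        # Wait, then retry
                        time.sleep(0.1)

                except Exception as e:
                    logging.error(f"Error while acquiring {lock_type.value} lock: {e}")
                    return False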

            def _can_acquire_lock(self, lock_type: LockType) -> bool:
                """检查是否可以获取指定类型的锁"""
                try:
                    # 获取读锁节点
                    read_locks = self.zk.get_children(self.lock_path + "read_locks/")
                    # 获取写锁节点
                    write_locks = self.zk.get_children(self.lock_path + "write_locks/")

                    if lock_type == LockType.READ:
                        # 读锁获取条件:没有写锁存在
                        if write_locks:
                            logging.debug(f"存在写锁,无法获取读锁: {write_locks}")
                            return False
                        return True

                    else:  # WRITE
                        # 写锁获取条件:没有读锁和写锁存在
                        if read_locks:
                            logging.debug(f"存在读锁,无法获取写锁: {read_locks}")
                            return False
                        if write_locks:
                            logging.debug(f"存在写锁,无法获取写锁: {write_locks}")
                            return False
                        return True

                except Exception as e:
                    logging.error(f"检查锁获取条件时发生错误: {e}")
                    return False

            def _create_lock_node(self, lock_type: LockType) -> Optional[str]:
                """创建锁节点"""
                try:
                    if lock_type == LockType.READ:
                        node_path = self.lock_path + "read_locks/"
                    else:
                        node_path = self.lock_path + "write_locks/"

                    # 创建临时顺序节点
                    lock_node_path = self.zk.create(
                        node_path + "lock-",
                        value=f"{self.identifier}:{self.thread_id}:{datetime.now().isoformat()}".encode('utf-8'),
                        ephemeral=True,
                        sequence=True
                    )

                    return lock_node_path

                except Exception as e:
                    logging.error(f"创建{lock_type.value}锁节点失败: {e}")
                    return None

            def release_read_lock(self) -> bool:
                """释放读锁"""
                return self._release_lock(LockType.READ)

            def release_write_lock(self) -> bool:
                """释放写锁"""
                return self._release_lock(LockType.WRITE)

            def _release_lock(self, lock_type: LockType) -> bool:
                """释放锁的通用方法"""
                lock_key = f"{lock_type.value}_{self.identifier}_{self.thread_id}"

                if lock_key not in self.acquired_locks:
                    logging.warning(f"尝试释放未获取的{lock_type.value}锁")
                    return False

                # 减少重入计数
                self.acquired_locks[lock_key] -= 1

                if self.acquired_locks[lock_key] > 0:
                    logging.info(f"减少{lock_type.value}锁重入计数: {self.acquired_locks[lock_key]}")
                    return True

                # 完全释放锁
                try:
                    if lock_type == LockType.READ:
                        node_path = self.lock_path + "read_locks/"
                    else:
                        node_path = self.lock_path + "write_locks/"

                    # 查找并删除当前客户端的锁节点
                    children = self.zk.get_children(node_path)
                    for child in children:
                        if child.startswith("lock-"):
                            full_path = node_path + child
                            try:
                                data, stat = self.zk.get(full_path)
                                node_data = data.decode('utf-8')
                                if node_data.startswith(f"{self.identifier}:{self.thread_id}"):
                                    self.zk.delete(full_path)
                                    logging.info(f"成功删除{lock_type.value}锁节点: {full_path}")
                                    break
                            except NoNodeError:
                                continue
                            except Exception as e:
                                logging.warning(f"读取锁节点数据失败: {e}")

                    del self.acquired_locks[lock_key]
                    logging.info(f"成功释放{lock_type.value}锁: {self.lock_path}")
                    return True

                except Exception as e:
                    logging.error(f"释放{lock_type.value}锁时发生错误: {e}")
                    return False

            def close(self):
                """关闭连接"""
                try:
                    # 释放所有持有的锁
                    for lock_key in list(self.acquired_locks.keys()):
                        lock_type = LockType.READ if lock_key.startswith("read") else LockType.WRITE
                        self._release_lock(lock_type)

                    if self.zk:
                        self.zk.stop()
                        self.zk.close()
                        self.zk = None

                    self.connected = False
                    logging.info("ZooKeeper读写锁连接已关闭")

                except Exception as e:
                    logging.error(f"关闭ZooKeeper连接时发生错误: {e}")

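            def __enter__(self):
                """Context manager entry: acquire the write lock (the default) or raise"""
                if not self.acquire_write_lock():
                    # Never run the protected block without holding the lock
                    raise RuntimeError(f"Failed to acquire write lock: {self.lock_path}")
                return self

            def __exit__(self, exc_type, exc_val, exc_tb):
                """Context manager exit: release the write lock"""
                self.release_write_lock()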

            def __del__(self):
                """析构函数"""
                self.close()

        class ZookeeperLeaseLock:
            """带租约的ZooKeeper锁"""

            def __init__(self, hosts: str, lock_path: str, lease_time: int = 60):
                """
                初始化租约锁

                Args:
                    hosts: ZooKeeper服务器地址
                    lock_path: 锁路径
                    lease_time: 租约时间(秒)
                """
                self.hosts = hosts
                self.lock_path = lock_path.rstrip('/') + '/'
                self.lease_time = lease_time
                self.identifier = str(uuid.uuid4())
                self.acquired = False

                # ZooKeeper客户端
                self.zk = None
                self.lock_node_path = None
                self.renewal_thread = None
                self.stop_renewal = threading.Event()
                self.connected = False

                # 连接ZooKeeper
                self._connect()

            def _connect(self):
                """连接到ZooKeeper"""
                try:
                    self.zk = KazooClient(
                        hosts=self.hosts,
                        timeout=10,
                        connection_retry=dict(max_delay=5, max_tries=10)
                    )
                    self.zk.start()

                    if self.zk.connected:
                        self.connected = True
                        logging.info(f"成功连接到ZooKeeper: {self.hosts}")
                        self.zk.ensure_path(self.lock_path)
                    else:
                        raise ConnectionError("无法连接到ZooKeeper")

                except Exception as e:
                    logging.error(f"连接ZooKeeper失败: {e}")
                    raise

            def acquire(self, blocking: bool = True, timeout: Optional[float] = None) -> bool:
                """获取锁"""
                if not self.connected:
                    logging.error("ZooKeeper连接未建立")
                    return False

                if self.acquired:
                    return True

                start_time = time.time()
                effective_timeout = timeout if timeout is not None else self.lease_time

                try:
                    # 创建锁节点
                    self.lock_node_path = self.zk.create(
                        self.lock_path + "lease_lock-",
                        value=f"{self.identifier}:{datetime.now().isoformat()}".encode('utf-8'),
                        ephemeral=True,
                        sequence=True
                    )

                    while True:
                        # 检查是否为最小节点
                        children = self.zk.get_children(self.lock_path)
                        lock_nodes = sorted(child for child in children if child.startswith("lease_lock-"))

                        current_node_name = self.lock_node_path.split('/')[-1]
                        is_smallest = current_node_name == lock_nodes[0]

                        if is_smallest:
                            # 获取锁成功
                            self.acquired = True
                            self._start_renewal_thread()
                            logging.info(f"成功获取租约锁: {self.lock_path}")
                            return True

                        if not blocking:
                            self._cleanup()
                            return False

                        # 检查超时
                        if (time.time() - start_time) >= effective_timeout:
                            logging.warning(f"获取租约锁超时: {self.lock_path}")
                            self._cleanup()
                            return False

                        time.sleep(0.1)

                except Exception as e:
                    logging.error(f"获取租约锁时发生错误: {e}")
                    self._cleanup()
                    return False

            def _start_renewal_thread(self):
                """启动租约续期线程"""
                def renewal_worker():
                    renewal_interval = self.lease_time / 3  # 在租约的1/3时间后续期

                    while not self.stop_renewal.wait(renewal_interval):
                        if not self._renew_lease():
                            logging.warning("租约续期失败,停止续期线程")
                            break

                self.renewal_thread = threading.Thread(target=renewal_worker, daemon=True)
                self.renewal_thread.start()

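            def _renew_lease(self) -> bool:
                """Refresh the lease timestamp stored in the lock node.

                The ephemeral node itself lives as long as the ZooKeeper session;
                this write only records that the holder is still active.
                """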
                if not self.lock_node_path or not self.acquired:
                    return False

                try:
                    new_data = f"{self.identifier}:{datetime.now().isoformat()}".encode('utf-8')
                    self.zk.set(self.lock_node_path, new_data)
                    logging.debug("租约续期成功")
                    return True

                except Exception as e:
                    logging.error(f"租约续期失败: {e}")
                    return False

            def release(self) -> bool:
                """释放锁"""
                if not self.acquired:
                    logging.warning("尝试释放未获取的锁")
                    return False

                try:
                    self._cleanup()
                    self.acquired = False
                    logging.info(f"成功释放租约锁: {self.lock_path}")
                    return True

                except Exception as e:
                    logging.error(f"释放租约锁时发生错误: {e}")
                    return False

            def _cleanup(self):
                """清理资源"""
                # 停止续期线程
                if self.renewal_thread:
                    self.stop_renewal.set()
                    self.renewal_thread.join(timeout=5)
                    self.renewal_thread = None
                    self.stop_renewal.clear()

                # 删除锁节点
                if self.lock_node_path:
                    try:
                        self.zk.delete(self.lock_node_path)
                    except NoNodeError:
                        pass
                    except Exception as e:
                        logging.error(f"删除锁节点失败: {e}")
                    finally:
                        self.lock_node_path = None

            def close(self):
                """关闭连接"""
                try:
                    if self.acquired:
                        self.release()

                    if self.zk:
                        self.zk.stop()
                        self.zk.close()
                        self.zk = None

                    self.connected = False
                    logging.info("ZooKeeper租约锁连接已关闭")

                except Exception as e:
                    logging.error(f"关闭ZooKeeper连接时发生错误: {e}")

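            def __enter__(self):
                """Context manager entry: acquire the lock or raise"""
                if not self.acquire():
                    # Never run the protected block without holding the lock
                    raise RuntimeError(f"Failed to acquire lease lock: {self.lock_path}")
                return self

            def __exit__(self, exc_type, exc_val, exc_tb):
                """Context manager exit: release the lock"""
                self.release()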

            def __del__(self):
                """析构函数"""
                self.close()

        # 使用示例
        def advanced_zookeeper_lock_example():
            """高级ZooKeeper锁使用示例"""
            logging.basicConfig(level=logging.INFO,
                            format='%(asctime)s - %(levelname)s - %(message)s')

            zk_hosts = "localhost:2181"  # 替换为真实的ZooKeeper地址

            # Read-write lock example: each worker creates its own lock instance (its own session)
            def reader_worker(worker_id: int):
                """Reader thread"""
                lock = ZookeeperReadWriteLock(zk_hosts, "/distributed_locks/rw_example")
                try:
                    if lock.acquire_read_lock():
                        try:
                            print(f"Reader-{worker_id}: acquired read lock, reading data")
                            time.sleep(2)
                            print(f"Reader-{worker_id}: read finished")
                        finally:
                            # Release only a lock that was actually acquired
                            lock.release_read_lock()
                    else:
                        print(f"Reader-{worker_id}: failed to acquire read lock")
                finally:
                    lock.close()

            def writer_worker(worker_id: int):
                """Writer thread"""
                lock = ZookeeperReadWriteLock(zk_hosts, "/distributed_locks/rw_example")
                try:
                    if lock.acquire_write_lock():
                        try:
                            print(f"Writer-{worker_id}: acquired write lock, writing data")
                            time.sleep(3)
                            print(f"Writer-{worker_id}: write finished")
                        finally:
                            # Release only a lock that was actually acquired
                            lock.release_write_lock()
                    else:
                        print(f"Writer-{worker_id}: failed to acquire write lock")
                finally:
                    lock.close()

            # 创建读写线程
            threads = []

            # 启动几个读线程
            for i in range(3):
                thread = threading.Thread(target=reader_worker, args=(i,))
                threads.append(thread)
                thread.start()

            # 启动写线程
            for i in range(2):
                thread = threading.Thread(target=writer_worker, args=(i,))
                threads.append(thread)
                thread.start()

            # 等待所有线程完成
            for thread in threads:
                thread.join()

            # 租约锁示例
            with ZookeeperLeaseLock(zk_hosts, "/distributed_locks/lease_example", lease_time=30) as lease_lock:
                print("获取租约锁,执行长时间操作...")
                time.sleep(10)
                print("操作完成,租约锁自动续期中")

        if __name__ == "__main__":
            advanced_zookeeper_lock_example()
        ---
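
        The read and write rules listed at the start of this section can also be expressed as a single ordering over one shared directory, which is how the shared-lock recipe is commonly described for ZooKeeper. The sketch below only illustrates that rule and is not the implementation used above: the `read-` and `write-` prefixes, and the helpers `sequence_of` and `blocking_node`, are assumptions made for the example. It also assumes every child name ends in `-` followed by a sequence number.

```python
from typing import List, Optional


def sequence_of(node: str) -> int:
    """Extract the numeric suffix of a name like 'write-0000000012'."""
    return int(node.rsplit("-", 1)[1])


def blocking_node(own_node: str, children: List[str]) -> Optional[str]:
    """Return the node that own_node must wait for, or None if it may proceed.

    Readers wait only for earlier writers; writers wait for any earlier node.
    """
    own_seq = sequence_of(own_node)
    is_reader = own_node.startswith("read-")
    earlier = [
        c for c in children
        if sequence_of(c) < own_seq
        and (c.startswith("write-") or not is_reader)
    ]
    # Wait on the closest earlier conflicting node, if there is one.
    return max(earlier, key=sequence_of) if earlier else None


# Hypothetical queue: two readers, then a writer, then another reader
queue = ["read-0000000001", "read-0000000002", "write-0000000003", "read-0000000004"]
for node in queue:
    print(node, "->", blocking_node(node, queue))
```

        In this arrangement the two leading readers proceed together, the writer waits for the reader ahead of it, and the last reader waits for the writer, so a queued writer is not overtaken by readers that arrive later.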

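04.Practical application scenarios
    a.Distributed configuration management
        a.Scenario
            Manage application configuration in a distributed environment so that every node sees consistent values and each update is applied atomically.
        b.Solution
            Serialize configuration updates with a ZooKeeper lock so that concurrent writers cannot overwrite each other's changes.
        c.Code example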
            ---
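            import logging
            from typing import Dict

            # ZookeeperDistributedLock is the lock class used earlier in this section

            class DistributedConfigManager:
                """Distributed configuration manager"""

                def __init__(self, hosts: str, config_path: str):
                    self.zk_hosts = hosts
                    self.config_path = config_path.rstrip('/') + '/'
                    # Build the lock path from the normalized path so the separator is never lost
                    self.lock = ZookeeperDistributedLock(hosts, self.config_path + "config_lock")

                def update_config(self, key: str, value: str) -> bool:
                    """Update one configuration entry while holding the distributed lock"""
                    with self.lock:
                        try:
                            # Read-modify-write under the lock
                            current_config = self._read_config()
                            current_config[key] = value
                            self._write_config(current_config)

                            logging.info(f"Configuration updated: {key} = {value}")
                            return True

                        except Exception as e:
                            logging.error(f"Configuration update failed: {e}")
                            return False

                def _read_config(self) -> Dict[str, str]:
                    """Read the configuration (storage-specific; left to the reader)"""
                    raise NotImplementedError

                def _write_config(self, config: Dict[str, str]):
                    """Write the configuration (storage-specific; left to the reader)"""
                    raise NotImplementedError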
            ---
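    b.Distributed task queue
        a.Scenario
            Implement a task queue shared across nodes, with task assignment and execution coordinated between them.
        b.Solution
            Use a ZooKeeper read-write lock to coordinate task assignment and status updates.
        c.Code example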
            ---
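            from contextlib import contextmanager
            from typing import Any, Dict, Optional

            class DistributedTaskQueue:
                """Distributed task queue"""

                def __init__(self, hosts: str, queue_path: str):
                    self.zk_hosts = hosts
                    self.queue_path = queue_path.rstrip('/') + '/'
                    self.lock = ZookeeperReadWriteLock(hosts, self.queue_path + "queue_lock")

                @contextmanager
                def _locked(self, write: bool):
                    """Hold the read or write lock for the duration of a with-block"""
                    if write:
                        acquire, release = self.lock.acquire_write_lock, self.lock.release_write_lock
                    else:
                        acquire, release = self.lock.acquire_read_lock, self.lock.release_read_lock
                    if not acquire():
                        raise TimeoutError("Failed to acquire the queue lock")
                    try:
                        yield
                    finally:
                        release()

                # _get_queue_size, _add_task_to_queue, _remove_task_from_queue and
                # _get_queue_info are storage-specific and not shown here

                def enqueue_task(self, task_data: Dict[str, Any]) -> bool:
                    """Enqueue a task"""
                    # Check the size limit and insert under the same write lock, so that no
                    # other producer can fill the queue between the check and the insert
                    with self._locked(write=True):
                        if self._get_queue_size() >= 1000:  # queue size limit
                            return False
                        return self._add_task_to_queue(task_data)

                def dequeue_task(self) -> Optional[Dict[str, Any]]:
                    """Dequeue a task"""
                    with self._locked(write=True):
                        return self._remove_task_from_queue()

                def get_queue_status(self) -> Dict[str, Any]:
                    """Get the queue status"""
                    with self._locked(write=False):
                        return self._get_queue_info()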
            ---
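
            A queue size limit only holds if the size check and the insert run inside one critical section. If the check ran under one lock and the insert under a separately acquired lock, two producers could both pass the check and push the queue over its limit. The sketch below shows the single-critical-section version with an in-memory stand-in: `BoundedQueue` is a hypothetical class, and `threading.Lock` takes the place of the distributed write lock purely for illustration.

```python
import threading
from collections import deque
from typing import Any, Deque, Dict, List, Optional


class BoundedQueue:
    """In-memory stand-in: one lock covers both the size check and the append."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items: Deque[Dict[str, Any]] = deque()
        self._lock = threading.Lock()  # stands in for the distributed write lock

    def enqueue(self, task: Dict[str, Any]) -> bool:
        with self._lock:
            # Check and append in the same critical section, so no other
            # producer can slip in between the two steps.
            if len(self._items) >= self.capacity:
                return False
            self._items.append(task)
            return True

    def dequeue(self) -> Optional[Dict[str, Any]]:
        with self._lock:
            return self._items.popleft() if self._items else None

    def size(self) -> int:
        with self._lock:
            return len(self._items)


queue = BoundedQueue(capacity=10)
results: List[bool] = []
results_lock = threading.Lock()


def producer(task_id: int) -> None:
    accepted = queue.enqueue({"id": task_id})
    with results_lock:
        results.append(accepted)


# 50 producers compete for 10 slots
threads = [threading.Thread(target=producer, args=(i,)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(results), queue.size())  # 10 10
```

            Exactly ten producers succeed regardless of scheduling, because the limit is enforced at the moment of insertion rather than at an earlier check.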
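    c.Distributed election
        a.Scenario
            Elect a primary node in a cluster and guarantee that there is only one leader at a time.
        b.Solution
            Use ZooKeeper's ephemeral sequential nodes: the candidate whose node has the smallest sequence number is the leader, and its node disappears if its session ends.
        c.Code example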
            ---
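            import logging
            import threading
            import time
            import uuid
            from typing import Callable
            from kazoo.client import KazooClient

            class DistributedElection:
                """Distributed leader election"""

                def __init__(self, hosts: str, election_path: str):
                    self.zk_hosts = hosts
                    self.election_path = election_path.rstrip('/') + '/'
                    self.zk = None
                    self.node_path = None
                    self.is_leader = False
                    self.leader_callbacks = []

                    self._connect()

                def _connect(self):
                    """Connect to ZooKeeper and make sure the election path exists"""
                    self.zk = KazooClient(hosts=self.zk_hosts)
                    self.zk.start()
                    self.zk.ensure_path(self.election_path)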

                def participate(self, on_become_leader: Callable, on_lose_leadership: Callable):
                    """参与选举"""
                    self.leader_callbacks = [on_become_leader, on_lose_leadership]
                    self._create_election_node()
                    self._monitor_leadership()

                def _create_election_node(self):
                    """创建选举节点"""
                    self.node_path = self.zk.create(
                        self.election_path + "candidate-",
                        value=str(uuid.uuid4()).encode('utf-8'),
                        ephemeral=True,
                        sequence=True
                    )

                def _monitor_leadership(self):
                    """监控领导者状态"""
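                    def check_leadership():
                        while True:
                            try:
                                if self._is_current_leader():
                                    if not self.is_leader:
                                        self.is_leader = True
                                        self.leader_callbacks[0]()  # became leader
                                else:
                                    if self.is_leader:
                                        self.is_leader = False
                                        self.leader_callbacks[1]()  # lost leadership

                            except Exception as e:
                                logging.error(f"Error while monitoring leadership: {e}")

                            # Sleep outside the try block so that a failing check is retried
                            # at the polling interval instead of spinning in a tight loop
                            time.sleep(5)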

                    monitoring_thread = threading.Thread(target=check_leadership, daemon=True)
                    monitoring_thread.start()

                def _is_current_leader(self) -> bool:
                    """检查当前节点是否为领导者"""
                    candidates = self.zk.get_children(self.election_path)
                    candidates.sort()
                    return candidates[0] == self.node_path.split('/')[-1]
            ---
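
            The transition logic of the monitoring loop (invoke a callback only when leadership actually changes) can be separated from ZooKeeper and tested on its own. The following is a minimal sketch under stated assumptions: `LeadershipTracker` is a hypothetical class, it is fed child lists directly instead of reading them from a server, and candidate names are assumed to be zero-padded so that `min()` returns the smallest sequence number.

```python
from typing import Callable, List


class LeadershipTracker:
    """Fires callbacks only when the leadership status actually changes."""

    def __init__(self, own_node: str,
                 on_become_leader: Callable[[], None],
                 on_lose_leadership: Callable[[], None]):
        self.own_node = own_node
        self.on_become_leader = on_become_leader
        self.on_lose_leadership = on_lose_leadership
        self.is_leader = False

    def update(self, candidates: List[str]) -> bool:
        """Feed the latest child list; returns the current leadership status."""
        # The candidate with the smallest sequence number is the leader.
        # An empty list (for example, our node vanished) means not leader.
        now_leader = bool(candidates) and min(candidates) == self.own_node
        if now_leader and not self.is_leader:
            self.is_leader = True
            self.on_become_leader()
        elif not now_leader and self.is_leader:
            self.is_leader = False
            self.on_lose_leadership()
        return self.is_leader


events: List[str] = []
tracker = LeadershipTracker(
    "candidate-0000000002",
    on_become_leader=lambda: events.append("elected"),
    on_lose_leadership=lambda: events.append("demoted"),
)
tracker.update(["candidate-0000000001", "candidate-0000000002"])  # follower
tracker.update(["candidate-0000000002", "candidate-0000000003"])  # leader now
tracker.update(["candidate-0000000002", "candidate-0000000003"])  # no change
tracker.update([])                                                # node gone
print(events)  # ['elected', 'demoted']
```

            Treating an empty or changed child list as a loss of leadership errs on the side of stepping down, which is usually the safer failure mode for a leader.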

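05.Best practices and performance optimization
    a.Design principles
        a.Path planning
            Give each protected resource its own ZooKeeper path under a shared root so that unrelated locks never collide.
        b.Session management
            Handle ZooKeeper session events correctly; a lock is held only while the session that created its ephemeral node is alive.
        c.Exception handling
            Handle errors from every ZooKeeper call so that a failed operation never leaves the caller believing it holds a lock.
        d.Resource cleanup
            Delete lock nodes as soon as the work finishes, rather than relying on session expiry, to avoid leaking resources.
    b.Performance optimization strategies
        a.Batching
            Combine several ZooKeeper operations into one request where possible to reduce network overhead.
        b.Connection reuse
            Share one ZooKeeper connection instead of opening a new one for each lock, to avoid repeated connection setup.
        c.Caching
            Cache data read from ZooKeeper to avoid repeated queries.
        d.Timeout settings
            Choose timeouts that balance performance against reliability: too short causes spurious failures, too long delays failure detection.
    c.Monitoring and maintenance
        a.Connection monitoring
            Monitor the ZooKeeper connection state so that connection problems are detected promptly.
        b.Performance metrics
            Track the latency and success rate of lock operations.
        c.Alerting
            Send alert notifications when abnormal conditions occur.
        d.Logging
            Log operations in enough detail to diagnose problems afterwards.
    d.Fault handling
        a.Connection retry
            Retry failed connections to ride out transient network faults.
        b.Data recovery
            Define how to recover when ZooKeeper data is found in an abnormal state.
        c.Degradation strategy
            Provide a fallback for periods when ZooKeeper is unavailable.
        d.High-availability deployment
            Deploy ZooKeeper as a cluster to improve availability.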
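
    The connection-retry item above is often implemented as exponential backoff with jitter. The sketch below is one possible shape, not a prescribed implementation: `retry_with_backoff` and all of its parameters are hypothetical names chosen for this example, and the `sleep` and `rng` arguments exist only so the behaviour can be tested without real delays. Client libraries may ship their own retry helpers, which are usually preferable in production code.

```python
import random
import time
from typing import Callable, List, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_tries: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call operation until it succeeds or max_tries attempts have failed."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    rng = rng or random.Random()
    for attempt in range(1, max_tries + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_tries:
                raise  # out of attempts: surface the last error
            # Exponential backoff capped at max_delay, with full jitter so
            # that many clients do not retry in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng.uniform(0, ceiling))
    raise AssertionError("unreachable")


# Usage: an operation that fails twice before succeeding
calls = {"n": 0}
delays: List[float] = []


def flaky_connect() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("ZooKeeper unreachable")
    return "connected"


result = retry_with_backoff(flaky_connect, sleep=delays.append, rng=random.Random(0))
print(result, calls["n"], len(delays))  # connected 3 2
```

    Capping the delay keeps the worst-case wait bounded, and re-raising the last error after the final attempt lets the caller apply its degradation strategy.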

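9.4 Lock timeout and renewal

01.Basic concepts
    a.Definition and purpose
        A timeout gives a distributed lock a limited validity period, which prevents deadlock and leaked locks when a holder fails. Renewal extends that period so that a long-running operation keeps its lock.
    b.Core features
        a.Automatic expiry
            The lock is released automatically after the specified time, so a crashed process cannot hold it forever.
        b.Dynamic renewal
            The holder can extend the lock's lifetime while the lock is still valid.
        c.Safety guarantee
            Only the current holder of the lock may renew it.
        d.Performance tuning
            Choose the timeout to balance performance against reliability.
    c.How timeout and renewal relate
        a.Complementary mechanisms
            The timeout guarantees that a lock is eventually freed; renewal guarantees that a live holder does not lose it in the middle of an operation.
        b.Dynamic adjustment
            Adjust the timeout at run time according to business needs.
        c.Fault handling
            If the holder fails and stops renewing, the timeout releases the lock automatically.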
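
The interplay of automatic expiry and owner-only renewal can be seen in a small in-memory model. The sketch below is an illustration under stated assumptions rather than a distributed implementation: `LeaseTable` and its method names are hypothetical, everything lives in one process, and the clock is injected so that the example runs without waiting.

```python
import time
from typing import Callable, Dict, Tuple


class LeaseTable:
    """In-memory model of locks that expire unless their owner renews them."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._leases: Dict[str, Tuple[str, float]] = {}  # name -> (owner, expires_at)

    def acquire(self, name: str, owner: str, ttl: float) -> bool:
        holder = self._leases.get(name)
        now = self._clock()
        # An expired lease counts as free: this is the automatic release.
        if holder is not None and holder[1] > now:
            return False
        self._leases[name] = (owner, now + ttl)
        return True

    def renew(self, name: str, owner: str, ttl: float) -> bool:
        holder = self._leases.get(name)
        now = self._clock()
        # Only the current, unexpired holder may extend the lease.
        if holder is None or holder[0] != owner or holder[1] <= now:
            return False
        self._leases[name] = (owner, now + ttl)
        return True

    def release(self, name: str, owner: str) -> bool:
        holder = self._leases.get(name)
        if holder is None or holder[0] != owner:
            return False
        del self._leases[name]
        return True


now = [0.0]  # fake clock, advanced by hand
table = LeaseTable(clock=lambda: now[0])

print(table.acquire("job", "worker-a", ttl=30))  # True
print(table.acquire("job", "worker-b", ttl=30))  # False: still held
print(table.renew("job", "worker-b", ttl=30))    # False: not the owner
now[0] = 20.0
print(table.renew("job", "worker-a", ttl=30))    # True: now expires at 50
now[0] = 45.0
print(table.acquire("job", "worker-b", ttl=30))  # False: renewal kept it alive
now[0] = 60.0
print(table.acquire("job", "worker-b", ttl=30))  # True: lease expired at 50
print(table.renew("job", "worker-a", ttl=30))    # False: ownership was lost
```

The last line shows why a holder should check the result of every renewal: once a lease has lapsed and been taken over, the previous owner must stop treating the resource as its own.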

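02.Implementing timeouts
    a.Basic timeout designs
        a.Fixed timeout
            The lock gets a fixed validity period and becomes invalid when it elapses.
        b.Relative timeout
            The validity period is counted from the moment the lock is acquired.
        c.Absolute timeout
            The lock becomes invalid at a specific point in time.
        d.Timeout detection
            The lock's validity is checked periodically.
    b.Timeout handling strategies
        a.Active detection
            Poll at regular intervals to check whether the lock is still valid.
        b.Passive trigger
            Check whether the lock has expired at the moment it is used.
        c.Event notification
            Notify the interested components when the lock expires.
        d.Resource cleanup
            Clean up the associated resources automatically after a timeout.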
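
    The difference between a relative and an absolute timeout, and the passive check at the moment of use, can be sketched as follows. The names `Deadline` and `LockExpired` are hypothetical. The sketch tracks the deadline on `time.monotonic()`, which is an assumption made here because a monotonic clock is not affected by wall-clock adjustments; the absolute variant expects a timezone-aware `datetime`.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Callable


class LockExpired(Exception):
    """Raised when a guarded operation runs after the lock deadline."""


class Deadline:
    """Expiry tracker for a lock (hypothetical helper, for illustration)."""

    def __init__(self, expires_at: float, clock: Callable[[], float] = time.monotonic):
        self._expires_at = expires_at
        self._clock = clock

    @classmethod
    def relative(cls, seconds: float, clock: Callable[[], float] = time.monotonic) -> "Deadline":
        # Relative timeout: counted from the moment of acquisition (now)
        return cls(clock() + seconds, clock)

    @classmethod
    def absolute(cls, when: datetime, clock: Callable[[], float] = time.monotonic) -> "Deadline":
        # Absolute timeout: convert the wall-clock instant into a remaining
        # duration once, then track it on the monotonic clock
        remaining = (when - datetime.now(timezone.utc)).total_seconds()
        return cls(clock() + remaining, clock)

    def remaining(self) -> float:
        return max(0.0, self._expires_at - self._clock())

    def expired(self) -> bool:
        return self._clock() >= self._expires_at

    def check(self) -> None:
        # Passive trigger: call this right before touching the protected resource
        if self.expired():
            raise LockExpired("lock deadline has passed")
```

    A caller would invoke `check()` before each write to the protected resource, which complements a polling monitor thread rather than replacing it.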
    c.Code example
        ---
        # 锁超时机制基本实现
        import time
        import threading
        import logging
        from datetime import datetime, timedelta
        from typing import Optional, Dict, Any, Callable
        from enum import Enum
        import uuid

        class TimeoutPolicy(Enum):
            """超时策略枚举"""
            FIXED = "fixed"           # 固定超时
            SLIDING = "sliding"       # 滑动超时
            ADAPTIVE = "adaptive"     # 自适应超时

        class LockTimeout:
            """锁超时管理器"""

            def __init__(self, timeout_seconds: int = 30, policy: TimeoutPolicy = TimeoutPolicy.FIXED):
                """
                初始化超时管理器

                Args:
                    timeout_seconds: 超时时间(秒)
                    policy: 超时策略
                """
                self.timeout_seconds = timeout_seconds
                self.policy = policy
                self.identifier = str(uuid.uuid4())
                self.acquired_at = None
                self.expires_at = None
                self.last_activity = None

                # 自适应超时相关
                self.avg_execution_time = 0
                self.execution_count = 0
                self.max_execution_time = 0

                # 监控线程
                self.monitor_thread = None
                self.stop_monitoring = threading.Event()
                self.timeout_callbacks = []

            def start_timer(self):
                """启动超时计时器"""
                self.acquired_at = datetime.now()
                self.last_activity = self.acquired_at

                if self.policy == TimeoutPolicy.FIXED:
                    self.expires_at = self.acquired_at + timedelta(seconds=self.timeout_seconds)
                elif self.policy == TimeoutPolicy.SLIDING:
                    self.expires_at = self.acquired_at + timedelta(seconds=self.timeout_seconds)
                elif self.policy == TimeoutPolicy.ADAPTIVE:
                    adaptive_timeout = self._calculate_adaptive_timeout()
                    self.expires_at = self.acquired_at + timedelta(seconds=adaptive_timeout)

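                # Start the monitor thread; clear the stop flag first so the timer
                # can be started again after an earlier stop()
                self.stop_monitoring.clear()
                self._start_monitoring()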

            def _calculate_adaptive_timeout(self) -> int:
                """计算自适应超时时间"""
                if self.execution_count == 0:
                    return self.timeout_seconds

                # 使用平均执行时间的1.5倍作为超时时间,但不超过最大执行时间的2倍
                base_timeout = int(self.avg_execution_time * 1.5)
                max_timeout = int(self.max_execution_time * 2)
                min_timeout = max(10, self.timeout_seconds // 2)

                return max(min_timeout, min(base_timeout, max_timeout))

            def renew_timeout(self, additional_seconds: Optional[int] = None):
                """续期超时"""
                if not self.expires_at:
                    return False

                if additional_seconds is None:
                    additional_seconds = self.timeout_seconds

                if self.policy == TimeoutPolicy.SLIDING:
                    # 滑动超时:从当前时间开始重新计算
                    self.expires_at = datetime.now() + timedelta(seconds=additional_seconds)
                    self.last_activity = datetime.now()
                    logging.info(f"滑动超时续期成功,延长 {additional_seconds} 秒")

                elif self.policy == TimeoutPolicy.ADAPTIVE:
                    # 自适应超时:重新计算
                    adaptive_timeout = self._calculate_adaptive_timeout()
                    self.expires_at = datetime.now() + timedelta(seconds=adaptive_timeout)
                    self.last_activity = datetime.now()
                    logging.info(f"自适应超时续期成功,延长 {adaptive_timeout} 秒")

                else:
                    # 固定超时:延长固定时间
                    self.expires_at = self.expires_at + timedelta(seconds=additional_seconds)
                    logging.info(f"固定超时续期成功,延长 {additional_seconds} 秒")

                return True

            def is_expired(self) -> bool:
                """检查是否已过期"""
                if not self.expires_at:
                    return False

                if self.policy == TimeoutPolicy.SLIDING and self.last_activity:
                    # 滑动超时:检查最后活动时间
                    return datetime.now() > (self.last_activity + timedelta(seconds=self.timeout_seconds))
                else:
                    # 固定和自适应超时:检查过期时间
                    return datetime.now() > self.expires_at

            def update_activity(self):
                """更新活动时间"""
                self.last_activity = datetime.now()

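            def record_execution_time(self, execution_time: float):
                """Record an execution time (used by the adaptive timeout)."""
                self.execution_count += 1
                self.max_execution_time = max(self.max_execution_time, execution_time)

                # Exponentially weighted moving average; seed it with the first
                # sample so the average does not start biased towards zero
                alpha = 0.1  # smoothing factor
                if self.execution_count == 1:
                    self.avg_execution_time = execution_time
                else:
                    self.avg_execution_time = alpha * execution_time + (1 - alpha) * self.avg_execution_time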

            def get_remaining_time(self) -> float:
                """获取剩余时间(秒)"""
                if not self.expires_at:
                    return self.timeout_seconds

                if self.policy == TimeoutPolicy.SLIDING and self.last_activity:
                    expires_from_activity = self.last_activity + timedelta(seconds=self.timeout_seconds)
                    return (expires_from_activity - datetime.now()).total_seconds()
                else:
                    return (self.expires_at - datetime.now()).total_seconds()

            def _start_monitoring(self):
                """启动超时监控线程"""
                if self.monitor_thread and self.monitor_thread.is_alive():
                    return

                def timeout_monitor():
                    while not self.stop_monitoring.wait(1):  # 每秒检查一次
                        if self.is_expired():
                            logging.warning(f"锁 {self.identifier} 已超时")
                            # 调用超时回调
                            for callback in self.timeout_callbacks:
                                try:
                                    callback(self.identifier)
                                except Exception as e:
                                    logging.error(f"超时回调执行失败: {e}")
                            break

                self.monitor_thread = threading.Thread(target=timeout_monitor, daemon=True)
                self.monitor_thread.start()

            def add_timeout_callback(self, callback: Callable[[str], None]):
                """添加超时回调函数"""
                self.timeout_callbacks.append(callback)

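            def stop(self):
                """Stop the timeout manager."""
                self.stop_monitoring.set()
                # stop() can be reached from the monitor thread itself (through a
                # timeout callback); a thread cannot join itself, so skip the join then
                if self.monitor_thread and self.monitor_thread is not threading.current_thread():
                    self.monitor_thread.join(timeout=2)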

        # 使用示例
        class TimeoutManagedLock:
            """带超时管理的锁"""

            def __init__(self, lock_name: str, timeout: int = 30, policy: TimeoutPolicy = TimeoutPolicy.FIXED):
                self.lock_name = lock_name
                self.lock = threading.Lock()
                self.timeout_manager = LockTimeout(timeout, policy)
                self.acquired = False

                # 设置超时回调
                self.timeout_manager.add_timeout_callback(self._on_timeout)

            def _on_timeout(self, identifier: str):
                """超时回调处理"""
                if self.acquired:
                    logging.warning(f"锁 {self.lock_name} 超时,强制释放")
                    self._force_release()

            def _force_release(self):
                """强制释放锁"""
                if self.acquired:
                    self.acquired = False
                    if self.lock.locked():
                        self.lock.release()
                    self.timeout_manager.stop()

            def acquire(self, blocking: bool = True, timeout: Optional[float] = None) -> bool:
                """获取锁"""
                if not self.lock.acquire(blocking=blocking, timeout=timeout):
                    return False

                self.acquired = True
                self.timeout_manager.start_timer()
                logging.info(f"成功获取锁: {self.lock_name}")
                return True

            def release(self) -> bool:
                """释放锁"""
                if not self.acquired:
                    return False

                try:
                    self.timeout_manager.stop()
                    if self.lock.locked():
                        self.lock.release()
                    self.acquired = False
                    logging.info(f"成功释放锁: {self.lock_name}")
                    return True

                except Exception as e:
                    logging.error(f"释放锁失败: {e}")
                    return False

            def extend(self, additional_seconds: Optional[int] = None) -> bool:
                """延长锁的有效期"""
                if not self.acquired:
                    logging.warning("尝试延长未获取的锁")
                    return False

                return self.timeout_manager.renew_timeout(additional_seconds)

            def update_activity(self):
                """更新锁活动时间"""
                if self.acquired:
                    self.timeout_manager.update_activity()

            def __enter__(self):
                """上下文管理器入口"""
                self.acquire()
                return self

            def __exit__(self, exc_type, exc_val, exc_tb):
                """上下文管理器出口"""
                self.release()

        def timeout_example():
            """锁超时机制使用示例"""
            logging.basicConfig(level=logging.INFO,
                            format='%(asctime)s - %(levelname)s - %(message)s')

            # 固定超时锁
            fixed_lock = TimeoutManagedLock("fixed_lock", timeout=5, policy=TimeoutPolicy.FIXED)

            with fixed_lock:
                print("固定超时锁,将在5秒后过期")
                time.sleep(3)
                print(f"剩余时间: {fixed_lock.timeout_manager.get_remaining_time():.2f}秒")

            # 滑动超时锁
            sliding_lock = TimeoutManagedLock("sliding_lock", timeout=3, policy=TimeoutPolicy.SLIDING)

            with sliding_lock:
                print("滑动超时锁,将在3秒无活动后过期")
                for i in range(5):
                    time.sleep(1)
                    sliding_lock.update_activity()  # 更新活动时间
                    print(f"第{i+1}秒后更新活动,剩余时间: {sliding_lock.timeout_manager.get_remaining_time():.2f}秒")

            # 自适应超时锁
            adaptive_lock = TimeoutManagedLock("adaptive_lock", timeout=10, policy=TimeoutPolicy.ADAPTIVE)

            with adaptive_lock:
                print("自适应超时锁,将根据历史执行时间调整")
                # Simulate operations with different durations (scaled down to keep the demo short)
                execution_times = [2, 4, 6, 3, 5]
                for exec_time in execution_times:
                    start_time = time.time()
                    time.sleep(exec_time * 0.1)  # simulated work
                    actual_time = time.time() - start_time
                    adaptive_lock.timeout_manager.record_execution_time(actual_time)
                    print(f"记录执行时间: {actual_time:.2f}秒")

        if __name__ == "__main__":
            timeout_example()
        ---

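03.Implementing renewal
    a.Active renewal
        a.Scheduled renewal
            A background thread renews the lock automatically at regular intervals.
        b.Conditional renewal
            The lock is renewed when a specific condition is met.
        c.Incremental renewal
            The renewal period grows with the execution time.
    b.Passive renewal
        a.Requested renewal
            The application explicitly requests a renewal.
        b.Manual renewal
            An API lets external callers trigger a renewal.
        c.Batch renewal
            Several locks are renewed in one operation.
    c.Smart renewal
        a.Predictive renewal
            Execution time is predicted from historical data.
        b.Dynamic adjustment
            The renewal strategy adapts to system load.
        c.Failure detection
            Anomalies during renewal are detected.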
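
    Scheduled renewal with failure detection, as listed above, can be sketched as a small watchdog thread. This is a minimal sketch under stated assumptions: `RenewalWatchdog`, the `renew` callable, and the `on_lost` callback are hypothetical names, and renewing at one third of the lease is one common choice rather than a requirement.

```python
import threading
from typing import Callable


class RenewalWatchdog:
    """Renews a lease in the background until stopped (hypothetical helper)."""

    def __init__(self, renew: Callable[[], bool], lease_seconds: float,
                 on_lost: Callable[[], None] = lambda: None):
        self._renew = renew              # returns True if the lease was extended
        self._interval = lease_seconds / 3
        self._on_lost = on_lost          # called once when a renewal fails
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.renewals = 0

    def _run(self) -> None:
        # Event.wait() returns True as soon as stop() is called, ending the loop
        while not self._stop_event.wait(self._interval):
            try:
                ok = self._renew()
            except Exception:
                ok = False               # treat an exception as a failed renewal
            if not ok:
                self._on_lost()
                return
            self.renewals += 1

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop_event.set()
        # Join only a started thread, and never from the watchdog thread itself
        if self._thread.is_alive() and self._thread is not threading.current_thread():
            self._thread.join(timeout=self._interval + 1)

    def __enter__(self) -> "RenewalWatchdog":
        self.start()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        self.stop()
```

    Used as `with RenewalWatchdog(renew_fn, lease_seconds=30): ...`, the lease is renewed about every 10 seconds while the body runs. When a renewal fails, the holder should treat the lock as lost and stop working on the protected resource.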
    d.Code example
        ---
        # 锁续期机制高级实现
        import time
        import threading
        import logging
        from datetime import datetime, timedelta
        from typing import Optional, Dict, Any, Callable, List
        from enum import Enum
        import uuid
        import heapq

        class RenewalStrategy(Enum):
            """续期策略枚举"""
            FIXED_INTERVAL = "fixed_interval"    # 固定间隔续期
            PROGRESSIVE = "progressive"          # 递进式续期
            ADAPTIVE = "adaptive"                # 自适应续期
            PREDICTIVE = "predictive"            # 预测性续期

        class LockRenewal:
            """锁续期管理器"""

            def __init__(self, lock_identifier: str, initial_lease: int = 30,
                        strategy: RenewalStrategy = RenewalStrategy.FIXED_INTERVAL):
                """
                初始化续期管理器

                Args:
                    lock_identifier: 锁标识符
                    initial_lease: 初始租约时间(秒)
                    strategy: 续期策略
                """
                self.lock_identifier = lock_identifier
                self.initial_lease = initial_lease
                self.strategy = strategy
                self.current_lease_end = None
                self.renewal_count = 0
                self.active = False

                # 续期历史记录
                self.renewal_history = []
                self.execution_time_history = []

                # 续期控制
                self.renewal_thread = None
                self.stop_renewal = threading.Event()
                self.renewal_callbacks = []

                # 预测性续期相关
                self.avg_execution_time = 0
                self.execution_variance = 0
                self.prediction_accuracy = 0

            def start_renewal(self, renew_lock_func: Callable[[str, int], bool]):
                """
                启动续期机制

                Args:
                    renew_lock_func: 续期锁的函数,参数为(lock_id, lease_time),返回是否成功
                """
                if self.active:
                    logging.warning("续期机制已激活")
                    return

                self.active = True
                self.stop_renewal.clear()  # allow a restart after an earlier stop
                self.current_lease_end = datetime.now() + timedelta(seconds=self.initial_lease)

                # Start the renewal thread
                self._start_renewal_worker(renew_lock_func)

            def _start_renewal_worker(self, renew_lock_func):
                """启动续期工作线程"""
                def renewal_worker():
                    while not self.stop_renewal.is_set() and self.active:
                        try:
                            # 计算下次续期时间
                            next_renewal_time = self._calculate_next_renewal_time()

                            # 等待到下次续期时间或停止信号
                            if self.stop_renewal.wait(next_renewal_time):
                                break

                            # Perform the renewal; on success _perform_renewal has
                            # already incremented renewal_count
                            if self._perform_renewal(renew_lock_func):
                                logging.info(f"Lock {self.lock_identifier} renewed (renewal #{self.renewal_count})")
                                # 调用续期回调
                                for callback in self.renewal_callbacks:
                                    callback(self.lock_identifier, True)
                            else:
                                logging.error(f"锁 {self.lock_identifier} 续期失败")
                                # 调用续期失败回调
                                for callback in self.renewal_callbacks:
                                    callback(self.lock_identifier, False)
                                break

                        except Exception as e:
                            logging.error(f"续期过程中发生异常: {e}")

                self.renewal_thread = threading.Thread(target=renewal_worker, daemon=True)
                self.renewal_thread.start()

            def _calculate_next_renewal_time(self) -> float:
                """计算下次续期等待时间"""
                if self.strategy == RenewalStrategy.FIXED_INTERVAL:
                    # 固定间隔:在租约的1/3时间后续期
                    return self.initial_lease / 3

                elif self.strategy == RenewalStrategy.PROGRESSIVE:
                    # 递进式:随着续期次数增加,间隔时间递增
                    base_interval = self.initial_lease / 3
                    growth_factor = 1.1 ** self.renewal_count
                    return min(base_interval * growth_factor, self.initial_lease / 2)

                elif self.strategy == RenewalStrategy.ADAPTIVE:
                    # 自适应:根据历史执行时间调整
                    if self.execution_time_history:
                        avg_exec_time = sum(self.execution_time_history[-5:]) / len(self.execution_time_history[-5:])
                        # Renew after 20% of the average execution time, but wait at least 5 seconds
                        return max(5, avg_exec_time * 0.2)
                    else:
                        return self.initial_lease / 3

                elif self.strategy == RenewalStrategy.PREDICTIVE:
                    # 预测性:基于预测模型
                    predicted_time = self._predict_execution_time()
                    # 在预测结束时间的20%前续期
                    safety_margin = predicted_time * 0.2
                    return max(2, safety_margin)

                else:
                    return self.initial_lease / 3

            def _predict_execution_time(self) -> float:
                """预测执行时间"""
                if len(self.execution_time_history) < 3:
                    return self.initial_lease

                # 使用移动平均和方差进行预测
                recent_times = self.execution_time_history[-5:]
                predicted_time = sum(recent_times) / len(recent_times)

                # 考虑执行时间的趋势
                if len(self.execution_time_history) >= 5:
                    # 计算趋势
                    times = self.execution_time_history[-5:]
                    trend = (times[-1] - times[0]) / len(times)
                    predicted_time += trend * 2  # 预测未来趋势

                return max(1, predicted_time)

            def _perform_renewal(self, renew_lock_func) -> bool:
                """执行续期操作"""
                try:
                    # 计算新的租约时间
                    new_lease_time = self._calculate_new_lease_time()

                    # 调用续期函数
                    success = renew_lock_func(self.lock_identifier, new_lease_time)

                    if success:
                        # 更新租约信息
                        self.current_lease_end = datetime.now() + timedelta(seconds=new_lease_time)
                        self.renewal_count += 1

                        # 记录续期历史
                        self.renewal_history.append({
                            'timestamp': datetime.now(),
                            'lease_time': new_lease_time,
                            'renewal_count': self.renewal_count
                        })

                        return True
                    else:
                        return False

                except Exception as e:
                    logging.error(f"执行续期操作失败: {e}")
                    return False

            def _calculate_new_lease_time(self) -> int:
                """计算新的租约时间"""
                if self.strategy == RenewalStrategy.FIXED_INTERVAL:
                    return self.initial_lease

                elif self.strategy == RenewalStrategy.PROGRESSIVE:
                    # 递进式:随着续期次数增加,租约时间也增加
                    growth_factor = 1.05 ** self.renewal_count
                    new_lease = int(self.initial_lease * growth_factor)
                    return min(new_lease, self.initial_lease * 3)  # 最多增加3倍

                elif self.strategy == RenewalStrategy.ADAPTIVE:
                    # 自适应:基于历史执行时间
                    if self.execution_time_history:
                        avg_exec = sum(self.execution_time_history[-10:]) / len(self.execution_time_history[-10:])
                        return max(1, int(avg_exec * 1.5))  # 1.5x the average execution time, at least 1 second
                    else:
                        return self.initial_lease

                elif self.strategy == RenewalStrategy.PREDICTIVE:
                    # 预测性:基于预测时间
                    predicted = self._predict_execution_time()
                    return int(predicted * 1.8)  # 预测时间的1.8倍

                else:
                    return self.initial_lease

            def record_execution_progress(self, execution_time: float, remaining_work: float = None):
                """
                记录执行进度

                Args:
                    execution_time: 已执行时间
                    remaining_work: 剩余工作量估算(可选)
                """
                self.execution_time_history.append(execution_time)

                # 如果有剩余工作量,更新预测
                if remaining_work is not None:
                    total_estimated = execution_time + remaining_work
                    self._update_prediction(execution_time, total_estimated)

            def _update_prediction(self, current_time: float, estimated_total: float):
                """更新预测模型"""
                # 简单的线性预测更新
                self.avg_execution_time = 0.9 * self.avg_execution_time + 0.1 * estimated_total

                # 计算预测准确性
                if len(self.execution_time_history) > 1:
                    # 这里简化处理,实际可以使用更复杂的算法
                    self.prediction_accuracy = max(0, 1 - abs(current_time - estimated_total) / estimated_total)

            def manual_renewal(self, additional_lease: Optional[int] = None) -> bool:
                """手动续期"""
                if not self.active:
                    logging.warning("续期机制未激活")
                    return False

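                # Placeholder renewal function used for this demo
                def dummy_renew_func(lock_id: str, lease_time: int) -> bool:
                    # A real implementation would call the lock service's renewal API here
                    logging.info(f"Manual renewal of {lock_id}, lease: {lease_time}s")
                    return True

                # Honour the caller-supplied lease; fall back to the strategy's value
                lease_time = additional_lease or self._calculate_new_lease_time()
                success = dummy_renew_func(self.lock_identifier, lease_time)

                if success:
                    # Record the renewal with the lease that was actually requested
                    self.current_lease_end = datetime.now() + timedelta(seconds=lease_time)
                    self.renewal_count += 1
                    self.renewal_history.append({
                        'timestamp': datetime.now(),
                        'lease_time': lease_time,
                        'renewal_count': self.renewal_count
                    })
                    logging.info(f"Manual renewal succeeded, lease extended by {lease_time}s")
                else:
                    logging.error("Manual renewal failed")

                return success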

            def add_renewal_callback(self, callback: Callable[[str, bool], None]):
                """添加续期回调函数"""
                self.renewal_callbacks.append(callback)

            def get_renewal_statistics(self) -> Dict[str, Any]:
                """获取续期统计信息"""
                return {
                    'renewal_count': self.renewal_count,
                    'active': self.active,
                    'current_lease_end': self.current_lease_end.isoformat() if self.current_lease_end else None,
                    'avg_execution_time': self.avg_execution_time,
                    'prediction_accuracy': self.prediction_accuracy,
                    'strategy': self.strategy.value,
                    'execution_history_count': len(self.execution_time_history)
                }

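            def stop(self):
                """Stop the renewal mechanism."""
                # Named stop() rather than stop_renewal(): the instance attribute
                # self.stop_renewal (a threading.Event) would shadow a method of
                # the same name and make it uncallable
                self.active = False
                self.stop_renewal.set()

                if self.renewal_thread and self.renewal_thread is not threading.current_thread():
                    self.renewal_thread.join(timeout=5)

                logging.info(f"Renewal for lock {self.lock_identifier} stopped")

        class AdvancedDistributedLock:
            """Advanced distributed lock (integrates timeout and renewal).

            Relies on TimeoutPolicy and LockTimeout from the timeout example in section 02.
            """

            def __init__(self, lock_name: str, lease_time: int = 30,
                        timeout_policy: TimeoutPolicy = TimeoutPolicy.FIXED,
                        renewal_strategy: RenewalStrategy = RenewalStrategy.ADAPTIVE):
                self.lock_name = lock_name
                self.identifier = str(uuid.uuid4())
                self.acquired = False

                # Timeout management
                self.timeout_manager = LockTimeout(lease_time, timeout_policy)

                # Renewal management
                self.renewal_manager = LockRenewal(self.identifier, lease_time, renewal_strategy)

                # Statistics
                self.operation_start_time = None
                self.total_operation_time = 0

            def acquire(self, blocking: bool = True, timeout: Optional[float] = None) -> bool:
                """Acquire the lock."""
                # Simulated acquisition of a distributed lock
                logging.info(f"Trying to acquire distributed lock: {self.lock_name}")

                # A real implementation would call the lock service here and honour
                # blocking/timeout; for the demo the acquisition always succeeds
                self.acquired = True
                self.operation_start_time = time.time()

                # Start timeout management
                self.timeout_manager.start_timer()

                # Start the renewal mechanism
                self.renewal_manager.start_renewal(self._renew_lock)

                logging.info(f"Acquired distributed lock: {self.lock_name}")
                return True

            def _renew_lock(self, lock_id: str, lease_time: int) -> bool:
                """Renew the lock (placeholder for the real renewal call)."""
                # A real implementation would call the lock service's renewal API here
                logging.debug(f"Renewing lock {lock_id}, lease: {lease_time}s")
                # Keep the local timeout manager in step with the renewed lease
                self.timeout_manager.renew_timeout(lease_time)
                return True

            def release(self) -> bool:
                """Release the lock."""
                if not self.acquired:
                    return False

                # Record the total operation time
                if self.operation_start_time:
                    self.total_operation_time = time.time() - self.operation_start_time

                # Stop renewal and timeout management
                self.renewal_manager.stop()
                self.timeout_manager.stop()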

                # 更新续期预测模型
                if self.total_operation_time > 0:
                    self.renewal_manager.record_execution_progress(self.total_operation_time)

                # 这里应该调用实际的分布式锁释放接口
                self.acquired = False
                logging.info(f"成功释放分布式锁: {self.lock_name}")
                return True

            def update_progress(self, remaining_work: float = None):
                """更新执行进度"""
                if self.operation_start_time:
                    current_time = time.time() - self.operation_start_time
                    self.renewal_manager.record_execution_progress(current_time, remaining_work)
                    self.timeout_manager.update_activity()

            def get_statistics(self) -> Dict[str, Any]:
                """获取锁统计信息"""
                return {
                    'lock_name': self.lock_name,
                    'identifier': self.identifier,
                    'acquired': self.acquired,
                    'total_operation_time': self.total_operation_time,
                    'timeout_stats': {
                        'remaining_time': self.timeout_manager.get_remaining_time(),
                        'is_expired': self.timeout_manager.is_expired()
                    },
                    'renewal_stats': self.renewal_manager.get_renewal_statistics()
                }

        # 使用示例
        def advanced_renewal_example():
            """高级续期机制使用示例"""
            logging.basicConfig(level=logging.INFO,
                            format='%(asctime)s - %(levelname)s - %(message)s')

            # 创建高级分布式锁
            lock = AdvancedDistributedLock(
                "advanced_lock",
                lease_time=30,
                timeout_policy=TimeoutPolicy.ADAPTIVE,
                renewal_strategy=RenewalStrategy.PREDICTIVE
            )

            try:
                if lock.acquire():
                    print("开始长时间操作...")

                    # 模拟长时间操作的不同阶段
                    stages = [
                        (5, "初始化阶段"),
                        (10, "数据处理阶段"),
                        (15, "分析计算阶段"),
                        (8, "结果输出阶段")
                    ]

                    for duration, stage_name in stages:
                        print(f"执行{stage_name},预计需要{duration}秒...")
                        time.sleep(duration)

                        # 更新进度
                        lock.update_progress()
                        stats = lock.get_statistics()
                        print(f"续期次数: {stats['renewal_stats']['renewal_count']}")
                        print(f"剩余时间: {stats['timeout_stats']['remaining_time']:.2f}秒")

                    print("操作完成")

            finally:
                lock.release()
                final_stats = lock.get_statistics()
                print(f"最终统计: {final_stats}")

        if __name__ == "__main__":
            advanced_renewal_example()
        ---

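04.Practical application scenarios
    a.Distributed batch processing
        a.Scenario
            Large batch jobs hold a lock for a long time to keep the data consistent.
        b.Solution
            Use smart renewal and adjust the lock's validity period as processing progresses.
        c.Code example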
            ---
            class DistributedBatchProcessor:
                """分布式批处理器"""

                def __init__(self, batch_name: str, total_items: int):
                    self.batch_name = batch_name
                    self.total_items = total_items
                    self.processed_items = 0
                    self.start_time = None

                    # 创建智能锁
                    self.lock = AdvancedDistributedLock(
                        f"batch_{batch_name}",
                        lease_time=60,  # 初始1分钟
                        renewal_strategy=RenewalStrategy.ADAPTIVE
                    )

                def process_batch(self):
                    """处理批次数据"""
                    if not self.lock.acquire():
                        logging.error("无法获取批处理锁")
                        return False

                    self.start_time = time.time()
                    logging.info(f"开始处理批次: {self.batch_name}")

                    try:
                        for item_id in range(1, self.total_items + 1):
                            # 处理单个项目
                            self._process_item(item_id)
                            self.processed_items += 1

                            # 每处理10个项目更新一次进度
                            if item_id % 10 == 0:
                                self._update_progress()

                            # 模拟处理时间
                            time.sleep(0.1)

                        logging.info(f"批次处理完成: {self.batch_name}")
                        return True

                    finally:
                        self.lock.release()

                def _process_item(self, item_id: int):
                    """处理单个项目"""
                    logging.debug(f"处理项目: {item_id}")
                    # 模拟项目处理逻辑
                    time.sleep(0.1)

                def _update_progress(self):
                    """更新处理进度"""
                    elapsed_time = time.time() - self.start_time
                    items_per_second = self.processed_items / elapsed_time if elapsed_time > 0 else 1
                    remaining_items = self.total_items - self.processed_items
                    estimated_remaining_time = remaining_items / items_per_second

                    # 更新锁的进度信息
                    self.lock.update_progress(estimated_remaining_time)

                    logging.info(f"进度: {self.processed_items}/{self.total_items}, "
                               f"预计剩余时间: {estimated_remaining_time:.1f}秒")
            ---
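    b.Database transaction management
        a.Scenario
            A long transaction holds locks on critical resources so that other transactions cannot modify the data.
        b.Solution
            Use a progressive renewal strategy and adjust the lock's validity period to the stage of the transaction.
        c.Code example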
            ---
            class TransactionLockManager:
                """事务锁管理器"""

                def __init__(self, transaction_id: str):
                    self.transaction_id = transaction_id
                    self.locks = {}  # 存储事务中的所有锁
                    self.transaction_stages = ['BEGIN', 'READ', 'WRITE', 'COMMIT']
                    self.current_stage = 0

                def begin_transaction(self):
                    """开始事务"""
                    self.current_stage = 0
                    logging.info(f"开始事务: {self.transaction_id}")

                def acquire_lock(self, resource_name: str) -> bool:
                    """Acquire the lock for one resource; return True on success."""
                    lock_id = f"tx_{self.transaction_id}_{resource_name}"

                    # Choose the renewal strategy and lease time for the current stage
                    if self.current_stage == 0:  # BEGIN stage
                        strategy = RenewalStrategy.FIXED_INTERVAL
                        lease_time = 30
                    elif self.current_stage == 1:  # READ stage
                        strategy = RenewalStrategy.PROGRESSIVE
                        lease_time = 60
                    elif self.current_stage == 2:  # WRITE stage
                        strategy = RenewalStrategy.ADAPTIVE
                        lease_time = 120
                    else:  # COMMIT stage
                        strategy = RenewalStrategy.FIXED_INTERVAL
                        lease_time = 30

                    lock = AdvancedDistributedLock(
                        lock_id,
                        lease_time=lease_time,
                        renewal_strategy=strategy
                    )

                    if lock.acquire():
                        self.locks[resource_name] = lock
                        logging.info(f"Acquired resource lock: {resource_name}")
                        return True
                    else:
                        logging.error(f"Failed to acquire resource lock: {resource_name}")
                        return False

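                def next_stage(self):
                    """Advance the transaction to its next stage."""
                    if self.current_stage < len(self.transaction_stages) - 1:
                        self.current_stage += 1
                        stage_name = self.transaction_stages[self.current_stage]
                        logging.info(f"Transaction entered stage: {stage_name}")

                        # Re-acquire every lock so it picks up the new stage's strategy.
                        # Iterate over a snapshot because acquire_lock() writes to
                        # self.locks. Caution: the resource is unprotected between
                        # release() and the re-acquire, so another client may take it.
                        for resource_name, lock in list(self.locks.items()):
                            lock.release()
                            if not self.acquire_lock(resource_name):
                                # Drop the stale entry so it is not released again later
                                del self.locks[resource_name]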

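                def commit_transaction(self):
                    """Commit the transaction and release its locks."""
                    logging.info(f"Committing transaction: {self.transaction_id}")
                    self.release_all_locks()

                def rollback_transaction(self):
                    """Roll back the transaction and release its locks."""
                    logging.warning(f"Rolling back transaction: {self.transaction_id}")
                    self.release_all_locks()

                def release_all_locks(self):
                    """Release every lock held by this transaction."""
                    for resource_name, lock in self.locks.items():
                        try:
                            lock.release()
                            logging.info(f"Released resource lock: {resource_name}")
                        except Exception as e:
                            # Keep going so one failure does not leave other locks held
                            logging.error(f"Failed to release resource lock: {resource_name}, error: {e}")

                    self.locks.clear()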
            ---
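    c.Microservice coordination
        a.Scenario
            An operation that spans several microservices needs a global lock to keep the data consistent.
        b.Solution
            Use a predictive renewal mechanism that estimates service response times from historical data.
        c.Code example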
            ---
            class MicroserviceCoordinator:
                """Coordinates one operation across several microservices."""

                def __init__(self, operation_id: str, services: List[str]):
                    self.operation_id = operation_id
                    self.services = services
                    self.service_history = {}  # service name -> recent response times (seconds)

                    # Create the coordination lock
                    self.coordination_lock = AdvancedDistributedLock(
                        f"coord_{operation_id}",
                        lease_time=60,
                        renewal_strategy=RenewalStrategy.PREDICTIVE
                    )

                def coordinate_operation(self) -> bool:
                    """Run the cross-service operation while holding the coordination lock."""
                    if not self.coordination_lock.acquire():
                        logging.error("Could not acquire the coordination lock")
                        return False

                    try:
                        # enumerate() gives the position directly; list.index() would
                        # return the wrong position if a service name appears twice
                        for index, service_name in enumerate(self.services):
                            # Call the service
                            success = self._call_service(service_name)
                            if not success:
                                logging.error(f"Service call failed: {service_name}")
                                return False

                            # Report progress to the lock
                            remaining_services = len(self.services) - index - 1
                            estimated_remaining = self._estimate_remaining_time(remaining_services)
                            self.coordination_lock.update_progress(estimated_remaining)

                        logging.info(f"Coordinated operation finished: {self.operation_id}")
                        return True

                    finally:
                        self.coordination_lock.release()

                def _call_service(self, service_name: str) -> bool:
                    """Call a single service and record how long it took."""
                    start_time = time.time()

                    try:
                        # Simulate the service call
                        service_delay = self._get_service_delay(service_name)
                        time.sleep(service_delay)

                        # Record the service response time
                        response_time = time.time() - start_time
                        self._record_service_response(service_name, response_time)

                        logging.info(f"Service call succeeded: {service_name}, took {response_time:.2f}s")
                        return True

                    except Exception as e:
                        logging.error(f"Service call raised an exception: {service_name}, error: {e}")
                        return False

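                def _get_service_delay(self, service_name: str) -> float:
                    """Return a simulated response delay for the service, in seconds."""
                    base_delays = {
                        'auth': 0.5,
                        'user': 1.0,
                        'order': 2.0,
                        'payment': 1.5,
                        'notification': 0.3
                    }
                    # Add 0.0-0.4s of per-service jitter. The parentheses matter:
                    # 0.1 * hash(...) % 5 would yield a value anywhere in [0, 5).
                    # str hashes are salted per process, so the jitter differs between runs.
                    return base_delays.get(service_name, 1.0) + 0.1 * (hash(service_name) % 5)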

                def _record_service_response(self, service_name: str, response_time: float):
                    """Record one response time for the service."""
                    if service_name not in self.service_history:
                        self.service_history[service_name] = []

                    self.service_history[service_name].append(response_time)
                    # Keep only the 10 most recent samples
                    self.service_history[service_name] = self.service_history[service_name][-10:]

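                def _estimate_remaining_time(self, remaining_services: int) -> float:
                    """Estimate how long the remaining services will take, in seconds."""
                    # Guard the zero case: self.services[-0:] is the whole list,
                    # which would report a full run as still outstanding
                    if remaining_services <= 0:
                        return 0.0

                    total_estimated = 0.0
                    for service_name in self.services[-remaining_services:]:
                        if service_name in self.service_history:
                            avg_time = sum(self.service_history[service_name]) / len(self.service_history[service_name])
                            total_estimated += avg_time
                        else:
                            total_estimated += 2.0  # default estimate for a service with no history

                    return total_estimated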
            ---

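05.Best practices and failure handling
    a.Setting the timeout
        a.Business assessment
            Base the timeout on the measured duration of the business operation rather than on a guess.
        b.Environment
            Allow for environmental factors such as network latency and system load.
        c.Buffer
            Add a safety margin on top of the estimated duration.
        d.Dynamic adjustment
            Tune the timeout at run time using the durations actually observed.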
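        e.Example sketch
            The four points above can be combined into one small calculation. This is a minimal sketch, not part of the lock classes shown earlier: the names `percentile` and `suggest_lease_time`, the 95th-percentile baseline, and every default value are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty sample list (pct in 1..100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number ranks stay exact
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_lease_time(
    durations: Sequence[float],
    network_latency: float = 0.05,   # assumed one-way latency, seconds
    buffer_ratio: float = 0.5,       # assumed safety margin: +50%
    min_lease: float = 5.0,          # assumed lower bound, seconds
    max_lease: float = 300.0,        # assumed upper bound, seconds
) -> float:
    """Suggest a lease time from recently observed operation durations."""
    base = percentile(durations, 95)                    # business assessment
    with_network = base + 2 * network_latency           # environment: one round trip
    padded = with_network * (1 + buffer_ratio)          # buffer
    return min(max_lease, max(min_lease, padded))       # keep within sane bounds


# Dynamic adjustment: call again whenever new durations have been recorded
recent = [10.0] * 10
print(suggest_lease_time(recent, network_latency=0.0))
```

            Clamping to a minimum and maximum keeps one outlier from producing a lease so short that it expires mid-operation, or so long that a crashed holder blocks everyone else.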
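    b.Choosing a renewal strategy
        a.Match the scenario
            Pick the renewal strategy that fits the characteristics of the workload.
        b.Performance trade-off
            Balance renewal frequency against the overhead each renewal adds.
        c.Fault isolation
            A failed renewal must not break the main business flow.
        d.Monitoring and alerting
            Monitor renewal failures and raise an alert when they occur.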
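        e.Example sketch
            Fault isolation and alerting can be illustrated with a background renewer that swallows renewal errors, counts consecutive failures, and signals the main flow through a flag instead of an exception. This is a sketch under assumptions: `IsolatedRenewer`, its parameters, and the `renew` callable are hypothetical and do not reflect the API of `AdvancedDistributedLock`.

```python
import threading
from typing import Callable


class IsolatedRenewer:
    """Renews a lock in a daemon thread; errors never propagate to the caller."""

    def __init__(self, renew: Callable[[], bool], interval: float,
                 max_failures: int = 3,
                 on_alert: Callable[[int], None] = lambda count: None):
        self._renew = renew                # hypothetical: returns True if renewed
        self._interval = interval          # seconds between renewal attempts
        self._max_failures = max_failures
        self._on_alert = on_alert          # hypothetical alert hook
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.consecutive_failures = 0
        self.lock_lost = threading.Event()  # the main flow polls this flag

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    def _run(self) -> None:
        # Event.wait() returns False on timeout, True once stop() is called
        while not self._stop.wait(self._interval):
            try:
                ok = self._renew()
            except Exception:
                ok = False                 # treat any error as a failed renewal
            if ok:
                self.consecutive_failures = 0
                continue
            self.consecutive_failures += 1
            if self.consecutive_failures >= self._max_failures:
                try:
                    self._on_alert(self.consecutive_failures)
                finally:
                    self.lock_lost.set()   # always tell the main flow
                return
```

            The main flow checks `lock_lost.is_set()` before each write to the protected resource and decides for itself whether to abort or degrade, so a renewal problem never surfaces as an unexpected exception in business code.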
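    c.Monitoring and debugging
        a.Key metrics
            Track the renewal success rate, the timeout frequency, and lock contention.
        b.Logging
            Log every timeout and renewal event in enough detail to reconstruct what happened.
        c.Performance analysis
            Measure how timeouts and renewals affect system performance.
        d.Exception handling
            Handle every exception that can occur during timeout and renewal processing.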
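        e.Example sketch
            The key metrics listed above can be collected with a few thread-safe counters. This is a minimal in-process sketch: `LockMetrics` and its method names are hypothetical, and a production system would more likely export these values to its existing metrics backend.

```python
import logging
import threading


class LockMetrics:
    """Thread-safe counters for renewal, timeout, and wait-time events."""

    def __init__(self):
        self._guard = threading.Lock()
        self.renew_ok = 0
        self.renew_failed = 0
        self.timeouts = 0
        self.wait_seconds = []   # time spent waiting to acquire, as a contention signal

    def record_renewal(self, success: bool) -> None:
        with self._guard:
            if success:
                self.renew_ok += 1
            else:
                self.renew_failed += 1
        if not success:
            logging.warning("lock renewal failed")

    def record_timeout(self) -> None:
        with self._guard:
            self.timeouts += 1
        logging.warning("lock acquisition timed out")

    def record_wait(self, seconds: float) -> None:
        with self._guard:
            self.wait_seconds.append(seconds)

    def snapshot(self) -> dict:
        """Return a consistent copy of the current metrics."""
        with self._guard:
            renewals = self.renew_ok + self.renew_failed
            waits = list(self.wait_seconds)
            return {
                # None means "no data yet" rather than a misleading 0% or 100%
                "renewal_success_rate": self.renew_ok / renewals if renewals else None,
                "timeouts": self.timeouts,
                "avg_wait_seconds": sum(waits) / len(waits) if waits else None,
            }
```

            Logging happens outside the guarded section so that a slow log handler does not lengthen the time other threads wait for the counter lock.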
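    d.Failure recovery
        a.Automatic recovery
            When a renewal fails, try to recover the lock automatically.
        b.Degradation
            Provide a fallback path for when renewal is unavailable.
        c.Data consistency
            Make sure the data stays consistent when a failure occurs.
        d.Notification
            Notify the people responsible promptly so they can handle the failure.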
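        e.Example sketch
            One way to combine automatic recovery, degradation, and notification is a bounded retry with exponential backoff followed by a fallback path. This is a sketch under assumptions: `recover_lock`, `run_with_recovery`, and all of their callables are hypothetical names introduced here, not part of the lock classes shown earlier.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def recover_lock(try_acquire: Callable[[], bool], attempts: int = 3,
                 base_delay: float = 0.1,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try to re-acquire a lost lock, doubling the delay after each failure."""
    for attempt in range(attempts):
        if try_acquire():
            return True
        if attempt < attempts - 1:          # no pointless sleep after the last try
            sleep(base_delay * 2 ** attempt)
    return False


def run_with_recovery(try_acquire: Callable[[], bool],
                      critical_work: Callable[[], T],
                      degraded_work: Callable[[], T],
                      notify: Callable[[str], None],
                      **retry_options) -> T:
    """Run the critical path if the lock is recovered, else degrade and notify."""
    if recover_lock(try_acquire, **retry_options):
        # critical_work must re-read the protected state before writing:
        # another holder may have changed it while the lock was lost
        return critical_work()
    notify("lock recovery failed; running degraded path")
    return degraded_work()
```

            Bounding the number of attempts keeps recovery from blocking indefinitely. The consistency point is the easy one to miss: regaining the lock does not prove the data is unchanged, so the critical path should re-validate what it read earlier.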