11.milvus

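1 Basic Concepts

1.1 Vector Databases

01.Definition of a vector database
    a.Basic concepts
        a.Description
            A vector database is a database system built to store, index, and query high-dimensional vectors. It performs semantic retrieval by computing vector similarity and is widely used in AI workloads such as recommendation, image search, and natural language processing. With a suitable index and enough hardware, it can serve similarity queries over millions to billions of vectors with latencies in the millisecond range.
        b.Code example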
            ---
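            # Core concepts of a vector database
            # Vector: a high-dimensional array, [0.1, 0.2, 0.3, ..., 0.n]
            # Similarity: how close two vectors are under a metric (Euclidean distance, cosine similarity, etc.)
            # Index: a data structure that speeds up vector retrieval (e.g. HNSW, IVF)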
            
            import numpy as np
            
            # Example: cosine similarity between two vectors
            vector1 = np.array([0.1, 0.2, 0.3])
            vector2 = np.array([0.2, 0.3, 0.4])
            
            similarity = np.dot(vector1, vector2) / (np.linalg.norm(vector1) * np.linalg.norm(vector2))
            print(f"Cosine similarity: {similarity}")
            ---
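The metrics named above can be written out without any library. The following is a minimal pure-Python sketch (the function names are illustrative, not part of any Milvus API) of L2 distance, inner product, and cosine similarity, plus the exhaustive top-k scan that an index is meant to avoid.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    """Inner product (IP): larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine similarity: inner product divided by the product of the norms."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

def brute_force_top_k(query, vectors, k):
    """Exact nearest neighbours by L2 distance; scans every stored vector."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2_distance(query, vectors[i]))
    return ranked[:k]

vectors = [[0.1, 0.2, 0.3], [0.9, 0.1, 0.0], [0.2, 0.3, 0.4]]
query = [0.1, 0.2, 0.3]
print(brute_force_top_k(query, vectors, k=2))  # indexes of the two closest vectors
print(cosine_similarity(vectors[0], vectors[2]))
```

The scan costs one distance computation per stored vector, which is why ANN indexes such as HNSW and IVF exist.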
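    b.Use cases
        a.Description
            Vector databases are used across many AI domains. Recommender systems represent users and items as vectors to personalize results. Image search encodes images as vectors to support search-by-image. Natural language processing relies on them for semantic search, question answering, and RAG (retrieval-augmented generation) applications. Anomaly detection flags unusual patterns by their vector distance from normal data.
        b.Code example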
            ---
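            # Typical use cases (pseudocode: embedding_model, vector_db, get_user_embedding,
            # image_encoder, user_id and image are placeholders, not a real API)
            
            # 1. Semantic search: turn text into a vector and retrieve by similarity
            query_text = "What is artificial intelligence?"
            query_vector = embedding_model.encode(query_text)
            results = vector_db.search(query_vector, top_k=5)
            
            # 2. Recommendation: find similar users from a user vector
            user_vector = get_user_embedding(user_id)
            similar_users = vector_db.search(user_vector, top_k=10)
            
            # 3. Image search: search by image
            image_vector = image_encoder.encode(image)
            similar_images = vector_db.search(image_vector, top_k=20)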
            ---

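02.Vector databases vs. traditional databases
    a.Data type differences
        a.Description
            Traditional databases mainly store structured data (numbers, strings, dates) and answer queries by exact match or range comparison. Vector databases store high-dimensional vectors (commonly 128 to 1536 dimensions) and answer queries by similarity. Traditional databases rely on B-tree and hash indexes; vector databases rely on ANN (approximate nearest neighbor) indexes such as HNSW and IVF. The query semantics differ: a traditional query returns exactly the rows that satisfy a predicate, whereas a vector query returns the top-k most similar items, usually approximately.
        b.Code example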
            ---
            # Traditional database query (exact match), expressed in SQL:
            # SELECT * FROM products WHERE category = 'electronics' AND price < 1000;
            
            # Vector database query (similarity search)
            from pymilvus import Collection
            
            collection = Collection("products")
            search_vector = [[0.1, 0.2, 0.3, ...]]  # query vector (elided here)
            
            results = collection.search(
                data=search_vector,
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"nprobe": 10}},
                limit=10
            )
            ---
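    b.Performance characteristics
        a.Description
            Traditional databases excel at exact queries and transaction processing and provide ACID guarantees. Vector databases excel at high-dimensional similarity search, where approximate nearest neighbor algorithms avoid scanning every vector and so achieve sub-linear query time. Scaling a relational database horizontally is complicated by joins and cross-row transactions, while vector search workloads partition naturally across nodes. For collections of around a million vectors, an indexed vector database can typically answer in milliseconds, depending on index type, dimension, and hardware.
        b.Code example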
            ---
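            # Performance comparison example
            # (cursor, collection and query_vector are assumed to be set up already)
            
            # Traditional database: exact lookup, O(log n) with a B-tree index
            import time
            start = time.time()
            cursor.execute("SELECT * FROM users WHERE id = 12345")
            print(f"Traditional database query time: {time.time() - start}s")
            
            # Vector database: approximate search, sub-linear thanks to the index
            # (about O(log n) for graph indexes such as HNSW; for IVF, cost grows with nprobe)
            start = time.time()
            results = collection.search(
                data=[query_vector],
                anns_field="vector",
                param={"metric_type": "IP", "params": {"nprobe": 16}},
                limit=10
            )
            print(f"Vector database query time: {time.time() - start}s")
            
            # On million-scale data, an indexed vector database can often keep query latency
            # under 10 ms, depending on hardware and index parameters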
            ---
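To make the `nprobe` parameter used above more concrete, here is a toy inverted-file (IVF) index in pure Python. It is a simplified sketch, not how Milvus implements IVF: the centroids are a random sample rather than trained with k-means, and the class and method names are hypothetical. It shows that a search scans only the `nprobe` closest buckets instead of every vector.

```python
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class ToyIVF:
    """Toy inverted-file index: each vector is filed under its nearest centroid."""

    def __init__(self, vectors, nlist, seed=0):
        rng = random.Random(seed)
        self.vectors = vectors
        # Simplification: centroids are a random sample, not trained with k-means.
        self.centroids = rng.sample(vectors, nlist)
        self.buckets = [[] for _ in range(nlist)]
        for idx, vec in enumerate(vectors):
            self.buckets[self._nearest_centroids(vec, 1)[0]].append(idx)

    def _nearest_centroids(self, vec, n):
        order = sorted(range(len(self.centroids)), key=lambda c: l2(vec, self.centroids[c]))
        return order[:n]

    def search(self, query, k, nprobe):
        """Scan only the nprobe closest buckets; return (ids, vectors scanned)."""
        candidates = [i for c in self._nearest_centroids(query, nprobe) for i in self.buckets[c]]
        candidates.sort(key=lambda i: l2(query, self.vectors[i]))
        return candidates[:k], len(candidates)

rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(2000)]
index = ToyIVF(data, nlist=32)
ids, scanned = index.search(data[0], k=5, nprobe=4)
print(f"scanned {scanned} of {len(data)} vectors, top hit id = {ids[0]}")
```

Raising `nprobe` scans more buckets, trading latency for recall; with `nprobe` equal to `nlist` the search degenerates into an exact scan.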

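1.2 Milvus Architecture

01.System architecture
    a.Cloud-native design
        a.Description
            Milvus uses a cloud-native architecture that separates storage from compute and scales elastically. The system has four layers: the access layer (load balancing and request routing), the coordinator layer (metadata management and task scheduling), the worker layer (data processing and query execution), and the storage layer (object storage, metadata storage, and message queue). Because the layers are decoupled, each component can be scaled independently, which improves availability and maintainability.
        b.Code example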
            ---
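            # Milvus architecture components
            
            # 1. Access layer
            # - Proxy: receives client requests and load-balances them
            # - Exposes gRPC and RESTful APIs
            
            # 2. Coordinator layer (Coordinator Service)
            # - Root Coordinator: handles DDL operations (create/drop collection)
            # - Data Coordinator: manages segments and binlogs
            # - Query Coordinator: manages query nodes and load balancing
            # - Index Coordinator: manages index build tasks
            
            # 3. Worker layer (Worker Nodes)
            # - Query Node: executes vector searches
            # - Data Node: persists data
            # - Index Node: builds vector indexes
            
            # 4. Storage layer (Storage)
            # - Object storage (MinIO/S3): stores vector data and indexes
            # - Metadata storage (etcd): stores collection schemas and metadata
            # - Message queue (Pulsar/Kafka): data streaming and log replication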
            ---
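    b.Distributed features
        a.Description
            Milvus supports distributed deployment and achieves high availability through sharding and replication. Data is split into segments, each bounded by a configurable maximum size rather than a fixed number of vectors. At query time, multiple Query Nodes process different segments in parallel and the partial results are merged. The system scales in and out dynamically, and newly added nodes automatically take over part of the load. Replication protects data reliability, and deployment across availability zones is supported.
        b.Code example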
            ---
            from pymilvus import connections, Collection, utility
            
            # Connect to the Milvus cluster
            connections.connect(
                alias="default",
                host="milvus-cluster.example.com",
                port="19530"
            )
            
            # Check the server version
            print(f"Milvus version: {utility.get_server_version()}")
            
            # Specify the number of shards when creating the collection
            from pymilvus import CollectionSchema, FieldSchema, DataType
            
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields)
            
            collection = Collection(
                name="distributed_collection",
                schema=schema,
                shards_num=4  # four shards for more parallelism
            )
            
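            # Set the replica count through a collection property (this property is only
            # recognized by newer Milvus releases; otherwise pass replica_number to collection.load())
            collection.set_properties(properties={"collection.replica.number": 2})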
            ---
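The scatter-gather pattern described above (rows routed to shards, each segment searched independently, partial results merged) can be sketched in plain Python. This is an illustration under assumptions: the modulo routing in `shard_for` and all function names are hypothetical stand-ins, not Milvus's internal routing or API.

```python
import heapq
import itertools
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def shard_for(primary_key, shards_num):
    """Route a row to a shard by primary key (hypothetical modulo routing)."""
    return primary_key % shards_num

def search_segment(segment, query, k):
    """Local top-k on one segment: (distance, id) pairs in ascending order."""
    hits = [(l2(query, vec), pk) for pk, vec in segment.items()]
    return heapq.nsmallest(k, hits)

def merge_top_k(partial_results, k):
    """Global top-k obtained by merging the sorted per-segment results."""
    return list(itertools.islice(heapq.merge(*partial_results), k))

shards_num = 4
segments = [dict() for _ in range(shards_num)]
for pk in range(100):
    segments[shard_for(pk, shards_num)][pk] = [pk * 0.01, 1.0 - pk * 0.01]

query = [0.5, 0.5]
partials = [search_segment(seg, query, k=3) for seg in segments]  # one per "Query Node"
top = merge_top_k(partials, k=3)
print([pk for _, pk in top])
```

Each segment only needs to return its own top-k, because the global top-k is always contained in the union of the local ones.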

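02.Core components
    a.Proxy
        a.Description
            The Proxy is Milvus's access layer: it receives client requests and routes them to backend services. It exposes a unified API over gRPC and RESTful protocols, validates requests and parameters, and aggregates results before returning them. In cluster mode, a load balancer distributes requests across multiple Proxy instances for high availability. Because the Proxy is stateless, it scales horizontally.
        b.Code example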
            ---
            # Proxy configuration example (milvus.yaml)
            
            proxy:
              port: 19530
              grpc:
                serverMaxRecvSize: 536870912  # 512MB
                serverMaxSendSize: 536870912
                clientMaxRecvSize: 104857600  # 100MB
                clientMaxSendSize: 104857600
              http:
                enabled: true
                port: 9091
              timeTickInterval: 200  # ms
              msgStream:
                timeTick:
                  bufSize: 512
              maxTaskNum: 1024  # maximum number of concurrent tasks
            
            # Clients connect through the Proxy
            from pymilvus import connections
            
            connections.connect(
                alias="default",
                host="proxy.milvus.svc.cluster.local",
                port="19530",
                user="username",
                password="password"
            )
            ---
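    b.Coordinators
        a.Description
            Coordinators handle metadata management and task scheduling. The Root Coordinator handles creation and deletion of collections and partitions and maintains the global timestamp. The Data Coordinator manages segment allocation and merging and coordinates data persistence. The Query Coordinator balances load across query nodes and assigns segments to them. The Index Coordinator schedules index builds and tracks index state. Each coordinator relies on etcd for high availability.
        b.Code example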
            ---
            # Coordinator workflow example (assumes connections.connect(...) was called already)
            
            # 1. Root Coordinator: create a collection
            import numpy as np
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            
            schema = CollectionSchema([
                FieldSchema("id", DataType.INT64, is_primary=True),
                FieldSchema("vector", DataType.FLOAT_VECTOR, dim=128)
            ])
            
            # The Root Coordinator handles the DDL request
            collection = Collection("example", schema=schema)
            
            # 2. Data Coordinator: insert data
            data = [
                [i for i in range(1000)],
                [[np.random.random() for _ in range(128)] for _ in range(1000)]
            ]
            collection.insert(data)  # the Data Coordinator allocates segments
            
            # 3. Index Coordinator: build an index
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 128}
            }
            collection.create_index("vector", index_params)  # the Index Coordinator schedules the build
            
            # 4. Query Coordinator: run a search
            collection.load()  # the Query Coordinator assigns segments to Query Nodes
            search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
            results = collection.search([[0.1]*128], "vector", search_params, limit=10)
            ---
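    c.Worker nodes
        a.Description
            Worker nodes carry out the actual data processing. Query Nodes load indexes and run vector searches, querying multiple segments in parallel. Data Nodes persist data by writing binlogs to object storage. Index Nodes build vector indexes of the supported index types. Because data and indexes are persisted in object storage, worker nodes hold only in-memory working state; coordinators assign their tasks and balance load, and when a node fails its tasks are reassigned to other nodes.
        b.Code example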
            ---
            # Worker node configuration example
            
            # Query Node settings
            queryNode:
              cacheSize: 32  # cache size in GB
              gracefulStopTimeout: 30  # graceful shutdown timeout
              stats:
                publishInterval: 1000  # statistics publish interval (ms)
              dataSync:
                flowGraph:
                  maxQueueLength: 1024
                  maxParallelism: 1024
              segcore:
                chunkRows: 1024  # rows per segment chunk
            
            # Data Node settings
            dataNode:
              dataSync:
                flowGraph:
                  maxQueueLength: 1024
              flush:
                insertBufSize: 16777216  # 16MB
            
            # Index Node settings
            indexNode:
              scheduler:
                buildParallel: 1  # number of indexes built in parallel
            
            # Monitor worker node state
            from pymilvus import utility
            
            # List the segments loaded on Query Nodes
            segments = utility.get_query_segment_info("collection_name")
            for seg in segments:
                print(f"Node ID: {seg.nodeID}, Segment: {seg.segmentID}, State: {seg.state}")
            ---

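1.3 Core Features

01.High-performance search
    a.Millisecond-level response
        a.Description
            Milvus reaches millisecond-level query latency through optimized index algorithms and memory management. On a million-vector collection, an HNSW index can typically answer a query in a few milliseconds (on the order of 1-5 ms), depending on dimension, parameters, and hardware. GPU acceleration is available to raise search performance further. Indexes are preloaded into memory so that queries avoid disk I/O, and batched queries raise throughput.
        b.Code example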
            ---
            import time
            from pymilvus import Collection
            
            collection = Collection("benchmark")
            collection.load()  # preload the index into memory
            
            # Single-query latency test
            query_vector = [[0.1] * 128]
            
            start = time.time()
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"ef": 64}},
                limit=10
            )
            latency = (time.time() - start) * 1000
            print(f"Query latency: {latency:.2f}ms")
            
            # Batch queries for higher throughput
            batch_vectors = [[0.1] * 128 for _ in range(100)]
            
            start = time.time()
            results = collection.search(
                data=batch_vectors,
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"ef": 64}},
                limit=10
            )
            total_time = time.time() - start
            qps = len(batch_vectors) / total_time
            print(f"Batch query QPS: {qps:.2f}")
            ---
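A single timed call, as in the example above, is noisy. A small helper that repeats the call and reports percentiles gives a steadier picture. The sketch below is generic Python; `measure_latency` is a hypothetical helper, and the lambda is a stand-in workload that would be replaced by a call such as `collection.search(...)`.

```python
import statistics
import time

def measure_latency(run_query, repeats=100, warmup=5):
    """Call run_query repeatedly and summarize per-call latency in milliseconds."""
    for _ in range(warmup):
        run_query()  # warm caches and connections before timing
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    p99_index = min(len(samples) - 1, int(len(samples) * 0.99))
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[p99_index],
        "max_ms": samples[-1],
    }

# Stand-in workload; with Milvus this would wrap collection.search(...)
stats = measure_latency(lambda: sum(range(1000)), repeats=50)
print(stats)
```

`time.perf_counter()` is used instead of `time.time()` because it is monotonic and has higher resolution for short intervals.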
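    b.Large-scale data support
        a.Description
            Milvus supports storing and searching billions of vectors. Its distributed architecture spreads data across multiple nodes. Data is split into size-bounded segments that are easy to manage and to query in parallel. Index building is incremental, so newly inserted data can be added to the index quickly. Compression and quantization reduce storage cost. Hot and cold data can be tiered: hot data is kept in memory while cold data stays in object storage.
        b.Code example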
            ---
            import numpy as np
            from pymilvus import Collection
            
            # Inspect collection statistics
            collection = Collection("large_scale")
            stats = collection.num_entities
            print(f"Total vectors: {stats:,}")
            
            # Large-scale insert
            batch_size = 10000
            total_vectors = 10000000  # 10 million vectors
            
            for i in range(0, total_vectors, batch_size):
                data = [
                    list(range(i, i + batch_size)),
                    np.random.random((batch_size, 128)).tolist()  # random float vectors
                ]
                collection.insert(data)
                
                if (i + batch_size) % 100000 == 0:
                    collection.flush()  # periodically persist buffered data
                    print(f"Inserted {i + batch_size:,} rows")
            
            # Create an index suited to large-scale retrieval
            index_params = {
                "index_type": "IVF_PQ",  # PQ quantization lowers memory usage
                "metric_type": "L2",
                "params": {
                    "nlist": 2048,  # more cluster centroids
                    "m": 8,  # number of PQ sub-vectors
                    "nbits": 8
                }
            }
            collection.create_index("embedding", index_params)
            ---
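The memory saving from the IVF_PQ parameters above can be estimated with simple arithmetic. The sketch below uses the values from the example (10 million vectors, dimension 128, `m=8`, `nbits=8`) and counts only raw vectors, PQ codes, and PQ codebooks; IVF centroids, ids, and other overhead are deliberately left out, so treat the result as a rough lower bound rather than a measured figure.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_float=4):
    """Size of the uncompressed float32 vectors."""
    return num_vectors * dim * bytes_per_float

def pq_code_bytes(num_vectors, m, nbits):
    """Each vector is stored as m codes of nbits bits each."""
    return num_vectors * m * nbits // 8

def pq_codebook_bytes(dim, m, nbits, bytes_per_float=4):
    """m sub-codebooks, each holding 2**nbits centroids of dim/m floats."""
    return m * (2 ** nbits) * (dim // m) * bytes_per_float

n, dim, m, nbits = 10_000_000, 128, 8, 8
raw = raw_vector_bytes(n, dim)
codes = pq_code_bytes(n, m, nbits)
codebook = pq_codebook_bytes(dim, m, nbits)
print(f"raw float32 vectors: {raw / 1e9:.2f} GB")
print(f"PQ codes:            {codes / 1e6:.0f} MB")
print(f"PQ codebooks:        {codebook / 1e3:.0f} KB")
print(f"compression ratio:   {raw / codes:.0f}x")
```

With these parameters each 512-byte vector shrinks to an 8-byte code, at the price of lower recall than an uncompressed index.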

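02.Flexible scaling
    a.Horizontal scaling
        a.Description
            Milvus scales horizontally: Query Nodes, Data Nodes, and Index Nodes can be added dynamically. New nodes join the cluster automatically and the coordinators redistribute load. Adding Query Nodes raises query throughput, adding Data Nodes raises write throughput, and adding Index Nodes speeds up index builds. Scaling does not interrupt online service, and rolling upgrades are supported.
        b.Code example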
            ---
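            # Horizontal scaling on Kubernetes
            # (deployment names depend on your release name; the ones below are examples)
            
            # 1. Scale Query Nodes (higher query throughput)
            # kubectl scale deployment milvus-querynode --replicas=5
            
            # 2. Scale Data Nodes (higher write throughput)
            # kubectl scale deployment milvus-datanode --replicas=3
            
            # 3. Scale Index Nodes (faster index builds)
            # kubectl scale deployment milvus-indexnode --replicas=2
            
            # Observe the effect from the application side
            from pymilvus import connections, utility, Collection
            
            connections.connect("default", host="milvus-proxy", port="19530")
            
            # Check service health over HTTP (print the raw body, which may not be JSON)
            import requests
            response = requests.get("http://milvus-proxy:9091/api/v1/health")
            print(f"Health check: {response.status_code} {response.text}")
            
            # Measure performance after scaling
            collection = Collection("test")
            collection.load(replica_number=2)  # two in-memory replicas for higher query concurrency
            
            # Concurrent query test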
            import concurrent.futures
            
            def search_task(query_id):
                results = collection.search(
                    data=[[0.1] * 128],
                    anns_field="embedding",
                    param={"metric_type": "L2", "params": {"nprobe": 16}},
                    limit=10
                )
                return query_id
            
            with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
                futures = [executor.submit(search_task, i) for i in range(1000)]
                results = [f.result() for f in futures]
            print(f"Concurrent queries completed: {len(results)} requests")
            ---
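    b.Storage-compute separation
        a.Description
            Milvus separates storage from compute: vector data and indexes live in object storage (MinIO or S3), and compute nodes are stateless. Storage and compute can therefore scale independently, which lowers cost. Compute nodes can be started and destroyed on demand for elastic scaling. The storage layer supports replication, including cross-region replication, for durability. Metadata is kept in etcd, which can be deployed for high availability.
        b.Code example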
            ---
            # Storage-compute separation configuration example (milvus.yaml)
            
            # Object storage (MinIO)
            minio:
              address: minio.example.com
              port: 9000
              accessKeyID: minioadmin
              secretAccessKey: minioadmin
              useSSL: false
              bucketName: milvus-bucket
              rootPath: file  # data root path
              useIAM: false
              iamEndpoint: ""
            
            # Or use AWS S3
            # minio:
            #   address: s3.amazonaws.com
            #   port: 443
            #   accessKeyID: YOUR_ACCESS_KEY
            #   secretAccessKey: YOUR_SECRET_KEY
            #   useSSL: true
            #   bucketName: milvus-data
            #   rootPath: milvus
            #   useIAM: true
            #   iamEndpoint: ""
            #   region: us-west-2
            
            # Metadata storage (etcd)
            etcd:
              endpoints:
                - etcd-0.etcd:2379
                - etcd-1.etcd:2379
                - etcd-2.etcd:2379
              rootPath: by-dev  # metadata root path
              metaSubPath: meta
              kvSubPath: kv
            
            # Message queue (Pulsar)
            pulsar:
              address: pulsar://pulsar-proxy:6650
              maxMessageSize: 5242880  # 5MB
            
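            # Advantages of this architecture
            # 1. Compute nodes are stateless and can scale in and out quickly
            # 2. The storage layer scales independently, up to PB-scale data
            # 3. Data is persisted in object storage at low cost
            # 4. Multiple clusters can share the same storage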
            ---

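03.Multi-language support
    a.SDK ecosystem
        a.Description
            Milvus provides SDKs for several languages, including Python, Java, Go, Node.js, and C++. All SDKs are built on the same gRPC interface and expose largely equivalent functionality. The Python SDK is the most mature, with a complete API and many examples. The Java SDK targets enterprise applications, the Go SDK is lightweight and suits microservice architectures, and the Node.js SDK serves JavaScript and TypeScript applications. Support for connection pooling, retries, and load balancing varies by SDK.
        b.Code example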
            ---
            # Python SDK
            from pymilvus import connections, Collection
            
            connections.connect("default", host="localhost", port="19530")
            collection = Collection("example")
            results = collection.search([[0.1]*128], "vector", {"metric_type": "L2", "params": {"nprobe": 10}}, limit=10)
            
            # Java SDK
            // import io.milvus.client.*;
            // 
            // MilvusServiceClient client = new MilvusServiceClient(
            //     ConnectParam.newBuilder()
            //         .withHost("localhost")
            //         .withPort(19530)
            //         .build()
            // );
            // 
            // SearchParam searchParam = SearchParam.newBuilder()
            //     .withCollectionName("example")
            //     .withVectorFieldName("vector")
            //     .withVectors(Arrays.asList(Arrays.asList(0.1f, 0.2f, ...)))
            //     .withTopK(10)
            //     .build();
            // R<SearchResults> response = client.search(searchParam);
            
            # Go SDK
            // import "github.com/milvus-io/milvus-sdk-go/v2/client"
            // 
            // c, _ := client.NewGrpcClient(context.Background(), "localhost:19530")
            // searchResult, _ := c.Search(
            //     context.Background(),
            //     "example",
            //     []string{},
            //     "",
            //     []string{"id"},
            //     []entity.Vector{entity.FloatVector{0.1, 0.2, ...}},
            //     "vector",
            //     entity.L2,
            //     10,
            //     sp,  // search parameters (entity.SearchParam), constructed beforehand
            // )
            
            # Node.js SDK
            // const { MilvusClient } = require("@zilliz/milvus2-sdk-node");
            // 
            // const client = new MilvusClient("localhost:19530");
            // const results = await client.search({
            //     collection_name: "example",
            //     vectors: [[0.1, 0.2, ...]],
            //     search_params: { nprobe: 10 },
            //     limit: 10
            // });
            ---
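    b.RESTful API
        a.Description
            Milvus provides a RESTful API for cross-language access and quick integration. The API runs over HTTP with JSON requests and responses. It covers the core functionality, including collection management, data operations, and search and query. It suits lightweight clients and web applications, supports authentication and access control, and comes with Swagger documentation for testing and debugging.
        b.Code example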
            ---
            import requests
            import json
            
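            # Note: REST paths and payload shapes differ between Milvus versions, so treat the
            # requests below as illustrative and check the REST reference for your release.
            base_url = "http://localhost:9091/api/v1"
            
            # 1. Create a collection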
            create_payload = {
                "collection_name": "rest_example",
                "schema": {
                    "fields": [
                        {"name": "id", "dtype": "Int64", "is_primary": True},
                        {"name": "vector", "dtype": "FloatVector", "params": {"dim": 128}}
                    ]
                }
            }
            response = requests.post(f"{base_url}/collection", json=create_payload)
            print(f"Create collection: {response.json()}")
            
            # 2. Insert data
            insert_payload = {
                "collection_name": "rest_example",
                "fields_data": [
                    {"field_name": "id", "type": "Int64", "field": [1, 2, 3]},
                    {"field_name": "vector", "type": "FloatVector", "field": [[0.1]*128, [0.2]*128, [0.3]*128]}
                ]
            }
            response = requests.post(f"{base_url}/entities", json=insert_payload)
            print(f"Insert data: {response.json()}")
            
            # 3. Search
            search_payload = {
                "collection_name": "rest_example",
                "vectors": [[0.15] * 128],
                "dsl_type": "Dsl",
                "params": {"nprobe": 10},
                "limit": 5
            }
            response = requests.post(f"{base_url}/search", json=search_payload)
            print(f"Search results: {response.json()}")
            
            # 4. Get collection information
            response = requests.get(f"{base_url}/collection/info?collection_name=rest_example")
            print(f"Collection info: {response.json()}")
            ---

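2 Quick Start

2.1 Installation and Deployment

01.Docker deployment
    a.Standalone installation
        a.Description
            Docker Compose offers a quick way to deploy Milvus standalone, which suits development and test environments. Standalone runs all Milvus components in a single container, alongside etcd and MinIO containers, so it is light on resources and simple to deploy. Data is persisted through mounted volumes and survives restarts. Port 19530 is the default for gRPC connections, and port 9091 serves the HTTP API. Standalone performance is bounded by a single server's resources, and it does not provide high availability.
        b.Code example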
            ---
            # 1. Download docker-compose.yml
            wget https://github.com/milvus-io/milvus/releases/download/v2.3.0/milvus-standalone-docker-compose.yml -O docker-compose.yml
            
            # 2. Start Milvus
            docker-compose up -d
            
            # 3. Check container status
            docker-compose ps
            
            # Sample output:
            # NAME                COMMAND                  SERVICE             STATUS              PORTS
            # milvus-standalone   "/tini -- milvus run…"   standalone          running             0.0.0.0:9091->9091/tcp, 0.0.0.0:19530->19530/tcp
            # milvus-minio        "/usr/bin/docker-ent…"   minio               running             9000/tcp
            # milvus-etcd         "etcd -advertise-cli…"   etcd                running             2379-2380/tcp
            
            # 4. View logs
            docker-compose logs -f standalone
            
            # 5. Stop the service
            docker-compose down
            
            # 6. Data persistence settings (docker-compose.yml)
            # volumes:
            #   - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
            ---
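    b.Cluster installation
        a.Description
            The cluster edition deploys the components separately with Docker Compose, including the Proxy, coordinators, and worker nodes. Each component runs independently and can be scaled and upgraded on its own. It requires external storage (MinIO/S3) and a message queue (Pulsar/Kafka). Resource requirements are higher than for standalone. Note that a single Docker Compose project runs on one host, so this setup is best suited to trying out the distributed architecture; for production-grade horizontal scaling and high availability across at least three servers, the Kubernetes deployments described below are the more common choice.
        b.Code example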
            ---
            # 1. Download the cluster configuration
            wget https://github.com/milvus-io/milvus/releases/download/v2.3.0/milvus-cluster-docker-compose.yml -O docker-compose.yml
            
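            # 2. Edit the configuration (optional)
            # Edit docker-compose.yml to adjust resource limits and replica counts
            
            # 3. Start the cluster
            docker-compose up -d
            
            # 4. Check the status of all components
            docker-compose ps
            
            # Sample output: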
            # NAME                    SERVICE             STATUS
            # milvus-rootcoord        rootcoord           running
            # milvus-datacoord        datacoord           running
            # milvus-querycoord       querycoord          running
            # milvus-indexcoord       indexcoord          running
            # milvus-proxy            proxy               running
            # milvus-querynode        querynode           running
            # milvus-datanode         datanode            running
            # milvus-indexnode        indexnode           running
            # milvus-minio            minio               running
            # milvus-etcd             etcd                running
            # milvus-pulsar           pulsar              running
            
            # 5. Scale Query Nodes (higher query throughput)
            docker-compose up -d --scale querynode=3
            
            # 6. Health check
            curl http://localhost:9091/healthz
            ---
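After `docker-compose up -d`, the service usually needs some time before the health endpoint answers. The sketch below polls a health check until it succeeds or a timeout expires, using only the standard library. The helper names `http_ok` and `wait_until_healthy` are hypothetical, and the URL in the usage comment is the one from step 6 above.

```python
import time
import urllib.error
import urllib.request

def http_ok(url, timeout=2.0):
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def wait_until_healthy(check, timeout_s=120.0, interval_s=2.0, sleep=time.sleep):
    """Poll check() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)

# Usage against the endpoint from step 6 (requires a running deployment):
# wait_until_healthy(lambda: http_ok("http://localhost:9091/healthz"))
```

Passing the check as a callable keeps the polling loop independent of HTTP, so the same loop can wait on any readiness condition.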

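02.Kubernetes deployment
    a.Helm installation
        a.Description
            A Helm chart deploys Milvus quickly on a Kubernetes cluster. The chart is parameterized, so resource limits, replica counts, storage types, and similar settings can be customized. Rolling updates and rollbacks are supported for service stability. It integrates with Kubernetes ecosystem tools such as Prometheus for monitoring and Grafana for visualization. This approach suits large production environments and supports autoscaling.
        b.Code example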
            ---
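            # 1. Add the Milvus Helm repository
            # (newer chart versions are published at https://zilliztech.github.io/milvus-helm/;
            #  check which repository matches your Milvus version)
            helm repo add milvus https://milvus-io.github.io/milvus-helm/
            helm repo update
            
            # 2. Create a namespace
            kubectl create namespace milvus
            
            # 3. Install Milvus (default configuration)
            helm install milvus milvus/milvus --namespace milvus
            
            # 4. Custom installation (create values.yaml)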
            cat > values.yaml <<EOF
            cluster:
              enabled: true
            
            image:
              all:
                repository: milvusdb/milvus
                tag: v2.3.0
            
            proxy:
              replicas: 2
            
            queryNode:
              replicas: 3
              resources:
                limits:
                  cpu: 4
                  memory: 8Gi
            
            dataNode:
              replicas: 2
            
            indexNode:
              replicas: 1
            
            minio:
              enabled: true
              mode: standalone
            
            pulsar:
              enabled: true
            
            etcd:
              replicaCount: 3
            EOF
            
            # 5. Alternatively, install with the custom configuration (instead of step 3;
            #    a release name can only be installed once per namespace)
            helm install milvus milvus/milvus -f values.yaml --namespace milvus
            
            # 6. Check deployment status
            kubectl get pods -n milvus
            
            # 7. Expose the service (LoadBalancer)
            kubectl expose deployment milvus-proxy --type=LoadBalancer --name=milvus-service --port=19530 -n milvus
            
            # 8. Get the external IP
            kubectl get svc milvus-service -n milvus
            
            # 9. Upgrade Milvus
            helm upgrade milvus milvus/milvus -f values.yaml --namespace milvus
            
            # 10. Uninstall
            helm uninstall milvus --namespace milvus
            ---
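    b.Operator deployment
        a.Description
            Milvus Operator is a Kubernetes-native deployment method in which a Milvus cluster is defined through a CRD (custom resource definition). The Operator manages the cluster lifecycle automatically, including deployment, upgrades, scaling, and failure recovery. Configuration is declarative: you define the desired state and the Operator reconciles toward it. It offers finer-grained control, with each component configurable separately, and suits scenarios that need deep customization and automated operations.
        b.Code example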
            ---
            # 1. Install Milvus Operator
            kubectl apply -f https://raw.githubusercontent.com/milvus-io/milvus-operator/main/deploy/manifests/deployment.yaml
            
            # 2. Verify the Operator installation
            kubectl get pods -n milvus-operator
            
            # 3. Create a Milvus cluster (milvus-cluster.yaml)
            cat > milvus-cluster.yaml <<EOF
            apiVersion: milvus.io/v1beta1
            kind: Milvus
            metadata:
              name: my-milvus
              namespace: default
            spec:
              mode: cluster
              dependencies:
                etcd:
                  inCluster:
                    deletionPolicy: Delete
                    pvcDeletion: true
                storage:
                  inCluster:
                    deletionPolicy: Delete
                    pvcDeletion: true
                pulsar:
                  inCluster:
                    deletionPolicy: Delete
                    pvcDeletion: true
              components:
                proxy:
                  replicas: 2
                  resources:
                    limits:
                      cpu: 2
                      memory: 4Gi
                queryNode:
                  replicas: 3
                  resources:
                    limits:
                      cpu: 4
                      memory: 8Gi
                dataNode:
                  replicas: 2
                indexNode:
                  replicas: 1
              config:
                minio:
                  bucketName: milvus-bucket
            EOF
            
            # 4. Deploy the cluster
            kubectl apply -f milvus-cluster.yaml
            
            # 5. Check cluster status
            kubectl get milvus my-milvus -o yaml
            
            # 6. Scale Query Nodes
            kubectl patch milvus my-milvus --type='json' -p='[{"op": "replace", "path": "/spec/components/queryNode/replicas", "value": 5}]'
            
            # 7. List all resources
            kubectl get all -l app.kubernetes.io/instance=my-milvus
            
            # 8. Delete the cluster
            kubectl delete milvus my-milvus
            ---

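03.Local development
    a.Python environment
        a.Description
            Milvus Lite starts Milvus inside a local Python environment without Docker or Kubernetes. It is a lightweight edition for development, testing, and prototyping. It supports most core features and is API-compatible with the full version. Data is stored on the local file system, which simplifies debugging. Its small footprint lets it run on a laptop.
        b.Code example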
            ---
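            # 1. Install Milvus Lite
            # (this example uses the `milvus` package's embedded server; newer Milvus Lite
            #  releases are packaged differently, so check the docs for your version)
            pip install milvus
            
            # 2. Start Milvus Lite
            from milvus import default_server
            
            # Start the local server
            default_server.start()
            
            # 3. Connect and use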
            from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType
            
            connections.connect(
                alias="default",
                host='127.0.0.1',
                port=default_server.listen_port
            )
            
            # 4. Create a collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields)
            collection = Collection(name="dev_test", schema=schema)
            
            # 5. Insert data
            import numpy as np
            data = [
                [[np.random.random() for _ in range(128)] for _ in range(100)]
            ]
            collection.insert(data)
            
            # 6. Create an index
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 128}
            }
            collection.create_index("embedding", index_params)
            
            # 7. Search
            collection.load()
            results = collection.search(
                data=[[np.random.random() for _ in range(128)]],
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"nprobe": 10}},
                limit=5
            )
            
            # 8. Stop the server
            default_server.stop()
            
            # 9. Clean up data
            default_server.cleanup()
            ---
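    b.Development tools
        a.Description
            Several tools improve developer productivity with Milvus. Attu is the official GUI, offering visual collection management, data browsing, and querying. Milvus CLI is a command-line tool for interactive use and scripted automation. Birdwatcher is a debugging tool for inspecting internal state and metadata. Together they help developers understand and debug Milvus quickly.
        b.Code example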
            ---
            # 1. Install Attu (web GUI)
            # Note: inside the container, localhost refers to the container itself, so set
            # MILVUS_URL to an address reachable from the container (e.g. the host's IP)
            docker run -p 8000:3000 -e MILVUS_URL=localhost:19530 zilliz/attu:latest
            
            # Open http://localhost:8000
            # Features:
            # - Visual collection management
            # - Data browsing and editing
            # - Vector search testing
            # - Index management
            # - System monitoring
            
            # 2. Install Milvus CLI
            pip install milvus-cli
            
            # Start the CLI
            milvus_cli
            
            # Example CLI commands:
            # connect -h localhost -p 19530
            # list collections
            # describe collection -c my_collection
            # show index -c my_collection
            # query -c my_collection -f "id > 100" -o id,vector
            # search -c my_collection -v "[0.1, 0.2, ...]" -l 10
            
            # 3. Use Birdwatcher (debugging tool)
            # docker run -it --rm --network host milvusdb/birdwatcher:latest
            
            # Birdwatcher commands:
            # connect --etcd localhost:2379
            # show collections
            # show segments
            # show segment-info --segment-id 12345
            # show channel-watch
            
            # 4. Python debugging tips
            from pymilvus import connections, utility
            
            connections.connect("default", host="localhost", port="19530")
            
            # List all collections
            collections = utility.list_collections()
            print(f"Collections: {collections}")
            
            # Inspect collection details
            from pymilvus import Collection
            collection = Collection("my_collection")
            print(f"Schema: {collection.schema}")
            print(f"Entities: {collection.num_entities}")
            print(f"Indexes: {collection.indexes}")
            
            # Inspect segment information
            segments = utility.get_query_segment_info("my_collection")
            for seg in segments:
                print(f"Segment {seg.segmentID}: {seg.num_rows} rows, state={seg.state}")
            
            # Enable debug logging
            import logging
            logging.basicConfig(level=logging.DEBUG)
            logger = logging.getLogger("pymilvus")
            logger.setLevel(logging.DEBUG)
            ---

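2.2 Connecting to the database

01.Connection configuration
    a.Basic connection
        a.Description
            Connecting with the PyMilvus SDK requires the server's host and port; the default gRPC port is 19530. `connections.connect` registers the connection under an alias, and later calls (`Collection`, `utility`) look it up through their `using` argument, which defaults to "default". Several aliases can coexist, so one process can talk to multiple Milvus instances at the same time. A connection can be shared by multiple threads.
        b.Code example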
            ---
            from pymilvus import connections
            
            # 基本连接
            connections.connect(
                alias="default",  # 连接别名
                host="localhost",
                port="19530"
            )
            
            # 验证连接
            from pymilvus import utility
            print(f"服务器版本: {utility.get_server_version()}")
            
            # 多连接示例
            connections.connect(
                alias="cluster1",
                host="milvus-cluster1.example.com",
                port="19530"
            )
            
            connections.connect(
                alias="cluster2",
                host="milvus-cluster2.example.com",
                port="19530"
            )
            
            # 使用指定连接
            from pymilvus import Collection
            collection1 = Collection("test", using="cluster1")
            collection2 = Collection("test", using="cluster2")
            ---
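    b.Authenticated connection
        a.Description
            Milvus supports username/password authentication. Once authentication is enabled on the server, every connection must present valid credentials. Multiple users can be created and granted different privileges. Credentials are verified when the connection is established and are attached automatically to later requests. Enable authentication in production.
        b.Code example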
            ---
            from pymilvus import connections
            
            # 使用用户名密码连接
            connections.connect(
                alias="default",
                host="localhost",
                port="19530",
                user="username",
                password="password"
            )
            
            # 创建新用户（需要root权限）
            from pymilvus import utility
            
            utility.create_user(
                user="new_user",
                password="secure_password",
                using="default"
            )
            
            # 修改密码
            utility.reset_password(
                user="new_user",
                old_password="secure_password",
                new_password="new_secure_password",
                using="default"
            )
            
            # 列出所有用户
            users = utility.list_usernames(using="default")
            print(f"用户列表: {users}")
            
            # 删除用户
            utility.delete_user(user="new_user", using="default")
            ---

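02.Connection pool management
    a.Connection pool configuration
        a.Description
            PyMilvus 2.x does not expose a configurable connection pool: each alias maps to a single gRPC channel, and concurrent requests are multiplexed over it. (The `pool_size` argument belonged to the 1.x client and has no effect here.) The channel is created by `connect`, reused by every call that names the alias, and closed by `disconnect`; gRPC re-establishes it on its own after transient network failures. For high concurrency, share one alias across threads instead of connecting per request.
        b.Code example
            ---
            from pymilvus import connections
            
            # Connection parameters (there is no pool_size option in PyMilvus 2.x)
            connections.connect(
                alias="default",
                host="localhost",
                port="19530",
                timeout=30,  # seconds to wait for the connection to be established
                secure=False  # set to True for TLS (see the TLS section)
            )
            
            # List registered connections as (alias, handler) pairs
            print(connections.list_connections())
            
            # Address used by a given alias
            conn_info = connections.get_connection_addr("default")
            print(f"Connection address: {conn_info}")
            
            # Concurrent queries that all share the "default" connection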
            import concurrent.futures
            from pymilvus import Collection
            
            def query_task(task_id):
                collection = Collection("test")
                results = collection.query(
                    expr="id > 0",
                    limit=10,
                    output_fields=["id"]
                )
                return len(results)
            
            # 100个并发查询
            with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
                futures = [executor.submit(query_task, i) for i in range(100)]
                results = [f.result() for f in futures]
            print(f"完成 {len(results)} 个并发查询")
            ---
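    b.Connection management
        a.Description
            A connection can be closed and reopened explicitly. Disconnecting closes the client-side channel; collections that are loaded on the server stay loaded. Disconnect before the application exits. `connections.has_connection` reports whether an alias has a connection registered; to confirm that the server is actually reachable, issue a cheap call such as `utility.get_server_version`. Aliases let one process manage several connections and choose between them per call.
        b.Code example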
            ---
            from pymilvus import connections, utility
            
            # 检查连接状态
            has_connection = connections.has_connection("default")
            print(f"连接存在: {has_connection}")
            
            # 断开连接
            connections.disconnect("default")
            
            # 重新连接
            connections.connect(
                alias="default",
                host="localhost",
                port="19530"
            )
            
            # 断开所有连接
            for alias in connections.list_connections():
                connections.disconnect(alias[0])
            
            # 连接健康检查
            try:
                version = utility.get_server_version()
                print(f"连接正常，服务器版本: {version}")
            except Exception as e:
                print(f"连接异常: {e}")
                # 尝试重连
                connections.disconnect("default")
                connections.connect(
                    alias="default",
                    host="localhost",
                    port="19530"
                )
            
            # 上下文管理器（自动断开）
            class MilvusConnection:
                def __init__(self, alias, host, port):
                    self.alias = alias
                    self.host = host
                    self.port = port
                
                def __enter__(self):
                    connections.connect(
                        alias=self.alias,
                        host=self.host,
                        port=self.port
                    )
                    return self
                
                def __exit__(self, exc_type, exc_val, exc_tb):
                    connections.disconnect(self.alias)
            
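            # Use the context manager; pass the alias explicitly, because utility
            # calls fall back to the "default" alias when `using` is omitted
            with MilvusConnection("temp", "localhost", "19530"):
                print(f"Version: {utility.get_server_version(using='temp')}")
            # The "temp" connection is closed on exit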
            ---

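03.Advanced configuration
    a.TLS encryption
        a.Description
            Milvus supports TLS to encrypt traffic between client and server. One-way TLS needs only a server certificate, which the client verifies; two-way (mutual) TLS additionally requires a client certificate and key issued by a CA the server trusts. With TLS enabled all traffic is encrypted, which protects against eavesdropping and man-in-the-middle attacks. Use it on public networks or wherever security requirements are strict. TLS adds some performance overhead in exchange for that protection.
        b.Code example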
            ---
            from pymilvus import connections
            
            # 使用TLS连接
            connections.connect(
                alias="secure",
                host="milvus.example.com",
                port="19530",
                secure=True,  # 启用TLS
                server_pem_path="/path/to/server.pem",  # 服务器证书
                server_name="milvus.example.com",  # 服务器名称（用于证书验证）
                user="username",
                password="password"
            )
            
            # 双向TLS认证（客户端证书）
            connections.connect(
                alias="mutual_tls",
                host="milvus.example.com",
                port="19530",
                secure=True,
                server_pem_path="/path/to/server.pem",
                client_pem_path="/path/to/client.pem",  # 客户端证书
                client_key_path="/path/to/client.key",  # 客户端私钥
                ca_pem_path="/path/to/ca.pem",  # CA证书
                server_name="milvus.example.com"
            )
            
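            # Server-side TLS configuration (milvus.yaml)
            # common:
            #   security:
            #     tlsMode: 1  # 0 = off, 1 = one-way TLS, 2 = two-way TLS
            # tls:
            #   serverPemPath: /path/to/server.pem
            #   serverKeyPath: /path/to/server.key
            #   caPemPath: /path/to/ca.pem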
            
            # 生成自签名证书（测试用）
            # openssl req -x509 -newkey rsa:4096 -keyout server.key -out server.pem -days 365 -nodes
            ---
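    b.Load balancing
        a.Description
            In a cluster deployment, clients can reach several Proxy nodes through a load balancer, which improves availability and throughput. The client connects to the load balancer's address and traffic is distributed to the Proxies behind it. Common strategies such as round-robin and least-connections are supported, and a failed Proxy is removed from rotation automatically. Because a PyMilvus connection is a long-lived gRPC channel, a layer-4 load balancer or DNS round-robin spreads connections, not individual requests. This architecture gives better fault tolerance and scalability.
        b.Code example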
            ---
            from pymilvus import connections
            
            # 连接到负载均衡器
            connections.connect(
                alias="cluster",
                host="milvus-lb.example.com",  # 负载均衡器地址
                port="19530"
            )
            
            # Kubernetes环境下的负载均衡配置
            # apiVersion: v1
            # kind: Service
            # metadata:
            #   name: milvus-proxy-lb
            # spec:
            #   type: LoadBalancer
            #   selector:
            #     app: milvus-proxy
            #   ports:
            #     - protocol: TCP
            #       port: 19530
            #       targetPort: 19530
            
            # 使用DNS轮询（多个Proxy地址）
            # 配置DNS记录：
            # milvus.example.com -> 10.0.1.1
            # milvus.example.com -> 10.0.1.2
            # milvus.example.com -> 10.0.1.3
            
            connections.connect(
                alias="dns_lb",
                host="milvus.example.com",  # DNS会自动轮询
                port="19530"
            )
            
            # 客户端重试机制
            import time
            from pymilvus import connections, utility
            
            def connect_with_retry(max_retries=3, retry_delay=5):
                for attempt in range(max_retries):
                    try:
                        connections.connect(
                            alias="default",
                            host="milvus-lb.example.com",
                            port="19530",
                            timeout=10
                        )
                        version = utility.get_server_version()
                        print(f"连接成功，版本: {version}")
                        return True
                    except Exception as e:
                        print(f"连接失败 (尝试 {attempt + 1}/{max_retries}): {e}")
                        if attempt < max_retries - 1:
                            time.sleep(retry_delay)
                return False
            
            connect_with_retry()
            ---

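2.3 Basic operations

01.Collection operations
    a.Creating a collection
        a.Description
            A collection is the basic unit of data in Milvus, comparable to a table in a relational database. Creating one requires a schema that defines field names, data types, vector dimensions, and so on. A primary-key field is mandatory and can be auto-generated. A vector field must declare its dimension, and every inserted vector must match it. Treat the schema as fixed once created: field types and vector dimensions cannot be changed afterwards, so design it carefully.
        b.Code example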
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            
            # 定义字段
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            # 创建Schema
            schema = CollectionSchema(
                fields=fields,
                description="文档向量库",
                enable_dynamic_field=False  # 是否允许动态字段
            )
            
            # 创建Collection
            collection = Collection(
                name="documents",
                schema=schema,
                using="default",
                shards_num=2  # 分片数量
            )
            
            print(f"Collection创建成功: {collection.name}")
            print(f"Schema: {collection.schema}")
            ---
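    b.Inspecting a collection
        a.Description
            You can list all collections and inspect a collection's details, including its schema and statistics. The Collection object exposes the entity count, index information, and load state, which helps in judging data volume and system status. Checking whether a collection exists first avoids creating it twice.
        b.Code example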
            ---
            from pymilvus import utility, Collection
            
            # 列出所有Collection
            collections = utility.list_collections()
            print(f"所有Collection: {collections}")
            
            # 检查Collection是否存在
            has_collection = utility.has_collection("documents")
            print(f"Collection存在: {has_collection}")
            
            # 获取Collection对象
            collection = Collection("documents")
            
            # 查看Schema
            print(f"Schema: {collection.schema}")
            print(f"描述: {collection.description}")
            
            # 查看统计信息
            print(f"实体数量: {collection.num_entities}")
            
            # 查看索引信息
            indexes = collection.indexes
            for index in indexes:
                print(f"索引字段: {index.field_name}")
                print(f"索引类型: {index.params}")
            
            # Load state (NotExist / NotLoad / Loading / Loaded)
            print(f"Load state: {utility.load_state('documents')}")
            
            # Full collection description, including its properties
            details = collection.describe()
            print(f"Details: {details}")
            ---

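02.Inserting data
    a.Batch insert
        a.Description
            Data is inserted in column format: one list per field, in schema order. A single insert call is treated as atomic and returns the primary keys of the inserted rows. Insert in batches for throughput; 1,000 to 10,000 rows per call is a reasonable range. Whether freshly inserted rows are visible to a search or query depends on the consistency level of that request, not on `flush()`. `flush()` seals the current segment and persists it, and `num_entities` only reflects flushed data.
        b.Code example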
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # 准备数据（列式格式）
            ids = [i for i in range(1000)]
            titles = [f"文档{i}" for i in range(1000)]
            embeddings = [[np.random.random() for _ in range(128)] for _ in range(1000)]
            
            # 插入数据
            data = [ids, titles, embeddings]
            insert_result = collection.insert(data)
            
            print(f"插入成功: {insert_result.insert_count} 条")
            print(f"主键列表: {insert_result.primary_keys[:10]}...")  # 显示前10个
            
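            # Auto-generated primary keys: omit the id column
            # (assumes "auto_id_collection" already exists with auto_id=True)
            collection_auto = Collection("auto_id_collection")
            data_auto = [titles, embeddings]
            insert_result = collection_auto.insert(data_auto)
            
            # flush() seals and persists the inserted data so that num_entities is
            # up to date; calling it after every insert produces many small segments
            collection.flush()
            print(f"Entities after flush: {collection.num_entities}")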
            ---
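            The batch-size advice above can be applied with a small slicing helper. This is a minimal sketch: `iter_column_batches` is a hypothetical name, not part of PyMilvus, and the default of 5,000 rows is just a value inside the suggested range. Each yielded batch has the column layout that `collection.insert` is given in the example above.

```python
from typing import Iterator, List, Sequence


def iter_column_batches(columns: Sequence[Sequence], batch_size: int = 5000) -> Iterator[List[list]]:
    """Yield column-format batches holding at most batch_size rows each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    lengths = {len(col) for col in columns}
    if len(lengths) != 1:
        raise ValueError("all columns must have the same number of rows")
    total = lengths.pop()
    for start in range(0, total, batch_size):
        # Slice every column with the same window so rows stay aligned
        yield [list(col[start:start + batch_size]) for col in columns]


ids = list(range(12000))
titles = [f"doc-{i}" for i in ids]
batches = list(iter_column_batches([ids, titles], batch_size=5000))
print([len(batch[0]) for batch in batches])  # [5000, 5000, 2000]
```

            Each batch would then be passed to `collection.insert(batch)` in a loop, with a single `flush()` at the end if one is needed.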
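    b.Single-row insert
        a.Description
            Milvus is optimized for batch inserts but also accepts one row at a time. Single-row inserts suit real-time streams: throughput is lower than with batches, but each record is written sooner. Small batches are a way to balance throughput against latency. A buffer in the application layer that accumulates rows and inserts them together is the usual approach.
        b.Code example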
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # 单条插入
            single_data = [
                [1001],  # id
                ["单条文档"],  # title
                [[np.random.random() for _ in range(128)]]  # embedding
            ]
            collection.insert(single_data)
            
            # 实时插入场景（带缓冲）
            class BufferedInserter:
                def __init__(self, collection, buffer_size=100):
                    self.collection = collection
                    self.buffer_size = buffer_size
                    self.buffer = {"ids": [], "titles": [], "embeddings": []}
                
                def insert(self, id, title, embedding):
                    self.buffer["ids"].append(id)
                    self.buffer["titles"].append(title)
                    self.buffer["embeddings"].append(embedding)
                    
                    if len(self.buffer["ids"]) >= self.buffer_size:
                        self.flush()
                
                def flush(self):
                    if len(self.buffer["ids"]) > 0:
                        data = [
                            self.buffer["ids"],
                            self.buffer["titles"],
                            self.buffer["embeddings"]
                        ]
                        self.collection.insert(data)
                        self.buffer = {"ids": [], "titles": [], "embeddings": []}
                        print(f"批量插入 {len(data[0])} 条数据")
            
            # 使用缓冲插入器
            inserter = BufferedInserter(collection, buffer_size=100)
            
            for i in range(250):
                inserter.insert(
                    id=2000 + i,
                    title=f"实时文档{i}",
                    embedding=[np.random.random() for _ in range(128)]
                )
            
            inserter.flush()  # 刷新剩余数据
            ---

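03.Querying data
    a.Primary-key queries
        a.Description
            Query entities by primary key and return the requested fields. Primary-key lookups are typically the fastest kind of query. Several keys can be fetched in one call with `in`. List only the fields you need in `output_fields` to reduce data transfer. As with any query, the collection must be loaded into memory first; a query against an unloaded collection fails.
        b.Code example
            ---
            from pymilvus import Collection
            
            collection = Collection("documents")
            collection.load()  # query() requires a loaded collection
            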
            # 单个主键查询
            results = collection.query(
                expr="id == 1",
                output_fields=["id", "title", "embedding"]
            )
            print(f"查询结果: {results}")
            
            # 批量主键查询
            ids_to_query = [1, 10, 100, 1000]
            results = collection.query(
                expr=f"id in {ids_to_query}",
                output_fields=["id", "title"]
            )
            for result in results:
                print(f"ID: {result['id']}, Title: {result['title']}")
            
            # 范围查询
            results = collection.query(
                expr="id > 100 and id < 200",
                output_fields=["id", "title"],
                limit=10
            )
            print(f"范围查询结果: {len(results)} 条")
            ---
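    b.Scalar filtering
        a.Description
            Scalar fields can be filtered with a SQL-like expression syntax. It supports comparison operators (==, !=, >, <, >=, <=), logical operators (and, or, not), and membership operators (in, not in), and conditions can be combined into complex queries. The collection must be loaded before querying; an index on a scalar field is optional and only speeds up filtering. Performance depends on data volume and on how selective the filter is.
        b.Code example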
            ---
            from pymilvus import Collection
            
            collection = Collection("documents")
            collection.load()  # 加载到内存
            
            # 字符串匹配
            results = collection.query(
                expr='title like "文档1%"',
                output_fields=["id", "title"],
                limit=10
            )
            
            # 多条件查询
            results = collection.query(
                expr='id > 100 and id < 500 and title like "文档%"',
                output_fields=["id", "title"]
            )
            
            # IN查询
            titles_to_find = ["文档1", "文档10", "文档100"]
            results = collection.query(
                expr=f'title in {titles_to_find}',
                output_fields=["id", "title"]
            )
            
            # 复杂表达式
            results = collection.query(
                expr='(id > 100 and id < 200) or (id > 800 and id < 900)',
                output_fields=["id", "title"],
                limit=20
            )
            
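            # Pagination with offset/limit
            # Milvus requires offset + limit to stay below 16384; for deeper
            # scans use collection.query_iterator() instead
            page_size = 100
            offset = 0
            
            while offset + page_size < 16384: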
                results = collection.query(
                    expr="id > 0",
                    output_fields=["id", "title"],
                    limit=page_size,
                    offset=offset
                )
                
                if len(results) == 0:
                    break
                
                print(f"第 {offset // page_size + 1} 页: {len(results)} 条")
                offset += page_size
            ---
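            Interpolating Python lists straight into an expression, as the `in` examples above do, relies on Python's `repr` quoting and becomes fragile once values contain quotes. The sketch below builds the same expressions with explicit quoting. `literal`, `in_expr` and `and_all` are hypothetical helpers, not PyMilvus functions, and the sketch assumes the expression parser accepts double-quoted strings with backslash escapes.

```python
import json
from typing import Iterable, Union

Scalar = Union[str, int, float]


def literal(value: Scalar) -> str:
    """Render a Python scalar as a filter-expression literal."""
    if isinstance(value, bool):
        raise TypeError("bool literals are not handled by this sketch")
    if isinstance(value, str):
        # json.dumps adds double quotes and escapes embedded quotes and backslashes
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, (int, float)):
        return repr(value)
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


def in_expr(field: str, values: Iterable[Scalar]) -> str:
    """Build a `field in [...]` clause."""
    return f"{field} in [{', '.join(literal(v) for v in values)}]"


def and_all(clauses: Iterable[str]) -> str:
    """Join clauses with `and`, parenthesizing each one."""
    return " and ".join(f"({clause})" for clause in clauses)


expr = and_all([in_expr("id", [1, 10, 100]), 'title like "doc%"'])
print(expr)
print(in_expr("title", ["文档1", 'say "hi"']))
```

            The resulting string is passed as `expr=` to `collection.query` or `collection.delete`, exactly like the hand-written expressions above.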

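04.Deleting data
    a.Delete by expression
        a.Description
            Delete the entities that match an expression. The call returns as soon as the delete is recorded; rows are marked as deleted rather than removed immediately. Deletion by primary key is always available, while deletion by scalar fields or combined conditions needs a recent server (releases before 2.3 accept only primary-key `in` expressions). Delete large volumes in batches so that a single call does not hurt performance. The space is not reclaimed right away; that happens during compaction.
        b.Code example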
            ---
            from pymilvus import Collection
            
            collection = Collection("documents")
            
            # 删除单条记录
            expr = "id == 1001"
            collection.delete(expr)
            
            # 批量删除
            ids_to_delete = [1, 2, 3, 4, 5]
            expr = f"id in {ids_to_delete}"
            collection.delete(expr)
            
            # 条件删除
            expr = "id > 2000 and id < 2100"
            collection.delete(expr)
            
            # 删除所有数据（慎用）
            # expr = "id > 0"
            # collection.delete(expr)
            
            # 分批删除大量数据
            batch_size = 1000
            start_id = 3000
            end_id = 10000
            
            for i in range(start_id, end_id, batch_size):
                expr = f"id >= {i} and id < {i + batch_size}"
                collection.delete(expr)
                print(f"已删除 ID {i} 到 {i + batch_size}")
            
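            # Persist the delete records
            collection.flush()
            # num_entities does not subtract deleted rows until compaction has run;
            # count(*) (Milvus 2.3+) returns the live row count of a loaded collection
            collection.load()
            live_rows = collection.query(expr="", output_fields=["count(*)"])[0]["count(*)"]
            print(f"Entities after delete: {live_rows}")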
            ---
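    b.Compaction
        a.Description
            Compaction is a background maintenance operation that merges small segments and purges deleted data. A delete only marks rows as deleted; the space is actually reclaimed by compaction. Compaction also reorganizes data, which improves query performance. It can be triggered manually or left to run automatically. The collection remains usable while compaction runs, although performance may be affected.
        b.Code example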
            ---
            from pymilvus import Collection, utility
            
            collection = Collection("documents")
            
            # Trigger compaction manually
            collection.compact()
            print("Compaction triggered")
            
            # Block until the compaction triggered above has finished
            # (the compaction helpers live on Collection, not on utility)
            collection.wait_for_compaction_completed()
            
            # Inspect the resulting state
            state = collection.get_compaction_state()
            print(f"Compaction state: {state}")
            
            # Inspect the plans: source segments merged into a target segment
            plans = collection.get_compaction_plans()
            for plan in plans.plans:
                print(f"Source segments: {plan.sources}, target segment: {plan.target}")
            
            # Automatic compaction settings (milvus.yaml)
            # dataCoord:
            #   enableCompaction: true
            #   compaction:
            #     enableAutoCompaction: true
            # Further tuning keys (trigger intervals, size thresholds) differ between
            # releases; check the milvus.yaml shipped with your version.
            
            # 查看segment信息
            segments = utility.get_query_segment_info(collection.name)
            total_size = sum(seg.num_rows for seg in segments)
            print(f"总segment数: {len(segments)}, 总行数: {total_size}")
            
            for seg in segments[:5]:  # 显示前5个segment
                print(f"Segment {seg.segmentID}: {seg.num_rows} rows, state={seg.state}")
            ---

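3 Collection management

3.1 Schema definition

01.Field types
    a.Scalar fields
        a.Description
            Milvus supports several scalar data types: integers (INT8, INT16, INT32, INT64), floating point (FLOAT, DOUBLE), boolean (BOOL), strings (VARCHAR), and JSON. Scalar fields hold metadata and filter conditions. VARCHAR requires a maximum length. JSON supports nested structures and can hold complex metadata. Scalar fields can be indexed to speed up filtered queries.
        b.Code example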
            ---
            from pymilvus import FieldSchema, DataType
            
            # 整数类型
            id_field = FieldSchema(
                name="id",
                dtype=DataType.INT64,
                is_primary=True,
                auto_id=False
            )
            
            age_field = FieldSchema(
                name="age",
                dtype=DataType.INT32
            )
            
            # 浮点数类型
            score_field = FieldSchema(
                name="score",
                dtype=DataType.FLOAT
            )
            
            # 布尔类型
            active_field = FieldSchema(
                name="is_active",
                dtype=DataType.BOOL
            )
            
            # 字符串类型
            title_field = FieldSchema(
                name="title",
                dtype=DataType.VARCHAR,
                max_length=500
            )
            
            # JSON类型
            metadata_field = FieldSchema(
                name="metadata",
                dtype=DataType.JSON
            )
            
            # 所有标量类型示例
            fields = [
                id_field,
                age_field,
                score_field,
                active_field,
                title_field,
                metadata_field
            ]
            ---
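    b.Vector fields
        a.Description
            Vector fields store high-dimensional vectors and are the core field type in Milvus. Supported types include FLOAT_VECTOR (32-bit floats), BINARY_VECTOR (one bit per dimension), and FLOAT16_VECTOR (half precision, in recent releases). The dimension must be declared and cannot be changed after creation. Recent releases (Milvus 2.4 and later) allow several vector fields in one collection, which enables multimodal retrieval. A vector field must be indexed before the collection can be loaded and searched.
        b.Code example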
            ---
            from pymilvus import FieldSchema, DataType
            
            # 浮点向量（最常用）
            embedding_field = FieldSchema(
                name="embedding",
                dtype=DataType.FLOAT_VECTOR,
                dim=128  # 向量维度
            )
            
            # 高维向量
            high_dim_field = FieldSchema(
                name="high_dim_embedding",
                dtype=DataType.FLOAT_VECTOR,
                dim=1536  # OpenAI ada-002维度
            )
            
            # 二值向量（节省存储空间）
            binary_field = FieldSchema(
                name="binary_embedding",
                dtype=DataType.BINARY_VECTOR,
                dim=512  # 维度必须是8的倍数
            )
            
            # 半精度向量（节省内存）
            fp16_field = FieldSchema(
                name="fp16_embedding",
                dtype=DataType.FLOAT16_VECTOR,
                dim=256
            )
            
            # 多向量字段（多模态）
            text_vector = FieldSchema(
                name="text_embedding",
                dtype=DataType.FLOAT_VECTOR,
                dim=768
            )
            
            image_vector = FieldSchema(
                name="image_embedding",
                dtype=DataType.FLOAT_VECTOR,
                dim=512
            )
            
            # 向量字段集合
            vector_fields = [
                embedding_field,
                high_dim_field,
                binary_field,
                fp16_field,
                text_vector,
                image_vector
            ]
            ---
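            The storage trade-off between the vector types above follows from their element width: 32 bits per dimension for FLOAT_VECTOR, 16 for FLOAT16_VECTOR, and 1 for BINARY_VECTOR. The sketch below estimates raw vector size only; it ignores index structures, scalar fields, and replicas, and `raw_vector_bytes` is a hypothetical helper rather than a Milvus API.

```python
# Bits used per dimension by each vector type
BITS_PER_DIM = {"FLOAT_VECTOR": 32, "FLOAT16_VECTOR": 16, "BINARY_VECTOR": 1}


def raw_vector_bytes(dtype: str, dim: int, num_rows: int) -> int:
    """Raw size in bytes of num_rows vectors, excluding any index overhead."""
    bits_per_vector = BITS_PER_DIM[dtype] * dim
    if bits_per_vector % 8 != 0:
        # Only reachable for BINARY_VECTOR, whose dim must be a multiple of 8
        raise ValueError("BINARY_VECTOR dim must be a multiple of 8")
    return bits_per_vector // 8 * num_rows


for dtype, dim in [("FLOAT_VECTOR", 1536), ("FLOAT16_VECTOR", 256), ("BINARY_VECTOR", 512)]:
    size = raw_vector_bytes(dtype, dim, 1_000_000)
    print(f"{dtype:<15} dim={dim:<5} {size / 2**30:.2f} GiB per million rows")
```

            For one million rows this gives 6,144,000,000 bytes for the 1536-dimensional float field, against 64,000,000 bytes for the 512-dimensional binary field.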

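02.Schema configuration
    a.Basic schema
        a.Description
            The schema defines the structure of a collection: the full list of field definitions. It must contain exactly one primary-key field, which can be auto-generated. A description can be added to document what the collection is for. Existing fields cannot be altered after creation, so think through business requirements and future extension during the design phase.
        b.Code example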
            ---
            from pymilvus import CollectionSchema, FieldSchema, DataType
            
            # 定义字段
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="content", dtype=DataType.VARCHAR, max_length=5000),
                FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=50),
                FieldSchema(name="timestamp", dtype=DataType.INT64),
                FieldSchema(name="score", dtype=DataType.FLOAT),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768)
            ]
            
            # 创建Schema
            schema = CollectionSchema(
                fields=fields,
                description="文档搜索系统",
                enable_dynamic_field=False
            )
            
            # 查看Schema信息
            print(f"字段数量: {len(schema.fields)}")
            for field in schema.fields:
                print(f"字段: {field.name}, 类型: {field.dtype}, 主键: {field.is_primary}")
            
            # Schema验证
            print(f"主键字段: {schema.primary_field.name}")
            print(f"自动ID: {schema.auto_id}")
            ---
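    b.Dynamic schema
        a.Description
            A dynamic schema accepts fields at insert time that were not declared in the schema, which gives more flexibility. Dynamic fields have their types inferred and are stored together in a hidden JSON field. This suits metadata whose structure is not fixed, such as user-defined attributes. Dynamic fields can be used in filter expressions, but they are slower than declared fields, and enabling the feature adds some storage overhead.
        b.Code example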
            ---
            from pymilvus import CollectionSchema, FieldSchema, DataType, Collection
            
            # 启用动态Schema
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            schema = CollectionSchema(
                fields=fields,
                description="动态Schema示例",
                enable_dynamic_field=True  # 启用动态字段
            )
            
            collection = Collection("dynamic_collection", schema=schema)
            
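            # Insert rows as dicts: keys that are not in the schema
            # (title, score, metadata) are stored as dynamic fields.
            # insert() has no `fields` argument for naming extra columns.
            rows = [
                {"id": 1, "embedding": [0.1]*128, "title": "标题1", "score": 100, "metadata": {"tag": "AI"}},
                {"id": 2, "embedding": [0.2]*128, "title": "标题2", "score": 200, "metadata": {"tag": "ML"}},
                {"id": 3, "embedding": [0.3]*128, "title": "标题3", "score": 300, "metadata": {"tag": "DL"}}
            ]
            collection.insert(rows)
            
            # load() requires an index on the vector field
            collection.create_index(
                field_name="embedding",
                index_params={"index_type": "FLAT", "metric_type": "L2", "params": {}}
            )
            
            # Query the dynamic fields
            collection.load()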
            results = collection.query(
                expr="id > 0",
                output_fields=["id", "title", "score", "metadata"]
            )
            
            for result in results:
                print(f"ID: {result['id']}, Title: {result.get('title')}, Score: {result.get('score')}")
            ---
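            Dynamic fields are identified by name, so rows that carry them are most naturally written as dicts. When the data already exists as parallel column lists, a small conversion step produces that row form. This is a sketch under stated assumptions: `columns_to_rows` is a hypothetical helper, and it assumes the installed PyMilvus accepts a list of dicts in `Collection.insert`.

```python
from typing import Dict, List, Sequence


def columns_to_rows(field_names: Sequence[str], columns: Sequence[Sequence]) -> List[Dict[str, object]]:
    """Convert parallel column lists into a list of row dicts."""
    if len(field_names) != len(columns):
        raise ValueError("need exactly one field name per column")
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same number of rows")
    # zip(*columns) walks the columns in lockstep, yielding one tuple per row
    return [dict(zip(field_names, values)) for values in zip(*columns)]


rows = columns_to_rows(
    ["id", "embedding", "title", "score"],
    [
        [1, 2],
        [[0.1] * 4, [0.2] * 4],  # short vectors keep the example readable
        ["title-1", "title-2"],
        [100, 200],
    ],
)
print(rows[0])
```

            Declared fields and dynamic fields sit side by side in each dict; on insert, keys that are not in the schema end up in the dynamic JSON field.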

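03.Primary key design
    a.Auto-generated primary key
        a.Description
            With `auto_id=True`, Milvus generates the primary key itself and guarantees that it is unique across the collection. The IDs are 64-bit integers allocated by the server; treat them as opaque values rather than relying on their internal layout. Auto-generated keys simplify inserts because the application does not have to maintain IDs, which suits cases where no custom ID is needed. The generated IDs increase over time but are not guaranteed to be consecutive.
        b.Code example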
            ---
            from pymilvus import CollectionSchema, FieldSchema, DataType, Collection
            import numpy as np
            
            # 定义自增主键Schema
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
                FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            schema = CollectionSchema(fields=fields, description="自增ID示例")
            collection = Collection("auto_id_collection", schema=schema)
            
            # 插入数据（不需要提供id）
            texts = [f"文本{i}" for i in range(100)]
            embeddings = [[np.random.random() for _ in range(128)] for _ in range(100)]
            
            data = [texts, embeddings]  # 注意：没有id字段
            insert_result = collection.insert(data)
            
            # 获取自动生成的ID
            generated_ids = insert_result.primary_keys
            print(f"生成的ID: {generated_ids[:10]}")
            
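            # Query by the generated IDs (the collection must be indexed and loaded first)
            collection.create_index(
                field_name="embedding",
                index_params={"index_type": "FLAT", "metric_type": "L2", "params": {}}
            )
            collection.load()
            results = collection.query(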
                expr=f"id in {generated_ids[:5]}",
                output_fields=["id", "text"]
            )
            
            for result in results:
                print(f"ID: {result['id']}, Text: {result['text']}")
            ---
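    b.Custom primary key
        a.Description
            A custom primary key is supplied by the application and can be a business ID or a UUID. The application is responsible for uniqueness: `insert` does not check for existing keys, so inserting the same key twice leaves duplicate entities instead of raising an error. Use `upsert` or deduplicate before inserting when a key may already exist. Custom keys make integration with existing systems easier, because records can be queried directly by business ID. Both INT64 and VARCHAR primary keys are supported; a VARCHAR key's `max_length` can be at most 65535.
        b.Code example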
            ---
            from pymilvus import CollectionSchema, FieldSchema, DataType, Collection
            import uuid
            import numpy as np
            
            # INT64自定义主键
            fields_int = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
                FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            schema_int = CollectionSchema(fields=fields_int, description="INT64主键")
            collection_int = Collection("custom_int_id", schema=schema_int)
            
            # 插入数据（提供自定义ID）
            ids = [1000 + i for i in range(100)]
            texts = [f"文本{i}" for i in range(100)]
            embeddings = [[np.random.random() for _ in range(128)] for _ in range(100)]
            
            data = [ids, texts, embeddings]
            collection_int.insert(data)
            
            # VARCHAR主键（UUID）
            fields_str = [
                FieldSchema(name="id", dtype=DataType.VARCHAR, max_length=36, is_primary=True, auto_id=False),
                FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            schema_str = CollectionSchema(fields=fields_str, description="VARCHAR主键")
            collection_str = Collection("custom_str_id", schema=schema_str)
            
            # 使用UUID作为主键
            uuids = [str(uuid.uuid4()) for _ in range(100)]
            data = [uuids, texts, embeddings]
            collection_str.insert(data)
            
            # 使用UUID查询
            results = collection_str.query(
                expr=f'id == "{uuids[0]}"',
                output_fields=["id", "text"]
            )
            print(f"UUID查询结果: {results[0]}")
            
            # 业务ID示例（如订单号）
            order_ids = [f"ORDER{i:08d}" for i in range(100)]
            data = [order_ids, texts, embeddings]
            collection_str.insert(data)
            ---
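            Because uniqueness is the application's responsibility, it can help to deduplicate a batch on the client before inserting it. The following is a minimal sketch, not part of pymilvus: the helper name `dedupe_by_primary_key` and the row-dict layout are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def dedupe_by_primary_key(rows: Iterable[Dict], key: str = "id") -> List[Dict]:
    """Keep the last row seen for each primary key.

    The result keeps the order in which each key first appeared.
    """
    latest: Dict[object, Dict] = {}
    for row in rows:
        latest[row[key]] = row  # a later row replaces an earlier one with the same key
    return list(latest.values())


rows = [
    {"id": "ORDER00000001", "text": "first"},
    {"id": "ORDER00000002", "text": "second"},
    {"id": "ORDER00000001", "text": "first (updated)"},
]
unique_rows = dedupe_by_primary_key(rows)
print([r["id"] for r in unique_rows])
```

            This only removes duplicates inside one batch; keys that already exist in the collection still need an upsert or a prior lookup.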

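04.Schema best practices
    a.Choosing fields
        a.Description
            Choosing field types carefully improves both storage use and performance. Include only the fields you need and avoid redundant data. Give each VARCHAR field a realistic max_length rather than an arbitrarily large one. Build a scalar index on fields that are filtered on frequently. A JSON field suits loosely structured metadata, but filtering on it is slower than filtering on a predefined field.
        b.Code example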
            ---
            from pymilvus import CollectionSchema, FieldSchema, DataType
            
            # 优化前：字段过多，类型不合理
            fields_bad = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=10000),  # 过大
                FieldSchema(name="content", dtype=DataType.VARCHAR, max_length=50000),  # 过大
                FieldSchema(name="author", dtype=DataType.VARCHAR, max_length=5000),  # 过大
                FieldSchema(name="tags", dtype=DataType.VARCHAR, max_length=10000),  # 应该用JSON
                FieldSchema(name="metadata", dtype=DataType.VARCHAR, max_length=10000),  # 应该用JSON
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            # 优化后：字段精简，类型合理
            fields_good = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),  # 合理长度
                FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=50),  # 用于过滤
                FieldSchema(name="timestamp", dtype=DataType.INT64),  # 时间戳（便于范围查询）
                FieldSchema(name="metadata", dtype=DataType.JSON),  # 灵活的元数据
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            schema_good = CollectionSchema(
                fields=fields_good,
                description="优化的Schema设计"
            )
            
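            # Indexing strategy
            # 1. The primary key is indexed automatically
            # 2. The vector field must have an index before the collection can be loaded
            # 3. Build scalar indexes on frequently filtered fields
            # 4. Leave the JSON field unindexed here; whether JSON keys can be indexed depends on the Milvus version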
            ---
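    b.Version management
        a.Description
            Treat a schema as fixed once the collection exists: field types and the vector dimension cannot be changed in place, and support for adding fields depends on the Milvus version. Manage versions explicitly, for example by putting a version number in the collection name. To change the schema, create a new collection and migrate the data in batches. Point an alias at the current version so the application never needs to know the real collection name. Test the schema design thoroughly during development to avoid frequent changes.
        b.Code example
            ---
            from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, utility
            import time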
            
            # Schema版本管理策略
            
            # V1版本
            fields_v1 = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema_v1 = CollectionSchema(fields=fields_v1, description="V1版本")
            collection_v1 = Collection("documents_v1", schema=schema_v1)
            
            # V2版本（增加字段）
            fields_v2 = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=50),  # 新增
                FieldSchema(name="timestamp", dtype=DataType.INT64),  # 新增
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=256)  # 维度变化
            ]
            schema_v2 = CollectionSchema(fields=fields_v2, description="V2版本")
            collection_v2 = Collection("documents_v2", schema=schema_v2)
            
            # 数据迁移函数
            def migrate_data(source_collection, target_collection, batch_size=1000):
                source_collection.load()
                offset = 0
                
                while True:
                    # 从源collection读取数据
                    results = source_collection.query(
                        expr="id > 0",
                        output_fields=["id", "text", "embedding"],
                        limit=batch_size,
                        offset=offset
                    )
                    
                    if len(results) == 0:
                        break
                    
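                    # Convert rows to column lists
                    ids = [r["id"] for r in results]
                    texts = [r["text"] for r in results]
                    # upgrade_embedding is a hypothetical, user-supplied function that
                    # maps a 128-dimensional vector to 256 dimensions (e.g. by re-embedding)
                    embeddings = [upgrade_embedding(r["embedding"]) for r in results]
                    # Fill the new fields with defaults
                    categories = ["default"] * len(results)
                    timestamps = [int(time.time())] * len(results)
                    
                    # Insert into the target collection
                    data = [ids, texts, categories, timestamps, embeddings]
                    target_collection.insert(data)
                    
                    # Advance by the rows actually read, so the count stays accurate
                    offset += len(results)
                    print(f"Migrated {offset} rows")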
                
                target_collection.flush()
            
            # 使用别名进行平滑切换
            utility.create_alias(collection_name="documents_v1", alias="documents")
            
            # 迁移完成后切换别名
            # utility.alter_alias(collection_name="documents_v2", alias="documents")
            
            # 应用层代码不变
            collection = Collection("documents")  # 通过别名访问
            ---
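            Offset-based paging gets slower as the offset grows, and Milvus caps how large offset plus limit may be, so a long migration is often written to resume from the last primary key instead. The sketch below shows that keyset pattern against an in-memory stand-in. `iter_batches`, `fetch_page`, and `SOURCE` are hypothetical names; against Milvus, `fetch_page` would wrap a query filtered with `id > last_id` and would have to return rows sorted by id. Recent pymilvus releases also provide a `query_iterator` method on `Collection` that may be a better fit.

```python
from typing import Callable, Dict, Iterator, List


def iter_batches(
    fetch_page: Callable[[int, int], List[Dict]],
    batch_size: int = 1000,
    start_after: int = -1,
) -> Iterator[List[Dict]]:
    """Yield batches in primary-key order, resuming after the last key seen."""
    last_id = start_after
    while True:
        page = fetch_page(last_id, batch_size)
        if not page:
            return
        yield page
        last_id = max(row["id"] for row in page)


# In-memory stand-in for the source collection (made-up data).
SOURCE = [{"id": i, "text": f"doc {i}"} for i in range(2500)]


def fake_fetch_page(after_id: int, limit: int) -> List[Dict]:
    """Return up to `limit` rows with id > after_id, sorted by id."""
    return [row for row in SOURCE if row["id"] > after_id][:limit]


batches = list(iter_batches(fake_fetch_page, batch_size=1000))
print([len(batch) for batch in batches])
```

            Starting from -1 rather than filtering on `id > 0` also picks up a row whose id is 0.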

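3.2 Creating a Collection

01.Ways to create a Collection
    a.Basic creation
        a.Description
            Creating a collection requires a name and a schema. The name must be unique among existing collections; it may contain only letters, digits, and underscores, must start with a letter or underscore, and can be at most 255 characters long. You can also set the number of shards, which mainly affects how writes are distributed. The call returns a Collection object as soon as the collection exists, but the collection is not loaded into memory. Create the vector index right after creating the collection, since a collection cannot be loaded or searched until its vector field is indexed.
        b.Code example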
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            
            # 定义Schema
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields, description="文档集合")
            
            # 创建Collection
            collection = Collection(
                name="documents",
                schema=schema,
                using="default",
                shards_num=2
            )
            
            print(f"Collection created: {collection.name}")
            # In recent pymilvus releases the ORM property is num_shards
            print(f"Shards: {collection.num_shards}")
            print(f"Schema: {collection.schema}")
            
            # 验证创建
            from pymilvus import utility
            assert utility.has_collection("documents")
            ---
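            The naming rules can be checked on the client before calling Milvus, which gives a clearer error than a failed create. This is a minimal sketch under the rules stated above, plus the assumption that the first character must be a letter or underscore; `is_valid_collection_name` is a hypothetical helper, not a pymilvus function.

```python
import re

# Letters, digits, and underscores only; the first character must not be a digit.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
MAX_NAME_LENGTH = 255


def is_valid_collection_name(name: str) -> bool:
    """Return True if `name` satisfies the naming rules described above."""
    return len(name) <= MAX_NAME_LENGTH and _NAME_RE.fullmatch(name) is not None


for candidate in ["documents_v1", "1docs", "my-docs", ""]:
    print(repr(candidate), is_valid_collection_name(candidate))
```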
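    b.Getting an existing Collection
        a.Description
            Passing only a name to Collection() returns a handle to a collection that already exists; nothing is created. This is the usual way to reach the same collection from different modules or processes. If the collection does not exist and no schema is given, an exception is raised, so check with utility.has_collection first. Every handle refers to the same server-side data and metadata, so a change made through one handle is visible through the others.
        b.Code example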
            ---
            from pymilvus import Collection, utility
            
            # 检查Collection是否存在
            if utility.has_collection("documents"):
                # 获取已存在的Collection
                collection = Collection("documents")
                print(f"获取Collection: {collection.name}")
                print(f"实体数量: {collection.num_entities}")
                print(f"Schema: {collection.schema}")
            else:
                print("Collection不存在")
            
            # 安全获取Collection
            def get_or_create_collection(name, schema, shards_num=2):
                if utility.has_collection(name):
                    return Collection(name)
                else:
                    return Collection(name, schema=schema, shards_num=shards_num)
            
            collection = get_or_create_collection("documents", schema)
            
            # 多个引用示例
            collection1 = Collection("documents")
            collection2 = Collection("documents")
            
            # 两个对象指向同一个collection
            print(f"相同collection: {collection1.name == collection2.name}")
            ---

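02.Collection configuration
    a.Shard configuration
        a.Description
            The shard count determines how incoming data is distributed. More shards mainly raise write parallelism, at the cost of extra management overhead; query throughput is usually scaled with replicas instead. Choose the count from the expected data volume and write load: 1-2 shards for a single-node deployment, more for a cluster. The shard count cannot be changed after creation, so choose it carefully. Each shard manages its own portion of the data, and a query fans out to all shards.
        b.Code example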
            ---
            from pymilvus import Collection, CollectionSchema
            
            # 单分片（小数据量，<100万）
            collection_small = Collection(
                name="small_collection",
                schema=schema,
                shards_num=1
            )
            
            # 多分片（大数据量，>1000万）
            collection_large = Collection(
                name="large_collection",
                schema=schema,
                shards_num=4
            )
            
            # 根据数据量动态选择分片数
            def calculate_shards(estimated_entities):
                if estimated_entities < 1000000:
                    return 1
                elif estimated_entities < 10000000:
                    return 2
                elif estimated_entities < 100000000:
                    return 4
                else:
                    return 8
            
            shards = calculate_shards(5000000)
            collection = Collection(
                name="dynamic_shards",
                schema=schema,
                shards_num=shards
            )
            
            print(f"数据量: 5000000, 分片数: {shards}")
            
            # Inspect the shard count (num_shards is the ORM property in recent pymilvus releases)
            print(f"Collection shards: {collection.num_shards}")
            ---
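    b.Property configuration
        a.Description
            A collection carries properties such as a TTL (time to live). With a TTL set, expired entities are cleaned up automatically, which suits time-sensitive data; the value is in seconds, and 0 disables expiry. Properties can be changed after creation. The replica count affects query throughput and availability, but it is normally chosen when the collection is loaded, and whether it can also be set as a collection property depends on the Milvus version. Two or three replicas are a reasonable range; each one costs additional memory.
        b.Code example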
            ---
            from pymilvus import Collection
            
            collection = Collection("documents")
            
            # 设置TTL（秒）
            collection.set_properties(properties={"collection.ttl.seconds": 86400})  # 1天
            print("TTL设置为1天")
            
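            # Set the replica count
            # (assumption: this property key is only honored by Milvus versions that
            # support collection-level load settings; otherwise use load(replica_number=2))
            collection.set_properties(properties={"collection.replica.number": 2})
            print("Replica count set to 2")
            
            # Inspect properties via describe(), which returns the collection metadata as a dict
            properties = collection.describe().get("properties")
            print(f"Collection properties: {properties}")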
            
            # 批量设置属性
            collection.set_properties(properties={
                "collection.ttl.seconds": 172800,  # 2天
                "collection.replica.number": 3
            })
            
            # 删除TTL（永不过期）
            collection.set_properties(properties={"collection.ttl.seconds": 0})
            print("TTL已禁用")
            
            # 常用属性配置
            # 1. 缓存数据（短期）
            cache_collection = Collection("cache")
            cache_collection.set_properties(properties={"collection.ttl.seconds": 3600})  # 1小时
            
            # 2. 日志数据（中期）
            log_collection = Collection("logs")
            log_collection.set_properties(properties={"collection.ttl.seconds": 604800})  # 7天
            
            # 3. 持久数据（长期）
            persistent_collection = Collection("persistent")
            persistent_collection.set_properties(properties={"collection.ttl.seconds": 0})  # 永久
            ---
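            The TTL values in the example (3600, 86400, 604800) are easier to read and harder to mistype when derived from a duration. The helpers below are a small sketch; `ttl_seconds` and `ttl_properties` are hypothetical names, and only the property key `collection.ttl.seconds` comes from the example above.

```python
from datetime import timedelta
from typing import Dict


def ttl_seconds(**duration: float) -> int:
    """Convert timedelta keyword arguments (days=, hours=, ...) to whole seconds."""
    seconds = int(timedelta(**duration).total_seconds())
    if seconds < 0:
        raise ValueError("TTL must not be negative")
    return seconds


def ttl_properties(**duration: float) -> Dict[str, int]:
    """Build the properties dict expected by set_properties for a TTL."""
    return {"collection.ttl.seconds": ttl_seconds(**duration)}


print(ttl_properties(hours=1))
print(ttl_properties(days=7))
```

            The resulting dict can be passed as the `properties` argument of `set_properties`, for example `collection.set_properties(properties=ttl_properties(days=1))`.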

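03.Alias management
    a.Creating an alias
        a.Description
            An alias is an alternative name for a collection, useful for smooth upgrades and version management. A collection can have several aliases, but an alias points to exactly one collection at a time. When the application accesses a collection through an alias, it does not need to know the real collection name, which helps during data migration or a schema change. Alias operations are atomic, so switching does not interrupt service.
        b.Code example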
            ---
            from pymilvus import utility, Collection
            
            # 创建别名
            utility.create_alias(
                collection_name="documents_v1",
                alias="documents"
            )
            print("别名创建成功")
            
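            # Access through the alias
            collection = Collection("documents")  # resolves to documents_v1 on the server
            # Note: .name echoes the name passed in (the alias), not the target collection
            print(f"Handle name: {collection.name}")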
            
            # 查看别名列表
            aliases = utility.list_aliases("documents_v1")
            print(f"别名列表: {aliases}")
            
            # 一个collection多个别名
            utility.create_alias("documents_v1", "docs")
            utility.create_alias("documents_v1", "doc_collection")
            
            # 所有别名都指向同一个collection
            col1 = Collection("documents")
            col2 = Collection("docs")
            col3 = Collection("doc_collection")
            
            print(f"实体数量一致: {col1.num_entities == col2.num_entities == col3.num_entities}")
            ---
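    b.Switching an alias
        a.Description
            An alias can be repointed to another collection to roll out a new version. The switch is atomic, so readers never see an intermediate state, and the application code does not change. Verify the new collection's data before switching. Keeping the old collection around after the switch gives a quick rollback path, which makes aliases a building block for blue-green deployments and staged rollouts.
        b.Code example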
            ---
            from pymilvus import utility, Collection
            
            # 初始状态：别名指向v1
            utility.create_alias("documents_v1", "documents")
            
            # 创建新版本collection
            collection_v2 = Collection("documents_v2", schema=new_schema)
            # ... 迁移数据到v2 ...
            
            # 切换别名到v2
            utility.alter_alias(
                collection_name="documents_v2",
                alias="documents"
            )
            print("别名已切换到v2")
            
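            # The alias now resolves to v2
            collection = Collection("documents")
            # .name still prints the alias; list_aliases shows which collection owns it
            print(f"Aliases of documents_v2: {utility.list_aliases('documents_v2')}")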
            
            # 蓝绿部署示例
            def blue_green_deployment(old_collection, new_collection, alias):
                # 1. 验证新collection
                new_col = Collection(new_collection)
                assert new_col.num_entities > 0, "新collection数据为空"
                
                # 2. 切换别名
                utility.alter_alias(
                    collection_name=new_collection,
                    alias=alias
                )
                print(f"已切换到新版本: {new_collection}")
                
                # 3. 保留旧版本一段时间，以便回滚
                # 如果需要回滚
                # utility.alter_alias(collection_name=old_collection, alias=alias)
            
            blue_green_deployment("documents_v1", "documents_v2", "documents")
            
            # 删除别名
            utility.drop_alias("documents")
            print("别名已删除")
            ---

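04.Collection metadata
    a.Viewing metadata
        a.Description
            A collection exposes metadata that includes the schema definition, statistics, and index information, which is enough to understand its structure and state. Reading most of this metadata does not require the collection to be loaded and is cheap, so it is suitable for monitoring and management. Treat the counts as approximate: num_entities may lag behind recent inserts that have not been flushed and may still include deleted entities.
        b.Code example
            ---
            from pymilvus import Collection, DataType, utility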
            
            collection = Collection("documents")
            
            # Schema信息
            print(f"Collection名称: {collection.name}")
            print(f"描述: {collection.description}")
            print(f"Schema: {collection.schema}")
            
            # 字段信息
            for field in collection.schema.fields:
                print(f"字段: {field.name}")
                print(f"  类型: {field.dtype}")
                print(f"  主键: {field.is_primary}")
                if field.dtype == DataType.FLOAT_VECTOR:
                    print(f"  维度: {field.params.get('dim')}")
                if field.dtype == DataType.VARCHAR:
                    print(f"  最大长度: {field.params.get('max_length')}")
            
            # Statistics (num_shards is the ORM property in recent pymilvus releases)
            print(f"Entities: {collection.num_entities}")
            print(f"Shards: {collection.num_shards}")
            
            # 索引信息
            indexes = collection.indexes
            for index in indexes:
                print(f"索引字段: {index.field_name}")
                print(f"索引参数: {index.params}")
            
            # 加载状态
            load_state = utility.load_state("documents")
            print(f"加载状态: {load_state}")
            
            # Properties (read from the describe() metadata dict)
            properties = collection.describe().get("properties")
            print(f"Properties: {properties}")
            ---
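    b.Monitoring statistics
        a.Description
            Metadata can also be used to track a collection's usage over time. Useful statistics include the entity count, per-segment information, and estimated memory use. Regular monitoring surfaces problems such as data skew or memory pressure early, and the same numbers feed capacity planning and performance tuning. Segment information is only available for a loaded collection. Milvus also exports server-side metrics for a fuller picture.
        b.Code example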
            ---
            from pymilvus import Collection, utility
            from pymilvus.client.types import LoadState
            import time
            
            collection = Collection("documents")
            
            # Monitoring loop
            def monitor_collection(collection_name, interval=60):
                while True:
                    collection = Collection(collection_name)
                    
                    # Basic statistics
                    print(f"\n=== {time.strftime('%Y-%m-%d %H:%M:%S')} ===")
                    print(f"Entities: {collection.num_entities:,}")
                    
                    # Segment information is only available once the collection is loaded
                    if utility.load_state(collection_name) == LoadState.Loaded:
                        segments = utility.get_query_segment_info(collection_name)
                        print(f"Segments: {len(segments)}")
                        
                        total_rows = sum(seg.num_rows for seg in segments)
                        print(f"Total rows: {total_rows:,}")
                        
                        # Group segments by state
                        state_counts = {}
                        for seg in segments:
                            state_counts[seg.state] = state_counts.get(seg.state, 0) + 1
                        print(f"Segment states: {state_counts}")
                        
                        # Rough estimate of raw vector memory (assumes dim=128, float32)
                        vector_dim = 128
                        vector_size = total_rows * vector_dim * 4
                        print(f"Estimated vector memory: {vector_size / 1024 / 1024:.2f} MB")
                    
                    time.sleep(interval)
            
            # 启动监控（在后台线程中运行）
            import threading
            monitor_thread = threading.Thread(
                target=monitor_collection,
                args=("documents", 60),
                daemon=True
            )
            monitor_thread.start()
            
            # 性能指标收集
            def collect_metrics(collection_name):
                collection = Collection(collection_name)
                
                metrics = {
                    "name": collection_name,
                    "entities": collection.num_entities,
                    "shards": collection.num_shards,
                    "load_state": str(utility.load_state(collection_name)),
                    "timestamp": time.time()
                }
                
                # 添加segment信息
                segments = utility.get_query_segment_info(collection_name)
                metrics["segments"] = len(segments)
                metrics["total_rows"] = sum(seg.num_rows for seg in segments)
                
                return metrics
            
            metrics = collect_metrics("documents")
            print(f"指标: {metrics}")
            ---
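            The monitoring example estimates vector memory as rows × dimension × 4 bytes. Worked through for one million 128-dimensional float32 vectors, that is 1,000,000 × 128 × 4 = 512,000,000 bytes, or about 488 MiB. The sketch below wraps the same arithmetic in a hypothetical helper; it covers raw vector data only and ignores index structures and scalar fields.

```python
def estimate_vector_bytes(num_rows: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw size of the vector data: one float32 (4 bytes) per dimension per row."""
    return num_rows * dim * bytes_per_value


def to_mib(num_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return num_bytes / 1024 / 1024


raw = estimate_vector_bytes(1_000_000, 128)
print(f"{raw:,} bytes = {to_mib(raw):.2f} MiB")
```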

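3.3 Loading and releasing

01.Loading a Collection
    a.Loading into memory
        a.Description
            A collection is not loaded when it is created; call load() explicitly. Loading brings the data and indexes into Query Node memory, and only then can the collection be searched or queried. In pymilvus, load() blocks until loading finishes by default; pass _async=True to return immediately and poll utility.load_state or utility.loading_progress instead. Loading a large collection can take a long time, so schedule it for off-peak hours. A loaded collection occupies memory, so plan capacity against the server configuration. Loading reads every segment and index file, which makes network and disk I/O the main bottlenecks.
        b.Code example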
            ---
            from pymilvus import Collection, utility
            from pymilvus.client.types import LoadState
            import time
            
            collection = Collection("documents")
            
            # Poll until loading finishes, reporting elapsed time
            def monitor_load_progress(collection_name, check_interval=1):
                start_time = time.time()
                
                while True:
                    state = utility.load_state(collection_name)
                    elapsed = time.time() - start_time
                    
                    if state == LoadState.Loaded:
                        print(f"Loaded in {elapsed:.2f}s")
                        return True
                    elif state == LoadState.Loading:
                        print(f"Loading... {elapsed:.2f}s elapsed")
                        time.sleep(check_interval)
                    else:
                        # NotLoad or NotExist: loading was never started or the name is wrong
                        print(f"Unexpected load state: {state}")
                        return False
            
            # Start loading without blocking, then poll
            print("Loading collection...")
            collection.load(_async=True)
            monitor_load_progress("documents")
            
            # Check the current state
            print(f"Current state: {utility.load_state('documents')}")
            
            # Changing the replica count requires a release and a fresh load.
            # load() without _async blocks until the collection is ready.
            collection.release()
            collection.load(replica_number=2)
            print("Loaded with 2 replicas")
            ---
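    b.Loading partitions
        a.Description
            You can load only some partitions into memory to save resources. This fits data partitioned by time or category, where only the hot partitions need to be searchable. Loading fewer partitions reduces memory use and load time. Searches and queries see only loaded partitions; data in unloaded partitions is invisible. Partitions can be loaded and released at runtime to separate hot and cold data, which works especially well for time-series data such as logs and monitoring records.
        b.Code example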
            ---
            from pymilvus import Collection, Partition, utility
            
            collection = Collection("documents")
            
            # 创建分区
            partition_2024 = Partition(collection, "2024")
            partition_2025 = Partition(collection, "2025")
            partition_2026 = Partition(collection, "2026")
            
            # 只加载2026分区（最新数据）
            partition_2026.load()
            print("已加载2026分区")
            
            # 查询只在已加载分区中进行
            results = collection.search(
                data=[[0.1]*128],
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"nprobe": 10}},
                limit=10,
                partition_names=["2026"]
            )
            print(f"搜索结果: {len(results[0])} 条")
            
            # 加载多个分区
            collection.load(partition_names=["2025", "2026"])
            print("已加载2025和2026分区")
            
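            # Load the partitions for the most recent months
            def load_recent_partitions(collection, months=3):
                from datetime import datetime
                
                # Step back one calendar month at a time; fixed 30-day steps can
                # repeat or skip a month
                now = datetime.now()
                year, month = now.year, now.month
                partitions_to_load = []
                
                for _ in range(months):
                    name = f"{year:04d}{month:02d}"
                    # Only request partitions that exist (assumes YYYYMM partition names)
                    if collection.has_partition(name):
                        partitions_to_load.append(name)
                    month -= 1
                    if month == 0:
                        year, month = year - 1, 12
                
                if partitions_to_load:
                    collection.load(partition_names=partitions_to_load)
                print(f"Loaded partitions for the last {months} months: {partitions_to_load}")
            
            load_recent_partitions(collection, months=3)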
            
            # 释放特定分区
            partition_2024.release()
            print("已释放2024分区")
            
            # Check each partition's load state (partition_names takes a list)
            for partition in collection.partitions:
                state = utility.load_state("documents", [partition.name])
                print(f"Partition {partition.name}: {state}")
            ---
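            When partitions are named by month, the list of recent partition names should follow the calendar rather than fixed 30-day steps. Stepping back 30 days at a time from 2026-03-31, for instance, lands on March twice and skips February. The sketch below assumes `YYYYMM` partition names; `recent_month_partitions` is a hypothetical helper.

```python
from datetime import date
from typing import List


def recent_month_partitions(today: date, months: int) -> List[str]:
    """Return YYYYMM names for the current month and the months before it, newest first."""
    names = []
    year, month = today.year, today.month
    for _ in range(months):
        names.append(f"{year:04d}{month:02d}")
        month -= 1
        if month == 0:  # wrap from January to December of the previous year
            year, month = year - 1, 12
    return names


print(recent_month_partitions(date(2026, 1, 15), 3))
print(recent_month_partitions(date(2026, 3, 31), 3))
```

            The returned list can be passed as `partition_names` to `collection.load`, after filtering out names for which no partition exists.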

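02.Releasing a Collection
    a.Releasing memory
        a.Description
            Releasing unloads a collection from Query Node memory. After a release the collection cannot be searched or queried, but nothing is deleted: the data stays in the storage layer and the collection can be loaded again at any time. Release collections that are used only temporarily, or whenever memory needs to be reclaimed. The memory itself may take a moment to be returned after the call, so confirm the result with utility.load_state rather than assuming it.
        b.Code example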
            ---
            from pymilvus import Collection, Partition, utility
            from pymilvus.client.types import LoadState
            
            collection = Collection("documents")
            
            # Release a single partition while the rest stays loaded
            partition = Partition(collection, "2024")
            partition.release()
            print("Released partition 2024")
            
            # Release the whole collection (all partitions)
            print("Releasing collection...")
            collection.release()
            
            # Verify the release
            state = utility.load_state("documents")
            print(f"State after release: {state}")
            assert state == LoadState.NotLoad, "release failed"
            
            # Load again
            collection.load()
            print("Reloaded")
            
            # Release only when the collection is actually loaded
            def safe_release(collection_name):
                state = utility.load_state(collection_name)
                
                if state == LoadState.Loaded:
                    Collection(collection_name).release()
                    print(f"Released: {collection_name}")
                    return True
                elif state == LoadState.NotLoad:
                    print(f"Not loaded, nothing to release: {collection_name}")
                    return True
                else:
                    print(f"Unexpected state: {state}")
                    return False
            
            safe_release("documents")
            ---
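    b.Memory management
        a.Description
            Managing loads and releases deliberately keeps memory use under control. Load only the collections in active use and release inactive ones periodically. Monitor memory and adjust the loading strategy accordingly; partition-level loading gives finer control. Do not rely on Milvus to release collections for you when memory runs short: a load that does not fit is more likely to fail. An LRU policy in the application can automate loading and releasing.
        b.Code example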
            ---
            from pymilvus import Collection, utility
            import psutil
            import time
            from collections import OrderedDict
            
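            def get_memory_usage():
                """Return host memory in use (MB).

                Assumes Milvus standalone runs on this host; the client process's own
                RSS says nothing about Query Node memory.
                """
                return psutil.virtual_memory().used / 1024 / 1024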
            
            # LRU Collection管理器
            class CollectionManager:
                def __init__(self, max_memory_mb=8192, max_collections=5):
                    self.max_memory_mb = max_memory_mb
                    self.max_collections = max_collections
                    self.loaded_collections = OrderedDict()
                    self.access_count = {}
                
                def load_collection(self, collection_name):
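                    # Already loaded: refresh its LRU position and last-access time
                    if collection_name in self.loaded_collections:
                        self.loaded_collections.move_to_end(collection_name)
                        self.loaded_collections[collection_name] = time.time()
                        self.access_count[collection_name] += 1
                        return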
                    
                    # 检查内存使用
                    current_memory = get_memory_usage()
                    
                    # 内存不足或collection数量超限，释放最久未使用的
                    while (current_memory > self.max_memory_mb * 0.8 or 
                           len(self.loaded_collections) >= self.max_collections):
                        if not self.loaded_collections:
                            break
                        
                        old_name, _ = self.loaded_collections.popitem(last=False)
                        Collection(old_name).release()
                        print(f"释放Collection: {old_name}")
                        
                        time.sleep(0.5)
                        current_memory = get_memory_usage()
                    
                    # 加载新collection
                    collection = Collection(collection_name)
                    collection.load()
                    self.loaded_collections[collection_name] = time.time()
                    self.access_count[collection_name] = 1
                    print(f"加载Collection: {collection_name}")
                
                def release_all(self):
                    """释放所有collection"""
                    for name in list(self.loaded_collections.keys()):
                        Collection(name).release()
                    self.loaded_collections.clear()
                    self.access_count.clear()
                    print("已释放所有collection")
                
                def get_stats(self):
                    """获取统计信息"""
                    return {
                        "loaded_count": len(self.loaded_collections),
                        "memory_mb": get_memory_usage(),
                        "collections": list(self.loaded_collections.keys()),
                        "access_count": self.access_count
                    }
            
            # 使用管理器
            manager = CollectionManager(max_memory_mb=8192, max_collections=3)
            
            # 模拟访问
            manager.load_collection("documents")
            manager.load_collection("images")
            manager.load_collection("videos")
            
            # 访问已加载的collection
            manager.load_collection("documents")  # 更新访问时间
            
            # 加载新collection（会触发释放）
            manager.load_collection("audio")
            
            # 查看统计
            stats = manager.get_stats()
            print(f"统计信息: {stats}")
            
            # 定期清理
            def periodic_cleanup(manager, interval=300):
                """定期清理不活跃的collection"""
                while True:
                    time.sleep(interval)
                    
                    current_time = time.time()
                    to_release = []
                    
                    for name, load_time in manager.loaded_collections.items():
                        # 超过5分钟未访问
                        if current_time - load_time > 300:
                            to_release.append(name)
                    
                    for name in to_release:
                        Collection(name).release()
                        del manager.loaded_collections[name]
                        print(f"清理不活跃collection: {name}")
            
            # 启动定期清理（后台线程）
            import threading
            cleanup_thread = threading.Thread(
                target=periodic_cleanup,
                args=(manager, 300),
                daemon=True
            )
            cleanup_thread.start()
            ---

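03.Replica management
    a.Replica configuration
        a.Description
            A replica is a complete in-memory copy of a collection, used to raise query throughput and availability. Multiple replicas serve queries in parallel. The replica count is set when the collection is loaded; to change it, release the collection and load it again with the new count. Every replica needs the same amount of memory, and the cluster needs enough Query Nodes to host them, so account for resource limits. Milvus distributes replicas across Query Nodes for load balancing and fails over to another replica when one becomes unavailable.
        b.Code example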
            ---
            from pymilvus import Collection, utility
            
            collection = Collection("documents")
            
            # 加载时指定副本数量
            collection.load(replica_number=2)
            print("已加载2个副本")
            
            # 查看副本信息
            replicas = collection.get_replicas()
            print(f"副本数量: {len(replicas.groups)}")
            
            for i, replica in enumerate(replicas.groups):
                print(f"Replica {i}:")
                print(f"  Replica ID: {replica.id}")
                print(f"  Shards: {replica.shards}")
                print(f"  Nodes: {replica.group_nodes}")
            
            # 动态调整副本数量
            collection.release()
            collection.load(replica_number=3)
            print("副本数量已调整为3")
            
            # 副本负载均衡测试
            import concurrent.futures
            import time
            
            def query_task(task_id):
                start = time.time()
                results = collection.search(
                    data=[[0.1]*128],
                    anns_field="embedding",
                    param={"metric_type": "L2", "params": {"nprobe": 10}},
                    limit=10
                )
                elapsed = time.time() - start
                return elapsed
            
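            # 100 queries across 50 worker threads
            wall_start = time.time()
            with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
                futures = [executor.submit(query_task, i) for i in range(100)]
                times = [f.result() for f in futures]
            wall_elapsed = time.time() - wall_start
            
            avg_time = sum(times) / len(times)
            print(f"Average query latency: {avg_time*1000:.2f}ms")
            # Throughput uses wall-clock time; dividing by the summed per-query
            # latencies would understate it because the queries overlap
            print(f"QPS: {len(times) / wall_elapsed:.2f}")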
            ---
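            When measuring the effect of replicas on concurrent queries, throughput should be computed from wall-clock time, not from the sum of per-query latencies: with overlapping requests the latency sum is much larger than the elapsed time. The sketch below illustrates the difference with a sleep standing in for a search call; `measure_throughput` is a hypothetical helper and no Milvus connection is involved.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def measure_throughput(task: Callable[[], None], total: int, workers: int) -> Dict[str, float]:
    """Run `task` `total` times on a thread pool and report latency and throughput."""

    def timed(_: int) -> float:
        start = time.perf_counter()
        task()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(timed, range(total)))
    wall = time.perf_counter() - wall_start

    return {
        "avg_latency_s": sum(latencies) / total,
        "qps_wall_clock": total / wall,                   # what clients actually observe
        "qps_from_latency_sum": total / sum(latencies),   # ignores overlap between requests
    }


# A 20 ms sleep stands in for one search request.
stats = measure_throughput(lambda: time.sleep(0.02), total=40, workers=10)
print(stats)
```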
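    b.Replica monitoring
        a.Description
            Monitor replica state and placement to confirm the system is healthy. Replica information includes the replica ID, the nodes hosting it, and how shards are assigned. Monitoring reveals problems such as unbalanced replicas or failed nodes. Milvus manages replica placement and failover automatically, but check replica state regularly so anomalies are caught and handled early.
        b.Code example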
            ---
            from pymilvus import Collection, utility
            import time
            
            collection = Collection("documents")
            collection.load(replica_number=2)
            
            # 副本监控函数
            def monitor_replicas(collection_name, interval=60):
                while True:
                    collection = Collection(collection_name)
                    replicas = collection.get_replicas()
                    
                    print(f"\n=== {time.strftime('%Y-%m-%d %H:%M:%S')} ===")
                    print(f"副本数量: {len(replicas.groups)}")
                    
                    for i, replica in enumerate(replicas.groups):
                        print(f"\nReplica {i}:")
                        print(f"  ID: {replica.id}")
                        print(f"  Shards: {len(replica.shards)}")
                        print(f"  Nodes: {len(replica.group_nodes)}")
                        
                        # Shard details (a pymilvus Shard exposes channel_name,
                        # shard_leader and shard_nodes; names may vary by version)
                        for shard in replica.shards:
                            print(f"  Shard channel: {shard.channel_name}")
                            print(f"    Leader node: {shard.shard_leader}")
                            print(f"    Nodes: {shard.shard_nodes}")
                    
                    # Check replica distribution
                    all_nodes = set()
                    for replica in replicas.groups:
                        all_nodes.update(replica.group_nodes)
                    
                    print(f"\nTotal nodes: {len(all_nodes)}")
                    print(f"Node list: {all_nodes}")
                    
                    # Check load balance
                    node_replica_count = {}
                    for replica in replicas.groups:
                        for node in replica.group_nodes:
                            node_replica_count[node] = node_replica_count.get(node, 0) + 1
                    
                    print(f"Replicas per node: {node_replica_count}")
                    
                    time.sleep(interval)
            
            # 启动监控
            import threading
            monitor_thread = threading.Thread(
                target=monitor_replicas,
                args=("documents", 60),
                daemon=True
            )
            monitor_thread.start()
            
            # 副本健康检查
            def check_replica_health(collection_name):
                collection = Collection(collection_name)
                replicas = collection.get_replicas()
                
                if len(replicas.groups) == 0:
                    return False, "没有副本"
                
                # Check each replica
                for replica in replicas.groups:
                    if len(replica.group_nodes) == 0:
                        return False, f"Replica {replica.id} has no nodes"
                    
                    if len(replica.shards) == 0:
                        return False, f"Replica {replica.id} has no shards"
                
                return True, "All replicas healthy"
            
            healthy, message = check_replica_health("documents")
            print(f"健康检查: {message}")
            ---
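            The balance check can be separated from the Milvus calls so that it is easy to test. A minimal sketch, assuming the replica layout has already been collected into a plain dict that maps a replica ID to the node IDs serving it; `replicas_per_node`, `is_balanced`, and the IDs are illustrative, not part of pymilvus.

```python
from collections import Counter

def replicas_per_node(replica_nodes):
    """Count how many replicas each node hosts.

    replica_nodes maps a replica ID to the node IDs that serve it.
    """
    counts = Counter()
    for nodes in replica_nodes.values():
        counts.update(set(nodes))  # a node counts once per replica
    return dict(counts)

def is_balanced(counts, tolerance=1):
    """Treat the layout as balanced when node loads differ by at most tolerance."""
    if not counts:
        return True
    return max(counts.values()) - min(counts.values()) <= tolerance

layout = {101: [1, 2], 102: [2, 3], 103: [2]}  # illustrative replica and node IDs
counts = replicas_per_node(layout)
print(counts, is_balanced(counts))
```

            Here node 2 serves three replicas while nodes 1 and 3 serve one each, so the layout is reported as unbalanced.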

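3.4 Dropping a Collection

01.Drop operations
    a.Dropping a collection
        a.Description
            Dropping permanently removes a collection together with all of its data and indexes. The examples below release the collection first as a conservative step; the pymilvus documentation for drop_collection does not list releasing as a precondition, but releasing first is harmless. Dropping is irreversible, so back up anything you might need beforehand. Once dropped, the collection name can be reused. Dropping a large collection can take a while, so schedule it for off-peak hours. From the caller's point of view the collection is gone as soon as the call returns; the underlying index and data files may be reclaimed from storage afterwards in the background.
        b.Code example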
            ---
            from pymilvus import Collection, utility
            from pymilvus.client.types import LoadState
            import time
            
            # Check whether the collection exists
            if utility.has_collection("documents"):
                collection = Collection("documents")
                
                # Check the load state
                state = utility.load_state("documents")
                if state == LoadState.Loaded:
                    # Release the collection
                    collection.release()
                    print("Collection released")
                    time.sleep(1)
                
                # 删除Collection
                utility.drop_collection("documents")
                print("Collection已删除")
                
                # 验证删除
                assert not utility.has_collection("documents"), "删除失败"
            else:
                print("Collection不存在")
            
            # 安全删除函数
            def safe_drop_collection(collection_name):
                try:
                    if not utility.has_collection(collection_name):
                        print(f"Collection不存在: {collection_name}")
                        return True
                    
                    collection = Collection(collection_name)
                    
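                    # Release first if the collection is loaded
                    # (LoadState is imported here so the helper is self-contained)
                    from pymilvus.client.types import LoadState
                    state = utility.load_state(collection_name)
                    if state == LoadState.Loaded:
                        collection.release()
                        time.sleep(1)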
                    
                    # 删除
                    utility.drop_collection(collection_name)
                    print(f"已删除: {collection_name}")
                    return True
                    
                except Exception as e:
                    print(f"删除失败: {e}")
                    return False
            
            # 使用安全删除
            safe_drop_collection("test_collection")
            
            # Ask for confirmation before dropping
            def drop_with_confirmation(collection_name, confirm=input):
                if not utility.has_collection(collection_name):
                    print("Collection does not exist")
                    return False
                
                collection = Collection(collection_name)
                entity_count = collection.num_entities
                
                print(f"Warning: about to drop collection '{collection_name}'")
                print(f"It contains {entity_count:,} entities")
                
                # Proceed only on an explicit "yes"
                answer = confirm("Confirm drop? (yes/no): ")
                if answer.strip().lower() != "yes":
                    print("Drop cancelled")
                    return False
                
                collection.release()
                utility.drop_collection(collection_name)
                print("Drop complete")
                return True
            
            drop_with_confirmation("documents")
            ---
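            The confirmation step is easier to test when the Milvus calls are passed in as functions. The sketch below assumes that approach: `drop_if_confirmed` is a hypothetical helper, and an in-memory set stands in for `utility.has_collection` and `utility.drop_collection`.

```python
def drop_if_confirmed(name, exists, drop, ask):
    """Drop `name` via drop(name) only if it exists and ask() answers 'yes'."""
    if not exists(name):
        return "missing"
    if ask(f"Drop '{name}'? (yes/no): ").strip().lower() != "yes":
        return "cancelled"
    drop(name)
    return "dropped"

# In-memory stand-ins for utility.has_collection / utility.drop_collection
store = {"documents", "test_collection"}

result_no = drop_if_confirmed("documents", store.__contains__, store.discard, lambda _: "no")
result_yes = drop_if_confirmed("test_collection", store.__contains__, store.discard, lambda _: "yes")
result_missing = drop_if_confirmed("unknown", store.__contains__, store.discard, lambda _: "yes")
print(result_no, result_yes, result_missing, sorted(store))
```

            In real use the three callables would be `utility.has_collection`, `utility.drop_collection`, and `input`.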
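    b.Batch drop
        a.Description
            Several collections can be dropped in one pass, which is useful for cleaning up test data or expired data. A naming convention makes the targets easy to identify, for example by filtering collection names on a prefix or suffix. Because a loose filter can match more than intended, preview the matched names and ask for a second confirmation before dropping anything. Batch drops fit scheduled cleanup jobs, such as removing temporary or test collections.
        b.Code example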
            ---
            from pymilvus import utility, Collection
            import re
            from datetime import datetime, timedelta
            
            # safe_drop_collection() used below is the helper defined in the previous example
            
            # 列出所有Collection
            all_collections = utility.list_collections()
            print(f"所有Collection: {all_collections}")
            
            # 删除测试Collection（前缀为test_）
            for name in all_collections:
                if name.startswith("test_"):
                    collection = Collection(name)
                    collection.release()
                    utility.drop_collection(name)
                    print(f"已删除测试Collection: {name}")
            
            # 删除临时Collection（前缀为temp_）
            def drop_temp_collections():
                for name in utility.list_collections():
                    if name.startswith("temp_"):
                        safe_drop_collection(name)
            
            drop_temp_collections()
            
            # 删除过期Collection（基于命名规则）
            def drop_expired_collections(days=30):
                """删除超过指定天数的collection"""
                pattern = r"collection_(\d{8})"  # collection_20240101
                cutoff_date = datetime.now() - timedelta(days=days)
                
                dropped_count = 0
                
                for name in utility.list_collections():
                    match = re.match(pattern, name)
                    if match:
                        date_str = match.group(1)
                        try:
                            date = datetime.strptime(date_str, "%Y%m%d")
                            
                            if date < cutoff_date:
                                collection = Collection(name)
                                collection.release()
                                utility.drop_collection(name)
                                print(f"删除过期Collection: {name} (日期: {date_str})")
                                dropped_count += 1
                        except ValueError:
                            print(f"日期格式错误: {name}")
                
                print(f"共删除 {dropped_count} 个过期Collection")
            
            drop_expired_collections(days=30)
            
            # 按模式批量删除
            def drop_by_pattern(pattern, dry_run=True):
                """按正则表达式模式删除collection"""
                regex = re.compile(pattern)
                to_drop = []
                
                for name in utility.list_collections():
                    if regex.match(name):
                        to_drop.append(name)
                
                print(f"匹配到 {len(to_drop)} 个Collection:")
                for name in to_drop:
                    collection = Collection(name)
                    print(f"  {name} ({collection.num_entities:,} 条数据)")
                
                if dry_run:
                    print("(预览模式，未实际删除)")
                    return
                
                # 实际删除
                for name in to_drop:
                    safe_drop_collection(name)
            
            # 预览要删除的collection
            drop_by_pattern(r"^backup_\d+$", dry_run=True)
            
            # 实际删除
            # drop_by_pattern(r"^backup_\d+$", dry_run=False)
            ---
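            Selecting the expired names is pure string and date logic, so it can be checked without touching any collection. A minimal sketch, assuming the `collection_YYYYMMDD` naming scheme from the example above; `select_expired` is an illustrative helper, and passing `now` explicitly keeps the result reproducible.

```python
import re
from datetime import datetime, timedelta

NAME_PATTERN = re.compile(r"collection_(\d{8})")  # e.g. collection_20240101

def select_expired(names, now, days=30):
    """Return the names whose embedded date is more than `days` before `now`."""
    cutoff = now - timedelta(days=days)
    expired = []
    for name in names:
        match = NAME_PATTERN.fullmatch(name)
        if not match:
            continue  # outside the naming scheme: never touch it
        try:
            created = datetime.strptime(match.group(1), "%Y%m%d")
        except ValueError:
            continue  # eight digits that do not form a valid date
        if created < cutoff:
            expired.append(name)
    return expired

names = ["collection_20240101", "collection_20240301", "collection_20241399", "documents"]
expired = select_expired(names, now=datetime(2024, 3, 15))
print(expired)
```

            Using `fullmatch` rather than `match` keeps names such as `collection_20240101_keep` out of the result.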

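02.Data cleanup
    a.Clearing data
        a.Description
            To empty a collection while keeping its structure, delete all of its entities. The schema and index definitions are preserved, so new data can be inserted right away without recreating them. This suits collections that are emptied regularly, such as temporary caches or test environments. For very large collections, dropping and recreating the collection is often cheaper, because deletes are recorded as markers and the space is only reclaimed by compaction. After clearing, run compaction to release storage. Delete large volumes in batches so that no single request times out.
        b.Code example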
            ---
            from pymilvus import Collection
            import time
            
            collection = Collection("documents")
            
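            # Method 1: delete everything in one call (simple, but may time out)
            # Assumes the primary key "id" is a non-negative INT64; older Milvus
            # releases only accept primary-key "in" expressions for delete
            expr = "id >= 0"
            collection.delete(expr)
            
            # Flush the delete operation
            collection.flush()
            
            # Trigger compaction to reclaim space
            collection.compact()
            
            # num_entities may still count deleted rows until compaction finishes
            print(f"Entity count after clearing: {collection.num_entities}")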
            
            # 方法2: 分批清空大量数据
            def clear_collection_data(collection, batch_size=10000):
                """分批删除所有数据"""
                total_deleted = 0
                
                while True:
                    # Fetch one batch of IDs (the collection must be loaded).
                    # Strong consistency makes earlier deletes visible, so the
                    # same IDs are not returned again
                    results = collection.query(
                        expr="id >= 0",
                        output_fields=["id"],
                        limit=batch_size,
                        consistency_level="Strong"
                    )
                    
                    if len(results) == 0:
                        break
                    
                    # Delete this batch
                    ids = [r["id"] for r in results]
                    expr = f"id in {ids}"
                    collection.delete(expr)
                    
                    total_deleted += len(ids)
                    print(f"Deleted {len(ids)} entities, total: {total_deleted}")
                    
                    # Throttle so the server is not flooded with deletes
                    time.sleep(0.1)
                
                # Flush and compact
                collection.flush()
                print("Compacting...")
                collection.compact()
                
                # Block until the compaction triggered above has finished
                collection.wait_for_compaction_completed()
                
                print(f"Clearing finished, {total_deleted} entities deleted")
                print(f"Current entity count: {collection.num_entities}")
            
            clear_collection_data(collection, batch_size=10000)
            
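            # Method 3: clear by condition
            def clear_by_condition(collection, expr):
                """Delete the entities matching expr"""
                # Preview the matches; a single query returns at most 16384 rows,
                # so this count is only a lower bound for larger result sets
                results = collection.query(
                    expr=expr,
                    output_fields=["id"],
                    limit=16384
                )
                
                print(f"Matched at least {len(results)} entities")
                
                if len(results) == 0:
                    return
                
                # Delete and report the count returned by the server
                result = collection.delete(expr)
                collection.flush()
                
                print(f"Deleted {result.delete_count} entities")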
            
            # 删除旧数据
            clear_by_condition(collection, "timestamp < 1640000000")
            
            # 删除特定类别
            clear_by_condition(collection, 'category == "test"')
            ---
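            When the IDs to delete are already known, the batched approach reduces to building one `id in [...]` expression per chunk. A minimal sketch, assuming an integer primary key named `id`; `delete_expressions` is an illustrative helper, not a pymilvus function.

```python
def delete_expressions(ids, chunk_size=1000):
    """Yield 'id in [...]' filter expressions that cover ids chunk by chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(ids), chunk_size):
        chunk = list(ids[start:start + chunk_size])
        yield f"id in {chunk}"

exprs = list(delete_expressions([1, 2, 3, 4, 5], chunk_size=2))
print(exprs)
```

            Each expression would then be passed to `collection.delete(expr)` in turn, followed by a single `flush()` at the end.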
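    b.Backup and restore
        a.Description
            Back up data before deleting it, in case of a mistaken delete or a later need to restore. Data can be exported to files or copied into a new collection. For point-in-time backups of whole collections, the separate Milvus Backup tool is the usual choice; the code below is a simple client-side export. A backup policy should cover both scheduled backups and a backup before each delete. Restoring means recreating the collection and importing the data; the export below does not record index definitions, so indexes have to be rebuilt afterwards. A backup must contain the schema definition and all of the data. A compressed format reduces the storage needed.
        b.Code example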
            ---
            from pymilvus import Collection, CollectionSchema, utility
            import json
            import gzip
            import pickle
            from datetime import datetime
            
            # 备份Collection数据
            def backup_collection(collection_name, backup_dir="./backups"):
                import os
                os.makedirs(backup_dir, exist_ok=True)
                
                collection = Collection(collection_name)
                collection.load()
                
                # 备份Schema
                schema_dict = {
                    "fields": [
                        {
                            "name": f.name,
                            # .name yields e.g. "INT64" regardless of Python version
                            "dtype": f.dtype.name,
                            "is_primary": f.is_primary,
                            "auto_id": f.auto_id,
                            "params": f.params
                        }
                        for f in collection.schema.fields
                    ],
                    "description": collection.schema.description
                }
                
                timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
                schema_file = f"{backup_dir}/{collection_name}_schema_{timestamp}.json"
                
                with open(schema_file, 'w') as f:
                    json.dump(schema_dict, f, indent=2)
                
                print(f"Schema已备份: {schema_file}")
                
                # Back up the data in batches. A query iterator (pymilvus 2.3+)
                # is used because plain offset paging is capped
                # (offset + limit must not exceed 16384)
                batch_size = 10000
                total = 0
                batch_num = 0
                
                iterator = collection.query_iterator(
                    batch_size=batch_size,
                    expr="id >= 0",
                    output_fields=["*"]
                )
                
                while True:
                    results = iterator.next()
                    
                    if len(results) == 0:
                        iterator.close()
                        break
                    
                    # Save this batch (gzip-compressed)
                    data_file = f"{backup_dir}/{collection_name}_data_{timestamp}_batch{batch_num:04d}.pkl.gz"
                    
                    with gzip.open(data_file, 'wb') as f:
                        pickle.dump(results, f)
                    
                    print(f"Batch {batch_num} backed up: {len(results)} entities")
                    
                    total += len(results)
                    batch_num += 1
                
                print(f"Backup complete: {total} entities, {batch_num} batches")
                
                return schema_file, batch_num
            
            # 恢复Collection数据
            def restore_collection(collection_name, schema_file, backup_dir, batch_count):
                import os
                
                # 读取Schema
                with open(schema_file, 'r') as f:
                    schema_dict = json.load(f)
                
                # 重建Schema
                from pymilvus import FieldSchema, DataType
                
                fields = []
                for f in schema_dict["fields"]:
                    dtype = getattr(DataType, f["dtype"].split(".")[-1])
                    field = FieldSchema(
                        name=f["name"],
                        dtype=dtype,
                        is_primary=f.get("is_primary", False),
                        auto_id=f.get("auto_id", False),
                        **f.get("params", {})
                    )
                    fields.append(field)
                
                schema = CollectionSchema(
                    fields=fields,
                    description=schema_dict.get("description", "")
                )
                
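                # Drop the old collection if it exists
                if utility.has_collection(collection_name):
                    utility.drop_collection(collection_name)
                
                # Create the new collection
                collection = Collection(collection_name, schema=schema)
                print(f"Collection created: {collection_name}")
                
                # Restore the data. The schema file is named
                # <source>_schema_<YYYYmmdd>_<HHMMSS>.json, so recover both the
                # source collection name and the full timestamp from it
                import re
                total_restored = 0
                match = re.match(r"(.+)_schema_(\d{8}_\d{6})\.json$", os.path.basename(schema_file))
                if match is None:
                    raise ValueError(f"Unexpected schema file name: {schema_file}")
                source_name, timestamp = match.groups()
                
                for batch_num in range(batch_count):
                    data_file = f"{backup_dir}/{source_name}_data_{timestamp}_batch{batch_num:04d}.pkl.gz"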
                    
                    if not os.path.exists(data_file):
                        print(f"批次文件不存在: {data_file}")
                        continue
                    
                    # 读取批次数据
                    with gzip.open(data_file, 'rb') as f:
                        batch_data = pickle.load(f)
                    
                    # 转换数据格式
                    field_data = {}
                    for field in schema.fields:
                        field_data[field.name] = [item[field.name] for item in batch_data]
                    
                    # 插入数据
                    data_list = [field_data[f.name] for f in schema.fields if not f.auto_id]
                    collection.insert(data_list)
                    
                    total_restored += len(batch_data)
                    print(f"批次 {batch_num} 已恢复: {len(batch_data)} 条数据")
                
                # 刷新
                collection.flush()
                print(f"恢复完成: {total_restored} 条数据")
                print(f"当前实体数量: {collection.num_entities}")
            
            # 使用备份和恢复
            # 备份
            schema_file, batch_count = backup_collection("documents", "./backups")
            
            # 恢复
            # restore_collection("documents_restored", schema_file, "./backups", batch_count)
            
            # 定期备份任务
            def scheduled_backup(collection_name, backup_dir, interval_hours=24):
                import time
                
                while True:
                    try:
                        print(f"开始备份: {datetime.now()}")
                        backup_collection(collection_name, backup_dir)
                        print("备份完成")
                    except Exception as e:
                        print(f"备份失败: {e}")
                    
                    time.sleep(interval_hours * 3600)
            
            # 启动定期备份（后台线程）
            import threading
            backup_thread = threading.Thread(
                target=scheduled_backup,
                args=("documents", "./backups", 24),
                daemon=True
            )
            backup_thread.start()
            ---
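            `pickle` can execute arbitrary code when a file is loaded, so it is only appropriate for backup files you fully trust. One alternative is gzip-compressed JSON Lines. The sketch below shows a round trip under that assumption; `write_batch`, `read_batch`, and the file name are illustrative, and fields that JSON cannot represent directly (such as the bytes of a binary vector) would need an explicit encoding.

```python
import gzip
import json
import os
import tempfile

def write_batch(path, rows):
    """Write rows (a list of dicts) as gzip-compressed JSON Lines."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

def read_batch(path):
    """Read back the rows written by write_batch."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

rows = [
    {"id": 1, "title": "文档1", "embedding": [0.1, 0.2]},
    {"id": 2, "title": "文档2", "embedding": [0.3, 0.4]},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "documents_batch0000.jsonl.gz")
    write_batch(path, rows)
    restored = read_batch(path)

print(restored == rows)
```

            The restored rows are already a list of dicts, so they can be passed to a dict-based insert without the column conversion used in the restore function.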

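4 Data Operations

4.1 Inserting Data

01.Insert methods
    a.Column-based insert
        a.Description
            Milvus stores data in a columnar layout, and a column-based insert organizes the input the same way: each field maps to one list, and all lists must have the same length. This is the standard insert format and avoids a row-to-column conversion on the client. An insert call either succeeds as a whole or fails as a whole. The return value carries the inserted primary keys and the insert count. Calling flush() seals the pending data and persists it, after which num_entities reflects the new rows; whether freshly inserted rows are visible to a search or query is governed by the consistency level rather than by flush. Insert in batches; 1,000 to 10,000 rows per call is a reasonable starting point, to be tuned by measurement.
        b.Code example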
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # Prepare the data (column format)
            ids = [i for i in range(1000)]
            titles = [f"文档{i}" for i in range(1000)]
            # 3 * 334 = 1002 items, so trim to 1000 to match the other columns
            categories = (["技术", "新闻", "博客"] * 334)[:1000]
            timestamps = [1700000000 + i for i in range(1000)]
            embeddings = [[np.random.random() for _ in range(128)] for _ in range(1000)]
            
            # Insert (columns in schema field order)
            data = [ids, titles, categories, timestamps, embeddings]
            insert_result = collection.insert(data)
            
            print(f"Inserted: {insert_result.insert_count} rows")
            print(f"Primary keys: {insert_result.primary_keys[:10]}...")
            
            # Flush to seal and persist the data so that num_entities is up to date
            collection.flush()
            print(f"Entity count after flush: {collection.num_entities}")
            
            # 验证插入
            results = collection.query(
                expr="id in [0, 1, 2]",
                output_fields=["id", "title", "category"]
            )
            for r in results:
                print(f"ID: {r['id']}, Title: {r['title']}, Category: {r['category']}")
            ---
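            Because every column must have the same length, it helps to check the columns before calling insert. A minimal sketch of converting row dicts to columns and validating them; `rows_to_columns` and `check_columns` are illustrative helpers, not pymilvus functions.

```python
def rows_to_columns(rows, field_order):
    """Convert a list of row dicts into one list per field, in field_order."""
    return [[row[name] for row in rows] for name in field_order]

def check_columns(columns):
    """Return the common column length, or raise ValueError if lengths differ."""
    lengths = {len(col) for col in columns}
    if len(lengths) > 1:
        raise ValueError(f"column lengths differ: {sorted(lengths)}")
    return lengths.pop() if lengths else 0

rows = [
    {"id": 1, "title": "文档1", "embedding": [0.1, 0.2]},
    {"id": 2, "title": "文档2", "embedding": [0.3, 0.4]},
]
columns = rows_to_columns(rows, ["id", "title", "embedding"])
row_count = check_columns(columns)
print(columns, row_count)
```

            A check like this catches a repeated three-item list multiplied by 334, which yields 1,002 items rather than 1,000, before the request is sent.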
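    b.Dict-based insert
        a.Description
            In addition to column-based insert, Milvus accepts a list of dicts, one dict per row with field names as keys. This form is more readable but slightly slower than column-based insert, and it suits data that already arrives as JSON or dicts. Each dict must contain every field that is not generated automatically. Key order does not matter because fields are matched by name. With a dynamic schema (a collection created with enable_dynamic_field=True), dict-based insert is the more flexible option, since keys that are not in the schema are stored as dynamic fields.
        b.Code example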
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # 准备数据（字典列表格式）
            data = [
                {
                    "id": 2000 + i,
                    "title": f"文档{2000 + i}",
                    "category": "技术",
                    "timestamp": 1700000000 + i,
                    "embedding": [np.random.random() for _ in range(128)]
                }
                for i in range(100)
            ]
            
            # 插入数据
            insert_result = collection.insert(data)
            print(f"插入成功: {insert_result.insert_count} 条")
            
            # 混合字段顺序
            data_mixed = [
                {
                    "embedding": [0.1] * 128,
                    "id": 3000,
                    "timestamp": 1700000000,
                    "category": "新闻",
                    "title": "文档3000"
                },
                {
                    "title": "文档3001",
                    "id": 3001,
                    "embedding": [0.2] * 128,
                    "category": "博客",
                    "timestamp": 1700000001
                }
            ]
            
            collection.insert(data_mixed)
            collection.flush()
            
            # Dynamic schema example (assumes "dynamic_collection" was created
            # with enable_dynamic_field=True)
            collection_dynamic = Collection("dynamic_collection")
            
            data_dynamic = [
                {
                    "id": 1,
                    "embedding": [0.1] * 128,
                    "extra_field1": "额外数据",  # 动态字段
                    "extra_field2": 123
                }
            ]
            
            collection_dynamic.insert(data_dynamic)
            ---
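            Since each dict must carry every field that is not generated automatically, a pre-insert check can report which rows are incomplete. A minimal sketch; `find_row_problems` and the field names are illustrative.

```python
def find_row_problems(rows, required_fields):
    """Return (row_index, missing_fields) for each row lacking a required field."""
    required = set(required_fields)
    problems = []
    for index, row in enumerate(rows):
        missing = sorted(required - set(row))
        if missing:
            problems.append((index, missing))
    return problems

rows = [
    {"id": 1, "title": "文档1", "embedding": [0.1] * 4},
    {"id": 2, "embedding": [0.2] * 4},           # title is missing
    {"title": "文档3", "extra_field": "额外数据"},  # id and embedding are missing
]
problems = find_row_problems(rows, ["id", "title", "embedding"])
print(problems)
```

            Extra keys such as `extra_field` are deliberately not reported, because a dynamic schema accepts them.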

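02.Handling data types
    a.Vector data
        a.Description
            Vector data is the core data type in Milvus, and every vector must match the dimension declared in the schema. Python lists and NumPy arrays are both accepted. Float vectors hold float32 values, and the dimension is a positive integer up to the configured maximum (32,768 by default). Binary vectors are passed as bytes, one bit per dimension, so the dimension must be a multiple of 8. Milvus does not normalize vectors on insert: when the metric assumes unit-length vectors, for example IP used as a stand-in for cosine similarity, normalize them before inserting. Validate dimensions before inserting to avoid runtime errors.
        b.Code example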
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # Python list format: each vector must have exactly 128 values
            # ([0.1, 0.2, 0.3] * 43 would give 129 values and be rejected)
            embedding_list = [[0.1] * 128 for _ in range(10)]
            
            # NumPy array格式
            embedding_np = np.random.rand(10, 128).astype(np.float32)
            
            # 转换为list（Milvus接受）
            embedding_from_np = embedding_np.tolist()
            
            # 插入向量数据
            ids = list(range(4000, 4010))
            titles = [f"文档{i}" for i in range(4000, 4010)]
            categories = ["技术"] * 10
            timestamps = [1700000000] * 10
            
            data = [ids, titles, categories, timestamps, embedding_from_np]
            collection.insert(data)
            
            # 二值向量示例
            from pymilvus import CollectionSchema, FieldSchema, DataType
            
            fields_binary = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="binary_vector", dtype=DataType.BINARY_VECTOR, dim=512)
            ]
            schema_binary = CollectionSchema(fields=fields_binary)
            collection_binary = Collection("binary_collection", schema=schema_binary)
            
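            # Generate binary vectors (512 dimensions = 64 bytes each)
            # dtype=np.uint8 matters: bytes() of the default int64 array would yield 512 bytes
            binary_vectors = [np.random.randint(0, 256, 64, dtype=np.uint8).tobytes() for _ in range(10)]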
            ids_binary = list(range(10))
            
            data_binary = [ids_binary, binary_vectors]
            collection_binary.insert(data_binary)
            
            # 向量维度验证
            def validate_vectors(vectors, expected_dim):
                for i, vec in enumerate(vectors):
                    if len(vec) != expected_dim:
                        raise ValueError(f"向量 {i} 维度错误: {len(vec)}, 期望: {expected_dim}")
                return True
            
            validate_vectors(embedding_from_np, 128)
            print("向量维度验证通过")
            ---
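            Two preparation steps mentioned above can be sketched with the standard library alone: packing 0/1 values into bytes for a binary vector, and scaling a float vector to unit length. `pack_bits` and `l2_normalize` are illustrative helpers; the bit order shown (first dimension in the most significant bit) is an assumption that matches the default of `numpy.packbits`.

```python
import math

def pack_bits(bits):
    """Pack 0/1 values into bytes, 8 dimensions per byte, most significant bit first."""
    if len(bits) % 8 != 0:
        raise ValueError("binary vector dimension must be a multiple of 8")
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | (1 if bit else 0)
        out.append(byte)
    return bytes(out)

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]

packed = pack_bits([1, 0, 0, 0, 0, 0, 0, 1] + [0] * 8)
unit = l2_normalize([3.0, 4.0])
print(packed, unit)
```

            A 512-dimensional binary vector packed this way occupies 64 bytes, which is the size the binary collection above expects.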
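    b.Scalar data
        a.Description
            Scalar data covers integers, floating-point numbers, strings, booleans, and similar types. A VARCHAR value must fit within the field's max_length; a value that is too long is rejected with an error rather than truncated, so truncate on the client when needed. The JSON type supports nested structures and can hold complex objects. Integer types have fixed ranges, and out-of-range values raise an error. Storing timestamps as Unix time in an INT64 field is recommended. Unless a field is declared nullable or has a default value, which only newer Milvus releases support, every field needs a value in every row.
        b.Code example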
            ---
            from pymilvus import Collection
            import json
            import time
            
            collection = Collection("documents")
            
            # 整数类型
            ids = [5000, 5001, 5002]
            ages = [25, 30, 35]  # INT32
            
            # 浮点类型
            scores = [95.5, 87.3, 92.1]  # FLOAT
            ratings = [4.5, 3.8, 4.9]  # DOUBLE
            
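            # String type (mind the length limit)
            # Note: max_length may be counted in bytes rather than characters,
            # so verify the limit for non-ASCII text
            long_title = "很长的标题" * 100
            if len(long_title) > 200:
                long_title = long_title[:200]  # truncate to 200 characters
            
            titles = [
                "短标题",
                long_title,
                "中等长度的标题"
            ]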
            
            # 布尔类型
            is_active = [True, False, True]
            
            # JSON类型
            metadata = [
                {"author": "张三", "tags": ["AI", "ML"], "views": 1000},
                {"author": "李四", "tags": ["DL"], "views": 500},
                {"author": "王五", "tags": ["NLP", "CV"], "views": 800}
            ]
            
            # 时间戳
            timestamps = [
                int(time.time()),
                int(time.time()) - 86400,  # 1天前
                int(time.time()) - 172800  # 2天前
            ]
            
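            # Vectors
            embeddings = [[0.1] * 128 for _ in range(3)]
            
            # Insert one column per schema field, in schema order. The sample
            # schema is (id, title, category, timestamp, embedding); the other
            # lists above only illustrate types that this schema does not contain
            categories = ["技术", "新闻", "博客"]
            data = [ids, titles, categories, timestamps, embeddings]
            collection.insert(data)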
            
            # 类型转换
            def convert_data_types(data_dict):
                """确保数据类型正确"""
                converted = {}
                
                # 整数转换
                if "id" in data_dict:
                    converted["id"] = int(data_dict["id"])
                
                # 字符串长度限制
                if "title" in data_dict:
                    title = str(data_dict["title"])
                    converted["title"] = title[:200]  # 截断
                
                # 时间戳转换
                if "timestamp" in data_dict:
                    ts = data_dict["timestamp"]
                    if isinstance(ts, str):
                        from datetime import datetime
                        dt = datetime.fromisoformat(ts)
                        converted["timestamp"] = int(dt.timestamp())
                    else:
                        converted["timestamp"] = int(ts)
                
                # JSON序列化
                if "metadata" in data_dict:
                    if isinstance(data_dict["metadata"], dict):
                        converted["metadata"] = data_dict["metadata"]
                    else:
                        converted["metadata"] = json.loads(data_dict["metadata"])
                
                return converted
            
            # 使用转换函数
            raw_data = {
                "id": "6000",
                "title": "x" * 300,
                "timestamp": "2024-01-01T00:00:00",
                "metadata": '{"key": "value"}'
            }
            
            converted = convert_data_types(raw_data)
            print(f"转换后: {converted}")
            ---
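            Slicing a string limits its character count, which is not the same as its encoded size. A minimal sketch of truncating by UTF-8 byte length without splitting a character, assuming the VARCHAR limit is counted in bytes (verify this for your Milvus version); `truncate_utf8` is an illustrative helper.

```python
def truncate_utf8(text, max_bytes):
    """Truncate text so its UTF-8 encoding fits in max_bytes, keeping characters whole."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a trailing partial character, if any
    return encoded[:max_bytes].decode("utf-8", errors="ignore")

title = "很长的标题" * 100
short = truncate_utf8(title, 200)
print(len(short), len(short.encode("utf-8")))
```

            Each of these characters takes three bytes in UTF-8, so a 200-byte budget keeps 66 characters (198 bytes), whereas `title[:200]` would keep 200 characters and 600 bytes.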

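03.Batch insert optimization
    a.Batch size
        a.Description
            Batch size directly affects insert throughput and memory use. Start with 1,000 to 10,000 rows per insert: batches that are too small pay for extra network round trips, and batches that are too large can time out or exhaust memory. Tune the size to the row size and network conditions; the higher the vector dimension, the smaller the batch should be. Determine the best value by benchmarking. A single insert request is also capped by the server's message-size limit, which depends on the deployment's configuration, and a request over the limit is rejected.
        b.Code example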
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # Benchmark one batch size; id_offset keeps primary keys unique across runs
            def test_batch_size(collection, total_count, batch_size, id_offset=0):
                start_time = time.time()
                
                for i in range(0, total_count, batch_size):
                    batch_end = min(i + batch_size, total_count)
                    batch_count = batch_end - i
                    
                    # Generate the batch
                    ids = list(range(id_offset + i, id_offset + batch_end))
                    titles = [f"文档{j}" for j in ids]
                    categories = ["技术"] * batch_count
                    timestamps = [1700000000] * batch_count
                    embeddings = [[np.random.random() for _ in range(128)] for _ in range(batch_count)]
                    
                    # Insert
                    data = [ids, titles, categories, timestamps, embeddings]
                    collection.insert(data)
                
                # Flush
                collection.flush()
                
                elapsed = time.time() - start_time
                rows_per_sec = total_count / elapsed
                
                return elapsed, rows_per_sec
            
            # Compare batch sizes, giving each run its own ID range
            total_count = 10000
            
            for run, batch_size in enumerate([100, 500, 1000, 5000, 10000]):
                elapsed, rows_per_sec = test_batch_size(
                    collection, total_count, batch_size, id_offset=run * total_count
                )
                print(f"Batch size: {batch_size:5d}, elapsed: {elapsed:.2f}s, rows/s: {rows_per_sec:.2f}")
            
            # 自适应批次大小
            def adaptive_batch_insert(collection, data_generator, vector_dim=128):
                # 估算单条数据大小（字节）
                single_size = vector_dim * 4 + 1000  # 向量 + 元数据
                
                # 目标批次大小：10MB
                target_size = 10 * 1024 * 1024
                batch_size = max(100, min(10000, target_size // single_size))
                
                print(f"自适应批次大小: {batch_size}")
                
                batch = []
                for item in data_generator:
                    batch.append(item)
                    
                    if len(batch) >= batch_size:
                        collection.insert(batch)
                        batch = []
                
                # 插入剩余数据
                if batch:
                    collection.insert(batch)
            
            # 使用自适应批次
            def data_gen():
                for i in range(10000):
                    yield {
                        "id": 10000 + i,
                        "title": f"文档{i}",
                        "category": "技术",
                        "timestamp": 1700000000,
                        "embedding": [0.1] * 128
                    }
            
            adaptive_batch_insert(collection, data_gen())
            ---
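            The batching logic itself does not depend on Milvus and can be tested on its own. A minimal sketch of a generator that groups any iterable into fixed-size batches, plus the size estimate used above; `batched` and `rows_per_batch` are illustrative helpers, and the 1,000-byte scalar allowance is the same rough assumption as in the example.

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable, including generators."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def rows_per_batch(vector_dim, target_bytes=10 * 1024 * 1024,
                   scalar_bytes=1000, lo=100, hi=10000):
    """Estimate a batch size from a float32 vector dimension and a scalar allowance."""
    row_bytes = vector_dim * 4 + scalar_bytes  # 4 bytes per float32 component
    return max(lo, min(hi, target_bytes // row_bytes))

sizes = [len(batch) for batch in batched(range(2500), 1000)]
print(sizes, rows_per_batch(128), rows_per_batch(1536))
```

            With a 10 MB target, 128-dimensional rows give a batch of several thousand rows, while 1536-dimensional rows give a much smaller one, which matches the advice to shrink batches as the dimension grows.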
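    b.Concurrent insert
        a.Description
            Milvus supports concurrent inserts, which can raise throughput considerably. Multiple clients or threads can insert at the same time. Avoid primary-key collisions by giving each thread its own ID range. Concurrent inserts add load on the server, so set the degree of concurrency according to server capacity; too much concurrency can degrade performance or cause timeouts. Reuse connections rather than opening a new one for each request.
        b.Code example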
            ---
            from pymilvus import Collection
            import numpy as np
            import concurrent.futures
            import time
            
            collection = Collection("documents")
            
            # 单线程插入函数
            def insert_batch(start_id, count):
                ids = list(range(start_id, start_id + count))
                titles = [f"文档{i}" for i in ids]
                categories = ["技术"] * count
                timestamps = [1700000000] * count
                embeddings = [[np.random.random() for _ in range(128)] for _ in range(count)]
                
                data = [ids, titles, categories, timestamps, embeddings]
                result = collection.insert(data)
                
                return result.insert_count
            
            # Concurrent insert test
            def concurrent_insert_test(total_count, num_workers, batch_size, base_id):
                start_time = time.time()
                
                # Split the ID range into disjoint batches of at most batch_size rows
                tasks = []
                for offset in range(0, total_count, batch_size):
                    count = min(batch_size, total_count - offset)
                    tasks.append((base_id + offset, count))
                
                # Run the batches on a pool of num_workers threads
                with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
                    futures = [executor.submit(insert_batch, start_id, count) for start_id, count in tasks]
                    results = [f.result() for f in futures]
                
                # Flush so that the timing includes persisting the data
                collection.flush()
                
                elapsed = time.time() - start_time
                total_inserted = sum(results)
                qps = total_inserted / elapsed
                
                return elapsed, qps
            
            # Compare different levels of concurrency
            total_count = 10000
            
            for run, num_workers in enumerate([1, 2, 4, 8]):
                # Give every run its own ID range so repeated runs do not insert duplicate primary keys
                base_id = 100000 + run * total_count
                elapsed, qps = concurrent_insert_test(total_count, num_workers, 1000, base_id)
                print(f"并发度: {num_workers}, 耗时: {elapsed:.2f}s, QPS: {qps:.2f}")
            
            # 生产者-消费者模式
            import queue
            import threading
            
            def producer(data_queue, total_count):
                """生产数据"""
                for i in range(total_count):
                    item = {
                        "id": 30000 + i,
                        "title": f"文档{i}",
                        "category": "技术",
                        "timestamp": 1700000000,
                        "embedding": [np.random.random() for _ in range(128)]
                    }
                    data_queue.put(item)
                
                # 发送结束信号
                for _ in range(4):  # 4个消费者
                    data_queue.put(None)
            
            def consumer(data_queue, collection, batch_size=1000):
                """消费并插入数据"""
                batch = []
                
                while True:
                    item = data_queue.get()
                    
                    if item is None:  # 结束信号
                        break
                    
                    batch.append(item)
                    
                    if len(batch) >= batch_size:
                        collection.insert(batch)
                        batch = []
                
                # 插入剩余数据
                if batch:
                    collection.insert(batch)
            
            # 启动生产者-消费者
            data_queue = queue.Queue(maxsize=1000)
            
            # 启动生产者
            producer_thread = threading.Thread(target=producer, args=(data_queue, 10000))
            producer_thread.start()
            
            # 启动消费者
            consumer_threads = []
            for _ in range(4):
                t = threading.Thread(target=consumer, args=(data_queue, collection, 1000))
                t.start()
                consumer_threads.append(t)
            
            # 等待完成
            producer_thread.join()
            for t in consumer_threads:
                t.join()
            
            collection.flush()
            print("并发插入完成")
            ---
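            Giving each worker a disjoint ID range is easy to get wrong when the total does not divide evenly by the number of workers. The sketch below shows one way to split the range so that no ID is dropped or assigned twice; `split_id_ranges` is a hypothetical helper, not a pymilvus function.

```python
def split_id_ranges(start_id, total_count, num_workers):
    """Split total_count consecutive IDs into disjoint (start, count) ranges."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    base, extra = divmod(total_count, num_workers)
    ranges = []
    next_start = start_id
    for worker in range(num_workers):
        # The first `extra` workers take one additional ID each
        count = base + (1 if worker < extra else 0)
        if count:
            ranges.append((next_start, count))
        next_start += count
    return ranges


if __name__ == "__main__":
    for start, count in split_id_ranges(20000, 10, 3):
        print(f"worker range: start={start}, count={count}")
```

            Each `(start, count)` pair can be handed to a per-thread insert function such as `insert_batch(start_id, count)`.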

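4.2 Deleting data

01.Deletion methods
    a.Delete by expression
        a.Description
            Deleting by expression is the main way to remove entities in Milvus. The expression may filter on the primary key, on scalar fields, or on a combination of both, and it uses the same syntax as query expressions, including compound logic. Filtering on non-primary-key fields depends on the Milvus version; older releases accept only primary-key `in` expressions. A delete is asynchronous and returns immediately: the data is marked as deleted and stops appearing in queries, but the storage is not released until compaction runs. Keep a single delete to at most 16384 entities.
        b.Code example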
            ---
            from pymilvus import Collection
            
            collection = Collection("documents")
            collection.load()  # query() needs a loaded collection
            
            # 删除单条记录（按主键）
            expr = "id == 1001"
            collection.delete(expr)
            print("已删除ID为1001的记录")
            
            # 批量删除（按主键列表）
            ids_to_delete = [1, 2, 3, 4, 5]
            expr = f"id in {ids_to_delete}"
            collection.delete(expr)
            print(f"已删除{len(ids_to_delete)}条记录")
            
            # 范围删除
            expr = "id > 2000 and id < 2100"
            collection.delete(expr)
            print("已删除ID在2000-2100之间的记录")
            
            # 按标量字段删除
            expr = 'category == "test"'
            collection.delete(expr)
            print("已删除测试类别的记录")
            
            # 复杂条件删除
            expr = '(category == "test" or category == "temp") and timestamp < 1700000000'
            collection.delete(expr)
            print("已删除符合条件的记录")
            
            # Flush the delete operations
            collection.flush()
            # num_entities may still include deleted rows until compaction has run
            print(f"当前实体数量: {collection.num_entities}")
            
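            # Safe delete helper
            def safe_delete(collection, expr, dry_run=False):
                """Delete by expression, with an optional preview (dry-run) mode."""
                try:
                    # Count the matches first. count(*) is not capped by the query limit,
                    # so the number stays correct beyond 16384 matches
                    # (assumes a Milvus version that supports count(*), 2.3 or later)
                    results = collection.query(
                        expr=expr,
                        output_fields=["count(*)"]
                    )
                    
                    count = results[0]["count(*)"]
                    print(f"匹配到 {count} 条记录")
                    
                    if count == 0:
                        print("没有匹配的记录")
                        return 0
                    
                    if dry_run:
                        print("(预览模式，未实际删除)")
                        return count
                    
                    # Perform the delete
                    collection.delete(expr)
                    collection.flush()
                    print(f"已删除 {count} 条记录")
                    return count
                    
                except Exception as e:
                    print(f"删除失败: {e}")
                    return 0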
            
            # 使用安全删除
            safe_delete(collection, "id > 5000", dry_run=True)  # 预览
            safe_delete(collection, "id > 5000", dry_run=False)  # 实际删除
            ---
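            The examples above build `in` expressions by formatting a Python list into the string. A slightly more defensive variant is sketched below: it rejects empty lists, quotes string values with double quotes, and splits long ID lists into several expressions. `build_in_expr` and `chunked_in_exprs` are hypothetical helpers, and treating JSON string syntax as valid Milvus literal syntax is an assumption.

```python
import json


def build_in_expr(field, values):
    """Build a `field in [...]` filter for integer or string values."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    # json.dumps renders ints as-is and wraps strings in double quotes,
    # escaping any embedded quotes (assumed to match Milvus literal syntax)
    return f"{field} in {json.dumps(values, ensure_ascii=False)}"


def chunked_in_exprs(field, values, max_per_expr=1000):
    """Yield one `in` expression per chunk of at most max_per_expr values."""
    values = list(values)
    for i in range(0, len(values), max_per_expr):
        yield build_in_expr(field, values[i:i + max_per_expr])


if __name__ == "__main__":
    print(build_in_expr("id", [1, 2, 3]))
    print(build_in_expr("category", ["test", "temp"]))
```

            Each generated expression can be passed to `collection.delete(expr)`; chunking keeps every single request small.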
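    b.Batch deletion
        a.Description
            When deleting large volumes of data (millions of entities or more), delete in batches so that no single request is big enough to hurt performance. Batching caps the number of entities removed per request, and a short pause between batches gives the system time to process each one. Batch size and pause length both need tuning for the workload. Querying a batch of primary keys and then deleting exactly those keys gives precise control over each batch.
        b.Code example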
            ---
            from pymilvus import Collection
            import time
            
            collection = Collection("documents")
            
            # 分批删除大量数据
            def batch_delete(collection, expr, batch_size=1000, sleep_interval=0.1):
                """分批删除数据"""
                total_deleted = 0
                
                while True:
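                    # Fetch one batch of IDs to delete. Strong consistency makes the query
                    # see earlier deletes, so the same rows are not returned again
                    results = collection.query(
                        expr=expr,
                        output_fields=["id"],
                        limit=batch_size,
                        consistency_level="Strong"
                    )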
                    
                    if len(results) == 0:
                        break
                    
                    # 删除这批数据
                    ids = [r["id"] for r in results]
                    delete_expr = f"id in {ids}"
                    collection.delete(delete_expr)
                    
                    total_deleted += len(ids)
                    print(f"已删除 {len(ids)} 条数据，累计: {total_deleted}")
                    
                    # 暂停
                    if sleep_interval > 0:
                        time.sleep(sleep_interval)
                
                # 刷新
                collection.flush()
                print(f"分批删除完成，共删除 {total_deleted} 条数据")
                return total_deleted
            
            # 删除旧数据
            batch_delete(collection, "timestamp < 1600000000", batch_size=1000)
            
            # 按ID范围分批删除
            def delete_by_id_range(collection, start_id, end_id, batch_size=1000):
                """按ID范围分批删除"""
                total_deleted = 0
                
                for i in range(start_id, end_id, batch_size):
                    batch_end = min(i + batch_size, end_id)
                    expr = f"id >= {i} and id < {batch_end}"
                    
                    collection.delete(expr)
                    # This adds the size of the ID span, not the number of rows that actually existed
                    total_deleted += (batch_end - i)
                    
                    print(f"已删除 ID {i} 到 {batch_end}，累计: {total_deleted}")
                    time.sleep(0.1)
                
                collection.flush()
                print(f"范围删除完成，共删除 {total_deleted} 条数据")
                return total_deleted
            
            delete_by_id_range(collection, 10000, 20000, batch_size=1000)
            
            # 带进度监控的分批删除
            def batch_delete_with_progress(collection, expr, batch_size=1000):
                """带进度监控的分批删除"""
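                # Count the matches first. count(*) is not capped by the query limit,
                # so the total stays correct beyond 16384 matches
                # (assumes a Milvus version that supports count(*), 2.3 or later)
                total_count = collection.query(
                    expr=expr,
                    output_fields=["count(*)"]
                )[0]["count(*)"]
                
                if total_count == 0:
                    print("没有匹配的记录")
                    return 0
                
                print(f"总共需要删除 {total_count} 条数据")
                
                deleted = 0
                start_time = time.time()
                
                while deleted < total_count:
                    # Fetch one batch; Strong consistency keeps already-deleted rows out of the result
                    results = collection.query(
                        expr=expr,
                        output_fields=["id"],
                        limit=batch_size,
                        consistency_level="Strong"
                    )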
                    
                    if len(results) == 0:
                        break
                    
                    # 删除
                    ids = [r["id"] for r in results]
                    collection.delete(f"id in {ids}")
                    
                    deleted += len(ids)
                    progress = (deleted / total_count) * 100
                    elapsed = time.time() - start_time
                    
                    print(f"进度: {progress:.1f}% ({deleted}/{total_count}), 耗时: {elapsed:.1f}s")
                    
                    time.sleep(0.1)
                
                collection.flush()
                total_time = time.time() - start_time
                print(f"删除完成，总耗时: {total_time:.1f}s")
                return deleted
            
            batch_delete_with_progress(collection, 'category == "temp"', batch_size=1000)
            ---
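            A query-then-delete loop can spin forever if deleted rows remain visible to the next query. The sketch below adds a guard against that case. `delete_in_batches` is a hypothetical helper written against any object with `query` and `delete` methods shaped like the calls above, and `InMemoryCollection` is only a stand-in so the loop can be exercised without a Milvus server.

```python
import json


def delete_in_batches(collection, expr, pk_field="id", batch_size=1000, max_rounds=10000):
    """Repeatedly query a batch of primary keys matching expr and delete them."""
    total_deleted = 0
    previous_ids = None
    for _ in range(max_rounds):
        rows = collection.query(
            expr=expr,
            output_fields=[pk_field],
            limit=batch_size,
            consistency_level="Strong",
        )
        ids = [row[pk_field] for row in rows]
        if not ids:
            break
        if ids == previous_ids:
            # The same batch came back: the delete is not visible, so stop instead of looping
            raise RuntimeError("no progress: deleted rows are still visible to queries")
        collection.delete(f"{pk_field} in {json.dumps(ids)}")
        total_deleted += len(ids)
        previous_ids = ids
    return total_deleted


class InMemoryCollection:
    """Hypothetical stand-in for a collection, used only to exercise the loop."""

    def __init__(self, ids):
        self.ids = list(ids)
        self.delete_calls = 0

    def query(self, expr, output_fields, limit, **kwargs):
        # Ignores expr: every stored row is treated as a match
        return [{"id": i} for i in self.ids[:limit]]

    def delete(self, expr):
        doomed = set(json.loads(expr.split(" in ", 1)[1]))
        self.ids = [i for i in self.ids if i not in doomed]
        self.delete_calls += 1


if __name__ == "__main__":
    fake = InMemoryCollection(range(2500))
    removed = delete_in_batches(fake, "timestamp < 1600000000", batch_size=1000)
    print(f"removed {removed} rows in {fake.delete_calls} delete calls")
```

            Raising on a repeated batch is a design choice: it surfaces a visibility problem immediately, where a silent retry would inflate the deleted count.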

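02.Deletion strategies
    a.Soft-delete flag
        a.Description
            A soft delete marks an entity as deleted through a flag field instead of removing it. The data history is kept and a delete can be undone, which suits auditing and rollback scenarios. This flexibility has a cost: soft-deleted rows still occupy storage, every query has to filter them out, and a scheduled job should eventually remove them for real.
        b.Code example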
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import time
            
            # 创建带软删除标记的Schema
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="is_deleted", dtype=DataType.BOOL),  # 软删除标记
                FieldSchema(name="deleted_at", dtype=DataType.INT64),  # 删除时间
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            schema = CollectionSchema(fields=fields, description="支持软删除")
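            collection = Collection("soft_delete_collection", schema=schema)
            
            # The queries below need an index on the vector field and a loaded collection
            collection.create_index(
                field_name="embedding",
                index_params={"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}}
            )
            collection.load()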
            
            # 插入数据（初始未删除）
            data = [
                [1, 2, 3],  # id
                ["文档1", "文档2", "文档3"],  # title
                [False, False, False],  # is_deleted
                [0, 0, 0],  # deleted_at
                [[0.1]*128, [0.2]*128, [0.3]*128]  # embedding
            ]
            collection.insert(data)
            collection.flush()
            
            # Soft-delete helper
            def soft_delete(collection, ids):
                """Mark the given IDs as deleted without removing them."""
                if not ids:
                    return
                
                # Read the current rows; "*" is assumed to include the vector field (Milvus 2.3+)
                results = collection.query(
                    expr=f"id in {ids} and is_deleted == false",
                    output_fields=["*"]
                )
                
                if not results:
                    print("没有找到要删除的记录")
                    return
                
                # Upsert replaces each row in a single call, so there is no window
                # in which the row is gone between a delete and a re-insert
                deleted_time = int(time.time())
                rows = [{**r, "is_deleted": True, "deleted_at": deleted_time} for r in results]
                collection.upsert(rows)
                collection.flush()
                
                print(f"软删除 {len(rows)} 条记录")
            
            # 使用软删除
            soft_delete(collection, [1, 2])
            
            # 查询未删除的数据
            results = collection.query(
                expr="is_deleted == false",
                output_fields=["id", "title"]
            )
            print(f"未删除的记录: {results}")
            
            # Restore soft-deleted rows
            def undelete(collection, ids):
                """Restore soft-deleted rows."""
                results = collection.query(
                    expr=f"id in {ids} and is_deleted == true",
                    output_fields=["*"]
                )
                
                if not results:
                    print("没有找到要恢复的记录")
                    return
                
                # Upsert only the rows that were actually soft-deleted. Deleting every ID
                # in `ids` first would also drop live rows that the query did not return
                rows = [{**r, "is_deleted": False, "deleted_at": 0} for r in results]
                collection.upsert(rows)
                collection.flush()
                
                print(f"恢复 {len(rows)} 条记录")
            
            undelete(collection, [1])
            
            # 定期清理软删除的数据
            def cleanup_soft_deleted(collection, days=30):
                """清理超过指定天数的软删除数据"""
                cutoff_time = int(time.time()) - (days * 86400)
                
                # 查询要清理的数据
                results = collection.query(
                    expr=f"is_deleted == true and deleted_at < {cutoff_time}",
                    output_fields=["id"],
                    limit=16384
                )
                
                if not results:
                    print("没有需要清理的数据")
                    return
                
                # 真正删除
                ids = [r["id"] for r in results]
                collection.delete(f"id in {ids}")
                collection.flush()
                
                print(f"清理 {len(ids)} 条软删除数据")
            
            cleanup_soft_deleted(collection, days=30)
            ---
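            Because every read has to exclude soft-deleted rows, it helps to build the filter in one place instead of repeating it at each call site. The sketch below shows the idea; `exclude_soft_deleted` and `purge_filter` are hypothetical helpers, and the field names `is_deleted` and `deleted_at` follow the schema used above.

```python
import time


def exclude_soft_deleted(expr=None, flag_field="is_deleted"):
    """Combine an optional user filter with the 'not soft-deleted' condition."""
    condition = f"{flag_field} == false"
    if not expr or not expr.strip():
        return condition
    # Parentheses keep any `or` inside the user filter from escaping the condition
    return f"({expr}) and {condition}"


def purge_filter(retention_days, now=None, flag_field="is_deleted", time_field="deleted_at"):
    """Filter matching rows that were soft-deleted more than retention_days ago."""
    now = int(time.time()) if now is None else int(now)
    cutoff = now - retention_days * 86400
    return f"{flag_field} == true and {time_field} < {cutoff}"


if __name__ == "__main__":
    print(exclude_soft_deleted('category == "test" or category == "temp"'))
    print(purge_filter(30, now=1700000000))
```

            The returned string can be used as the `expr` argument of `collection.query` or `collection.search`.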
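    b.Scheduled cleanup
        a.Description
            Scheduled cleanup removes expired data automatically, based on conditions such as a timestamp or access frequency. It suits time-sensitive data like logs and caches, and can run as a scheduled job or a background thread. Derive the retention policy from business requirements and storage cost, run the job during off-peak hours to limit its impact on traffic, and run compaction afterwards to release the space.
        b.Code example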
            ---
            from pymilvus import Collection
            import time
            import threading
            from datetime import datetime
            
            collection = Collection("documents")
            collection.load()  # query() needs a loaded collection
            
            # 基于时间戳的清理
            def cleanup_by_timestamp(collection, days=30):
                """删除超过指定天数的数据"""
                cutoff_time = int(time.time()) - (days * 86400)
                
                expr = f"timestamp < {cutoff_time}"
                
                # 分批删除
                total_deleted = 0
                batch_size = 1000
                
                while True:
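                    # Strong consistency makes the query see earlier deletes,
                    # so the same rows are not returned again
                    results = collection.query(
                        expr=expr,
                        output_fields=["id"],
                        limit=batch_size,
                        consistency_level="Strong"
                    )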
                    
                    if len(results) == 0:
                        break
                    
                    ids = [r["id"] for r in results]
                    collection.delete(f"id in {ids}")
                    total_deleted += len(ids)
                    
                    print(f"已清理 {len(ids)} 条数据，累计: {total_deleted}")
                    time.sleep(0.1)
                
                collection.flush()
                collection.compact()
                
                print(f"清理完成，共删除 {total_deleted} 条数据")
                return total_deleted
            
            cleanup_by_timestamp(collection, days=30)
            
            # 定时清理任务
            def scheduled_cleanup(collection, interval_hours=24, retention_days=30):
                """定时清理任务"""
                while True:
                    try:
                        print(f"开始清理: {datetime.now()}")
                        deleted = cleanup_by_timestamp(collection, days=retention_days)
                        print(f"清理完成: 删除 {deleted} 条数据")
                    except Exception as e:
                        print(f"清理失败: {e}")
                    
                    # 等待下次清理
                    time.sleep(interval_hours * 3600)
            
            # 启动定时清理（后台线程）
            cleanup_thread = threading.Thread(
                target=scheduled_cleanup,
                args=(collection, 24, 30),
                daemon=True
            )
            cleanup_thread.start()
            
            # 按类别清理
            def cleanup_by_category(collection, categories_to_delete):
                """删除指定类别的数据"""
                for category in categories_to_delete:
                    expr = f'category == "{category}"'
                    
                    results = collection.query(
                        expr=expr,
                        output_fields=["id"],
                        limit=16384
                    )
                    
                    if results:
                        ids = [r["id"] for r in results]
                        collection.delete(f"id in {ids}")
                        print(f"已删除类别 '{category}': {len(ids)} 条数据")
                
                collection.flush()
                collection.compact()
            
            cleanup_by_category(collection, ["test", "temp", "draft"])
            
            # 智能清理策略
            class CleanupManager:
                def __init__(self, collection, max_entities=1000000):
                    self.collection = collection
                    self.max_entities = max_entities
                
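                def check_and_cleanup(self):
                    """Delete the oldest rows when the collection exceeds max_entities."""
                    # count(*) excludes deleted rows, whereas num_entities may still include them
                    # (assumes a Milvus version that supports count(*), 2.3 or later)
                    current_count = self.collection.query(
                        expr="id >= 0",
                        output_fields=["count(*)"]
                    )[0]["count(*)"]
                    
                    if current_count <= self.max_entities:
                        print(f"当前数量 {current_count}，无需清理")
                        return
                    
                    # Number of rows to remove
                    to_delete = current_count - self.max_entities
                    print(f"当前数量 {current_count}，需要删除 {to_delete} 条")
                    
                    # One query returns at most 16384 rows by default, so cap the request and
                    # run the check again if more rows must go. Caveat: the query returns an
                    # arbitrary subset, so "oldest" means oldest within that subset only
                    results = self.collection.query(
                        expr="id >= 0",
                        output_fields=["id", "timestamp"],
                        limit=min(to_delete + 1000, 16384)
                    )
                    
                    # Sort by timestamp and keep the oldest rows
                    results_sorted = sorted(results, key=lambda x: x["timestamp"])
                    ids_to_delete = [r["id"] for r in results_sorted[:to_delete]]
                    
                    # Delete in batches
                    batch_size = 1000
                    for i in range(0, len(ids_to_delete), batch_size):
                        batch = ids_to_delete[i:i+batch_size]
                        self.collection.delete(f"id in {batch}")
                        print(f"已删除 {len(batch)} 条旧数据")
                    
                    self.collection.flush()
                    self.collection.compact()
                    
                    print(f"清理完成，已删除 {len(ids_to_delete)} 条")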
            
            # 使用智能清理
            manager = CleanupManager(collection, max_entities=1000000)
            manager.check_and_cleanup()
            ---
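            A loop built on `time.sleep` cannot be interrupted, so a background cleanup thread keeps sleeping until the process exits. The sketch below shows a stoppable variant using `threading.Event`. `run_periodically` is a hypothetical helper, and `fake_cleanup` stands in for a real cleanup function such as `cleanup_by_timestamp`.

```python
import threading


def run_periodically(task, interval_seconds, stop_event):
    """Call task() every interval_seconds until stop_event is set; return the run count."""
    runs = 0
    while not stop_event.is_set():
        try:
            task()
        except Exception as exc:
            # Keep the scheduler alive if a single run fails
            print(f"cleanup failed: {exc}")
        runs += 1
        # wait() returns early once the event is set, unlike time.sleep()
        stop_event.wait(interval_seconds)
    return runs


if __name__ == "__main__":
    stop = threading.Event()
    calls = []

    def fake_cleanup():
        calls.append(len(calls))
        if len(calls) == 3:
            stop.set()  # in a real service another thread would set this on shutdown

    worker = threading.Thread(target=run_periodically, args=(fake_cleanup, 0.01, stop))
    worker.start()
    worker.join()
    print(f"cleanup ran {len(calls)} times")
```

            Calling `stop.set()` from the main thread, followed by `worker.join()`, lets a running cleanup finish before shutdown.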

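4.3 Updating data

01.Update mechanism
    a.Upsert
        a.Description
            Milvus updates data through upsert (update + insert): if the primary key already exists the entity is updated, otherwise it is inserted. Upsert is an atomic operation, which keeps the data consistent. An update replaces the whole entity rather than individual fields, so every field must be supplied, including the vector. Upsert is somewhat slower than a plain insert because the existing entity with the same primary key has to be handled as well. It suits data that must stay current, such as a document store that is updated in real time.
        b.Code example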
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # Upsert单条数据
            data = [
                [1],  # id (已存在则更新，不存在则插入)
                ["更新后的标题"],  # title
                ["技术"],  # category
                [1700000000],  # timestamp
                [[0.9] * 128]  # embedding (新向量)
            ]
            
            collection.upsert(data)
            collection.flush()
            print("Upsert完成")
            
            # 验证更新
            results = collection.query(
                expr="id == 1",
                output_fields=["id", "title"]
            )
            print(f"更新后: {results}")
            
            # 批量Upsert
            ids = [10, 11, 12, 13, 14]  # 部分存在，部分不存在
            titles = [f"更新文档{i}" for i in ids]
            categories = ["技术"] * len(ids)
            timestamps = [1700000000] * len(ids)
            embeddings = [[np.random.random() for _ in range(128)] for _ in ids]
            
            data = [ids, titles, categories, timestamps, embeddings]
            result = collection.upsert(data)
            
            print(f"Upsert数量: {result.upsert_count}")
            collection.flush()
            
            # Upsert字典格式
            data_dict = [
                {
                    "id": 20,
                    "title": "字典格式更新",
                    "category": "新闻",
                    "timestamp": 1700000000,
                    "embedding": [0.5] * 128
                },
                {
                    "id": 21,
                    "title": "字典格式插入",
                    "category": "博客",
                    "timestamp": 1700000001,
                    "embedding": [0.6] * 128
                }
            ]
            
            collection.upsert(data_dict)
            collection.flush()
            print("字典格式Upsert完成")
            ---
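            The text above does not say which version wins when one upsert batch contains the same primary key more than once. A cautious approach is to deduplicate on the client before sending the batch, so the outcome does not depend on server behavior. The sketch below keeps the last row per key; `dedupe_by_key` is a hypothetical helper.

```python
def dedupe_by_key(rows, key="id"):
    """Keep only the last row for each primary key, in first-seen key order."""
    latest = {}
    for row in rows:
        latest[row[key]] = row  # a later row overwrites an earlier one with the same key
    return list(latest.values())


if __name__ == "__main__":
    rows = [
        {"id": 20, "title": "first version"},
        {"id": 21, "title": "other document"},
        {"id": 20, "title": "second version"},
    ]
    for row in dedupe_by_key(rows):
        print(row)
```

            The deduplicated list of dictionaries can be passed to `collection.upsert` in the dictionary format shown above.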
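    b.Update strategies
        a.Description
            Because an upsert replaces the whole entity, changing one field means querying the full entity, modifying it, and upserting it back. This read-modify-write cycle has a real cost and is a poor fit for high-frequency updates. Caching entities in the application layer reduces the number of queries, and batching updates improves efficiency. If only the vector ever changes, keeping just the essential metadata in the collection makes each rewrite cheaper. Updates create new segments, so run compaction periodically.
        b.Code example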
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            collection.load()
            
            # 更新单个字段
            def update_field(collection, id, field_name, new_value):
                """更新单个字段"""
                # 查询现有数据
                results = collection.query(
                    expr=f"id == {id}",
                    output_fields=["*"]
                )
                
                if not results:
                    print(f"ID {id} 不存在")
                    return False
                
                # 修改字段
                record = results[0]
                record[field_name] = new_value
                
                # Upsert: the column-based format needs one list per field, each holding one value
                data = [[record[f.name]] for f in collection.schema.fields if not f.auto_id]
                collection.upsert(data)
                collection.flush()
                
                print(f"已更新 ID {id} 的 {field_name}")
                return True
            
            update_field(collection, 1, "title", "新标题")
            
            # 批量更新字段
            def batch_update_field(collection, ids, field_name, new_values):
                """批量更新字段"""
                if len(ids) != len(new_values):
                    raise ValueError("ID和值的数量不匹配")
                
                # 查询现有数据
                results = collection.query(
                    expr=f"id in {ids}",
                    output_fields=["*"]
                )
                
                # 创建ID到记录的映射
                records_map = {r["id"]: r for r in results}
                
                # 准备更新数据
                updated_records = []
                for id, new_value in zip(ids, new_values):
                    if id in records_map:
                        record = records_map[id]
                        record[field_name] = new_value
                        updated_records.append(record)
                
                if not updated_records:
                    print("没有找到要更新的记录")
                    return
                
                # 转换为列式格式
                field_data = {}
                for field in collection.schema.fields:
                    if not field.auto_id:
                        field_data[field.name] = [r[field.name] for r in updated_records]
                
                data = [field_data[f.name] for f in collection.schema.fields if not f.auto_id]
                collection.upsert(data)
                collection.flush()
                
                print(f"已更新 {len(updated_records)} 条记录的 {field_name}")
            
            batch_update_field(collection, [1, 2, 3], "category", ["AI", "ML", "DL"])
            
            # 更新向量
            def update_embedding(collection, id, new_embedding):
                """更新向量"""
                results = collection.query(
                    expr=f"id == {id}",
                    output_fields=["*"]
                )
                
                if not results:
                    print(f"ID {id} 不存在")
                    return False
                
                record = results[0]
                record["embedding"] = new_embedding
                
                # Prepare the data: one list per field, each holding one value
                data = [[record[f.name]] for f in collection.schema.fields if not f.auto_id]
                collection.upsert(data)
                collection.flush()
                
                print(f"已更新 ID {id} 的向量")
                return True
            
            new_vector = [np.random.random() for _ in range(128)]
            update_embedding(collection, 1, new_vector)
            
            # 条件批量更新
            def conditional_update(collection, expr, field_name, new_value):
                """根据条件批量更新字段"""
                # 查询符合条件的记录
                results = collection.query(
                    expr=expr,
                    output_fields=["*"],
                    limit=16384
                )
                
                if not results:
                    print("没有匹配的记录")
                    return 0
                
                # 更新字段
                for record in results:
                    record[field_name] = new_value
                
                # 转换为列式格式
                field_data = {}
                for field in collection.schema.fields:
                    if not field.auto_id:
                        field_data[field.name] = [r[field.name] for r in results]
                
                data = [field_data[f.name] for f in collection.schema.fields if not f.auto_id]
                collection.upsert(data)
                collection.flush()
                
                print(f"已更新 {len(results)} 条记录")
                return len(results)
            
            # 将所有test类别改为tech类别
            conditional_update(collection, 'category == "test"', "category", "tech")
            ---
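            The read-modify-upsert pattern above converts query results (one dictionary per row) into the column-based layout (one list per field). The shape is easy to get wrong for a single row, where each field still needs its own one-element list. The sketch below isolates that conversion; `rows_to_columns` is a hypothetical helper, and `field_names` is assumed to list the schema fields in order.

```python
def rows_to_columns(rows, field_names):
    """Convert row dictionaries into the column-based layout: one list per field."""
    missing = sorted({name for row in rows for name in field_names if name not in row})
    if missing:
        raise KeyError(f"rows are missing fields: {missing}")
    return [[row[name] for row in rows] for name in field_names]


if __name__ == "__main__":
    rows = [
        {"id": 1, "title": "doc 1", "embedding": [0.1, 0.2]},
        {"id": 2, "title": "doc 2", "embedding": [0.3, 0.4]},
    ]
    columns = rows_to_columns(rows, ["id", "title", "embedding"])
    for name, column in zip(["id", "title", "embedding"], columns):
        print(name, column)
```

            With a real collection, `field_names` could be taken from `[f.name for f in collection.schema.fields if not f.auto_id]`.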

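02.Incremental updates
    a.Re-encoding vectors
        a.Description
            When a document's content changes, its vector must be regenerated and updated so that the vector stays consistent with the content. This is the most common update scenario in a vector database. Encode with the same model as before so that old and new vectors share one vector space. Incremental updates suit applications that change in real time, such as news or social media. Batch the update requests for efficiency. After many updates the index may need to be rebuilt to keep query performance.
        b.Code example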
            ---
            from pymilvus import Collection
            import numpy as np
            import time  # used by update_document and batch_reencode below
            
            collection = Collection("documents")
            collection.load()
            
            # 模拟向量编码器
            def encode_text(text):
                """将文本编码为向量（实际应使用真实的编码模型）"""
                # 这里用随机向量模拟
                return [np.random.random() for _ in range(128)]
            
            # 更新文档内容和向量
            def update_document(collection, doc_id, new_title, new_content):
                """更新文档内容并重新编码向量"""
                # 查询现有数据
                results = collection.query(
                    expr=f"id == {doc_id}",
                    output_fields=["*"]
                )
                
                if not results:
                    print(f"文档 {doc_id} 不存在")
                    return False
                
                # 重新编码向量
                new_embedding = encode_text(new_title + " " + new_content)
                
                # 更新记录
                record = results[0]
                record["title"] = new_title
                record["embedding"] = new_embedding
                record["timestamp"] = int(time.time())  # 更新时间戳
                
                # Upsert: the column-based format needs one list per field, each holding one value
                data = [[record[f.name]] for f in collection.schema.fields if not f.auto_id]
                collection.upsert(data)
                collection.flush()
                
                print(f"已更新文档 {doc_id}")
                return True
            
            update_document(collection, 1, "新标题", "新内容...")
            
            # 批量重新编码
            def batch_reencode(collection, doc_ids):
                """批量重新编码向量"""
                # 查询文档
                results = collection.query(
                    expr=f"id in {doc_ids}",
                    output_fields=["*"]
                )
                
                if not results:
                    print("没有找到文档")
                    return 0
                
                # 重新编码
                updated_records = []
                for record in results:
                    # 重新编码
                    new_embedding = encode_text(record["title"])
                    record["embedding"] = new_embedding
                    record["timestamp"] = int(time.time())
                    updated_records.append(record)
                
                # 转换为列式格式
                field_data = {}
                for field in collection.schema.fields:
                    if not field.auto_id:
                        field_data[field.name] = [r[field.name] for r in updated_records]
                
                data = [field_data[f.name] for f in collection.schema.fields if not f.auto_id]
                collection.upsert(data)
                collection.flush()
                
                print(f"已重新编码 {len(updated_records)} 个文档")
                return len(updated_records)
            
            batch_reencode(collection, [1, 2, 3, 4, 5])
            
            # 增量更新队列
            import queue
            import threading
            import time
            
            class IncrementalUpdater:
                def __init__(self, collection, batch_size=100, flush_interval=5):
                    self.collection = collection
                    self.batch_size = batch_size
                    self.flush_interval = flush_interval
                    self.update_queue = queue.Queue()
                    self.running = False
                
                def start(self):
                    """启动更新线程"""
                    self.running = True
                    self.worker_thread = threading.Thread(target=self._worker, daemon=True)
                    self.worker_thread.start()
                
                def stop(self):
                    """停止更新线程"""
                    self.running = False
                    self.worker_thread.join()
                
                def submit_update(self, doc_id, title, content):
                    """提交更新请求"""
                    self.update_queue.put((doc_id, title, content))
                
                def _worker(self):
                    """后台更新线程"""
                    batch = []
                    last_flush = time.time()
                    
                    while self.running:
                        try:
                            # 获取更新请求（超时）
                            item = self.update_queue.get(timeout=1)
                            batch.append(item)
                            
                            # 达到批次大小或超时，执行更新
                            if len(batch) >= self.batch_size or \
                               (time.time() - last_flush) > self.flush_interval:
                                self._flush_batch(batch)
                                batch = []
                                last_flush = time.time()
                                
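                        except queue.Empty:
                            # Timed out waiting: flush a pending batch once the interval has elapsed
                            if batch and (time.time() - last_flush) > self.flush_interval:
                                self._flush_batch(batch)
                                batch = []
                                last_flush = time.time()
                    
                    # stop() was called: drain whatever is still queued so no update is lost
                    while not self.update_queue.empty():
                        batch.append(self.update_queue.get_nowait())
                    self._flush_batch(batch)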
                
                def _flush_batch(self, batch):
                    """刷新批次更新"""
                    if not batch:
                        return
                    
                    doc_ids = [item[0] for item in batch]
                    
                    # 查询现有数据
                    results = self.collection.query(
                        expr=f"id in {doc_ids}",
                        output_fields=["*"]
                    )
                    
                    records_map = {r["id"]: r for r in results}
                    
                    # 更新记录
                    updated_records = []
                    for doc_id, title, content in batch:
                        if doc_id in records_map:
                            record = records_map[doc_id]
                            record["title"] = title
                            record["embedding"] = encode_text(title + " " + content)
                            record["timestamp"] = int(time.time())
                            updated_records.append(record)
                    
                    if updated_records:
                        # 转换为列式格式
                        field_data = {}
                        for field in self.collection.schema.fields:
                            if not field.auto_id:
                                field_data[field.name] = [r[field.name] for r in updated_records]
                        
                        data = [field_data[f.name] for f in self.collection.schema.fields if not f.auto_id]
                        self.collection.upsert(data)
                        self.collection.flush()
                        
                        print(f"批量更新 {len(updated_records)} 个文档")
            
            # 使用增量更新器
            updater = IncrementalUpdater(collection, batch_size=100, flush_interval=5)
            updater.start()
            
            # 提交更新请求
            for i in range(50):
                updater.submit_update(i, f"更新标题{i}", f"更新内容{i}")
            
            # 等待处理完成
            time.sleep(10)
            updater.stop()
            ---
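    b.Metadata updates
        a.Description
            A metadata update changes only scalar fields and leaves the vector untouched. It is simpler than a vector update because nothing is re-encoded, but the upsert still writes a complete row, so the existing record has to be queried first. It suits fields such as category, tags, and status. Caching recently read records reduces the query overhead. Metadata usually changes more often than vectors, so batch the updates where possible, and consider keeping very frequently updated fields in an external store.
        b.Code example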
            ---
            from pymilvus import Collection
            import time
            
            collection = Collection("documents")
            collection.load()
            
            # 更新分类
            def update_category(collection, doc_ids, new_category):
                """批量更新分类"""
                results = collection.query(
                    expr=f"id in {doc_ids}",
                    output_fields=["*"]
                )
                
                if not results:
                    return 0
                
                # 更新分类
                for record in results:
                    record["category"] = new_category
                    record["timestamp"] = int(time.time())
                
                # Upsert
                field_data = {}
                for field in collection.schema.fields:
                    if not field.auto_id:
                        field_data[field.name] = [r[field.name] for r in results]
                
                data = [field_data[f.name] for f in collection.schema.fields if not f.auto_id]
                collection.upsert(data)
                collection.flush()
                
                print(f"已更新 {len(results)} 个文档的分类")
                return len(results)
            
            update_category(collection, [1, 2, 3], "AI")
            
            # 批量添加标签
            def add_tags(collection, doc_ids, new_tags):
                """批量添加标签（假设使用JSON字段存储标签）"""
                results = collection.query(
                    expr=f"id in {doc_ids}",
                    output_fields=["*"]
                )
                
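                if not results:
                    # Nothing matched: skip the upsert, which would otherwise receive empty columns
                    return 0
                
                for record in results:
                    # Read the existing tags; `or {}` also covers a stored null
                    metadata = record.get("metadata") or {}
                    existing_tags = metadata.get("tags", [])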
                    
                    # 添加新标签
                    updated_tags = list(set(existing_tags + new_tags))
                    metadata["tags"] = updated_tags
                    
                    record["metadata"] = metadata
                    record["timestamp"] = int(time.time())
                
                # Upsert
                field_data = {}
                for field in collection.schema.fields:
                    if not field.auto_id:
                        field_data[field.name] = [r[field.name] for r in results]
                
                data = [field_data[f.name] for f in collection.schema.fields if not f.auto_id]
                collection.upsert(data)
                collection.flush()
                
                print(f"已为 {len(results)} 个文档添加标签")
            
            add_tags(collection, [1, 2, 3], ["机器学习", "深度学习"])
            
            # 元数据缓存
            class MetadataCache:
                def __init__(self, collection, cache_size=1000):
                    self.collection = collection
                    self.cache = {}
                    self.cache_size = cache_size
                    self.access_order = []
                
                def get(self, doc_id):
                    """获取文档元数据"""
                    if doc_id in self.cache:
                        # 更新访问顺序
                        self.access_order.remove(doc_id)
                        self.access_order.append(doc_id)
                        return self.cache[doc_id]
                    
                    # 从数据库查询
                    results = self.collection.query(
                        expr=f"id == {doc_id}",
                        output_fields=["*"]
                    )
                    
                    if not results:
                        return None
                    
                    record = results[0]
                    
                    # 添加到缓存
                    if len(self.cache) >= self.cache_size:
                        # 移除最久未使用的
                        old_id = self.access_order.pop(0)
                        del self.cache[old_id]
                    
                    self.cache[doc_id] = record
                    self.access_order.append(doc_id)
                    
                    return record
                
                def update(self, doc_id, updates):
                    """更新文档元数据"""
                    record = self.get(doc_id)
                    if not record:
                        return False
                    
                    # 更新字段
                    for key, value in updates.items():
                        record[key] = value
                    
                    record["timestamp"] = int(time.time())
                    
                    # 更新缓存
                    self.cache[doc_id] = record
                    
                    # Upsert to the database: one single-element list per field (column-based format)
                    data = [[record[f.name]] for f in self.collection.schema.fields if not f.auto_id]
                    self.collection.upsert(data)
                    
                    return True
                
                def flush(self):
                    """刷新所有缓存的更新"""
                    self.collection.flush()
            
            # 使用元数据缓存
            cache = MetadataCache(collection, cache_size=1000)
            
            # 更新元数据
            cache.update(1, {"category": "AI", "views": 1000})
            cache.update(2, {"category": "ML", "views": 500})
            
            # 刷新
            cache.flush()
            ---
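            The list-based bookkeeping in the cache above costs O(n) on every hit, because `list.remove` scans the whole access list. The following is a minimal sketch of the same least-recently-used idea built on `collections.OrderedDict`, where both the recency update and the eviction are O(1). The `LRUCache` class and its `loader` callback are hypothetical names for illustration; the callback stands in for the Milvus query and is not part of pymilvus.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache; `loader` is a hypothetical callback that fetches a missing record."""

    def __init__(self, loader, capacity=1000):
        self.loader = loader
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)      # mark as most recently used
            return self.items[key]
        value = self.loader(key)
        if value is None:
            return None                      # do not cache misses
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used entry
        return value


# Usage with an in-memory dict standing in for the database
fake_db = {
    1: {"id": 1, "category": "AI"},
    2: {"id": 2, "category": "ML"},
    3: {"id": 3, "category": "DL"},
}
loads = []


def load(doc_id):
    loads.append(doc_id)                     # record every "database" round trip
    return fake_db.get(doc_id)


cache = LRUCache(load, capacity=2)
cache.get(1)
cache.get(2)
cache.get(1)   # hit: 1 becomes the most recent entry
cache.get(3)   # miss: inserting 3 evicts 2
print(list(cache.items), loads)
```

            In the usage above only three loads reach the stand-in database even though four lookups were made, and the entry that was touched least recently is the one evicted.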

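4.4 Batch operations

01.Batch insert optimization
    a.Data preprocessing
        a.Description
            Preprocessing data before a batch insert can noticeably improve throughput and reliability. Typical steps are validation (vector dimension, string length), type conversion, null handling, and deduplication. Catching bad records up front avoids partially failed inserts and the retries they cause. Use an efficient library such as NumPy for large volumes, and where it helps, run preprocessing in parallel with the insert itself.
        b.Code example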
            ---
            from pymilvus import Collection
            import numpy as np
            import pandas as pd
            
            collection = Collection("documents")
            
            # 数据验证器
            class DataValidator:
                def __init__(self, schema):
                    self.schema = schema
                    self.field_map = {f.name: f for f in schema.fields}
                
                def validate_record(self, record):
                    """验证单条记录"""
                    errors = []
                    
                    # 检查必需字段
                    for field in self.schema.fields:
                        if field.auto_id:
                            continue
                        
                        if field.name not in record:
                            errors.append(f"缺少字段: {field.name}")
                            continue
                        
                        value = record[field.name]
                        
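                        # Check the vector dimension. Compare the enum name: str() of an IntEnum
                        # member returns the number on Python 3.11+, so string matching on it is fragile
                        if field.dtype.name == "FLOAT_VECTOR":
                            expected_dim = field.params.get("dim")
                            if len(value) != expected_dim:
                                errors.append(f"wrong vector dimension: {field.name}, expected {expected_dim}, got {len(value)}")
                        
                        # Check the VARCHAR length. Milvus documents max_length as a byte limit,
                        # so measure the UTF-8 encoding rather than the character count
                        elif field.dtype.name == "VARCHAR":
                            max_len = field.params.get("max_length")
                            byte_len = len(str(value).encode("utf-8"))
                            if byte_len > max_len:
                                errors.append(f"string too long: {field.name}, max {max_len} bytes, got {byte_len}")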
                    
                    return len(errors) == 0, errors
                
                def validate_batch(self, records):
                    """验证批次数据"""
                    valid_records = []
                    invalid_records = []
                    
                    for i, record in enumerate(records):
                        is_valid, errors = self.validate_record(record)
                        if is_valid:
                            valid_records.append(record)
                        else:
                            invalid_records.append((i, record, errors))
                    
                    return valid_records, invalid_records
            
            # 使用验证器
            validator = DataValidator(collection.schema)
            
            test_records = [
                {"id": 1, "title": "文档1", "category": "AI", "timestamp": 1700000000, "embedding": [0.1]*128},
                {"id": 2, "title": "文档2", "category": "ML", "timestamp": 1700000000, "embedding": [0.2]*100},  # 维度错误
                {"id": 3, "title": "x"*300, "category": "DL", "timestamp": 1700000000, "embedding": [0.3]*128}  # 标题过长
            ]
            
            valid, invalid = validator.validate_batch(test_records)
            print(f"有效记录: {len(valid)}")
            print(f"无效记录: {len(invalid)}")
            for i, record, errors in invalid:
                print(f"  记录{i}: {errors}")
            
            # 数据预处理管道
            class DataPreprocessor:
                def __init__(self, schema):
                    self.schema = schema
                
                def preprocess_batch(self, records):
                    """预处理批次数据"""
                    processed = []
                    
                    for record in records:
                        processed_record = self.preprocess_record(record)
                        if processed_record:
                            processed.append(processed_record)
                    
                    return processed
                
                def preprocess_record(self, record):
                    """预处理单条记录"""
                    processed = {}
                    
                    for field in self.schema.fields:
                        if field.auto_id:
                            continue
                        
                        if field.name not in record:
                            return None
                        
                        value = record[field.name]
                        
                        # Truncate VARCHAR values to max_length bytes, dropping any split multi-byte character
                        if field.dtype.name == "VARCHAR":
                            max_len = field.params.get("max_length")
                            value = str(value).encode("utf-8")[:max_len].decode("utf-8", errors="ignore")
                        
                        # L2-normalize vectors (zero vectors are passed through unchanged)
                        elif field.dtype.name == "FLOAT_VECTOR":
                            value = np.array(value, dtype=np.float32)
                            norm = np.linalg.norm(value)
                            if norm > 0:
                                value = (value / norm).tolist()
                            else:
                                value = value.tolist()
                        
                        # Integer conversion
                        elif field.dtype.name in ("INT8", "INT16", "INT32", "INT64"):
                            value = int(value)
                        
                        # Floating-point conversion
                        elif field.dtype.name in ("FLOAT", "DOUBLE"):
                            value = float(value)
                        
                        processed[field.name] = value
                    
                    return processed
            
            # 使用预处理器
            preprocessor = DataPreprocessor(collection.schema)
            
            raw_data = [
                {"id": "100", "title": "x"*300, "category": "AI", "timestamp": "1700000000", "embedding": [1.0]*128},
                {"id": "101", "title": "文档2", "category": "ML", "timestamp": "1700000001", "embedding": [2.0]*128}
            ]
            
            processed_data = preprocessor.preprocess_batch(raw_data)
            print(f"预处理完成: {len(processed_data)} 条记录")
            
            # 批量插入预处理后的数据
            if processed_data:
                # 转换为列式格式
                field_data = {}
                for field in collection.schema.fields:
                    if not field.auto_id:
                        field_data[field.name] = [r[field.name] for r in processed_data]
                
                data = [field_data[f.name] for f in collection.schema.fields if not f.auto_id]
                collection.insert(data)
                collection.flush()
            ---
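            The description above lists deduplication and null handling among the preprocessing steps, but the example covers only validation and type conversion. Below is a minimal sketch of the two missing steps on plain dictionaries. The helper names `deduplicate` and `drop_incomplete` are hypothetical, and "keep the last record per primary key" is an assumed policy; a real pipeline may prefer the first record or the newest timestamp.

```python
def deduplicate(records, key="id"):
    """Keep the last record seen for each primary key, in first-seen key order."""
    latest = {}
    for record in records:
        latest[record[key]] = record   # later duplicates overwrite earlier ones
    return list(latest.values())


def drop_incomplete(records, required):
    """Split records into those with every required field non-null and the rest."""
    complete, rejected = [], []
    for record in records:
        if all(record.get(name) is not None for name in required):
            complete.append(record)
        else:
            rejected.append(record)
    return complete, rejected


rows = [
    {"id": 1, "title": "doc 1", "embedding": [0.1, 0.2]},
    {"id": 2, "title": None, "embedding": [0.3, 0.4]},            # null title
    {"id": 1, "title": "doc 1 (revised)", "embedding": [0.5, 0.6]},  # duplicate id
]

complete, rejected = drop_incomplete(rows, required=["id", "title", "embedding"])
unique = deduplicate(complete)
print(len(unique), len(rejected))
```

            Both helpers make a single pass over the input, so they stay cheap even for large batches. Running null handling first means a rejected duplicate cannot overwrite a valid record.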
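    b.Memory management
        a.Description
            Large batch inserts need careful memory management to avoid running out of memory. Process big files with generators or iterators instead of loading them whole, and read CSV or JSON input in chunks. NumPy arrays use less memory than Python lists for numeric data. Release data structures as soon as they are no longer needed. Batch size can be adjusted dynamically by monitoring memory usage, and memory-mapped files are an option for datasets too large to fit in RAM.
        b.Code example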
            ---
            from pymilvus import Collection
            import numpy as np
            import pandas as pd
            import psutil
            import gc
            
            collection = Collection("documents")
            
            def get_memory_usage():
                """获取当前内存使用（MB）"""
                process = psutil.Process()
                return process.memory_info().rss / 1024 / 1024
            
            # 生成器方式读取大文件
            def read_large_csv(filename, chunk_size=10000):
                """分块读取大CSV文件"""
                for chunk in pd.read_csv(filename, chunksize=chunk_size):
                    yield chunk
            
            # 批量插入大文件
            def insert_from_large_file(collection, filename, batch_size=1000):
                """从大文件批量插入"""
                total_inserted = 0
                
                for chunk in read_large_csv(filename, chunk_size=batch_size):
                    # 转换为插入格式
                    ids = chunk["id"].tolist()
                    titles = chunk["title"].tolist()
                    categories = chunk["category"].tolist()
                    timestamps = chunk["timestamp"].tolist()
                    
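                    # Assume the embedding column holds a string such as "[0.1, 0.2, ...]";
                    # parse it explicitly instead of calling eval() on file contents
                    embeddings = chunk["embedding"].apply(
                        lambda s: [float(x) for x in s.strip("[]").split(",")]
                    ).tolist()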
                    
                    data = [ids, titles, categories, timestamps, embeddings]
                    collection.insert(data)
                    
                    total_inserted += len(ids)
                    
                    # 显示进度和内存使用
                    memory_mb = get_memory_usage()
                    print(f"已插入: {total_inserted}, 内存: {memory_mb:.2f}MB")
                    
                    # 定期刷新
                    if total_inserted % 10000 == 0:
                        collection.flush()
                        gc.collect()  # 强制垃圾回收
                
                collection.flush()
                print(f"插入完成: {total_inserted} 条记录")
            
            # 使用NumPy节省内存
            def efficient_batch_insert(collection, count=100000):
                """高效批量插入"""
                batch_size = 1000
                
                for i in range(0, count, batch_size):
                    batch_count = min(batch_size, count - i)
                    
                    # 使用NumPy生成数据（更节省内存）
                    ids = np.arange(i, i + batch_count, dtype=np.int64)
                    embeddings = np.random.rand(batch_count, 128).astype(np.float32)
                    
                    # 转换为list（Milvus要求）
                    data = [
                        ids.tolist(),
                        [f"文档{j}" for j in range(i, i + batch_count)],
                        ["技术"] * batch_count,
                        [1700000000] * batch_count,
                        embeddings.tolist()
                    ]
                    
                    collection.insert(data)
                    
                    # 清理NumPy数组
                    del ids, embeddings
                    
                    if (i + batch_count) % 10000 == 0:
                        collection.flush()
                        gc.collect()
                        memory_mb = get_memory_usage()
                        print(f"进度: {i + batch_count}/{count}, 内存: {memory_mb:.2f}MB")
                
                collection.flush()
            
            efficient_batch_insert(collection, count=100000)
            
            # 自适应批次大小
            class AdaptiveBatchInserter:
                def __init__(self, collection, max_memory_mb=1024):
                    self.collection = collection
                    self.max_memory_mb = max_memory_mb
                    self.batch_size = 1000
                
                def insert_batch(self, data):
                    """插入批次并调整批次大小"""
                    memory_before = get_memory_usage()
                    
                    self.collection.insert(data)
                    
                    memory_after = get_memory_usage()
                    memory_used = memory_after - memory_before
                    
                    # 根据内存使用调整批次大小
                    if memory_after > self.max_memory_mb * 0.8:
                        # 内存使用过高，减小批次
                        self.batch_size = max(100, int(self.batch_size * 0.8))
                        print(f"减小批次大小: {self.batch_size}")
                    elif memory_used < 50 and self.batch_size < 10000:
                        # 内存使用较低，增大批次
                        self.batch_size = min(10000, int(self.batch_size * 1.2))
                        print(f"增大批次大小: {self.batch_size}")
                    
                    return self.batch_size
            
            inserter = AdaptiveBatchInserter(collection, max_memory_mb=1024)
            
            # 使用自适应插入
            total = 50000
            current = 0
            
            while current < total:
                batch_count = min(inserter.batch_size, total - current)
                
                # 生成批次数据
                data = [
                    list(range(current, current + batch_count)),
                    [f"文档{i}" for i in range(current, current + batch_count)],
                    ["技术"] * batch_count,
                    [1700000000] * batch_count,
                    [[0.1]*128 for _ in range(batch_count)]
                ]
                
                inserter.insert_batch(data)
                current += batch_count
            
            collection.flush()
            ---
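            The description above mentions memory-mapped files for very large datasets, which the example does not show. The sketch below assumes the embeddings were dumped beforehand as a raw `float32` file with a known shape; the file name, the shape, and the `iter_memmap_batches` helper are all hypothetical. `np.memmap` maps the file into the address space, and only the slice that is copied out for each batch occupies RAM.

```python
import os
import tempfile

import numpy as np


def iter_memmap_batches(path, num_rows, dim, batch_size):
    """Yield (start_id, batch) pairs from a raw float32 file without loading it fully."""
    vectors = np.memmap(path, dtype=np.float32, mode="r", shape=(num_rows, dim))
    for start in range(0, num_rows, batch_size):
        # np.array(...) copies only this slice into memory
        yield start, np.array(vectors[start:start + batch_size])


# Build a small demo file; in practice this would be a large precomputed embedding dump
num_rows, dim = 2500, 8
path = os.path.join(tempfile.mkdtemp(), "embeddings.f32")
writer = np.memmap(path, dtype=np.float32, mode="w+", shape=(num_rows, dim))
writer[:] = np.arange(num_rows * dim, dtype=np.float32).reshape(num_rows, dim)
writer.flush()
del writer

batch_sizes = []
for start, batch in iter_memmap_batches(path, num_rows, dim, batch_size=1000):
    ids = list(range(start, start + len(batch)))
    # An insert call with `ids` and `batch.tolist()` would go here
    batch_sizes.append(len(batch))

print(batch_sizes)
```

            The generator keeps the calling loop identical to the chunked CSV reader above, so the two input sources can be swapped without touching the insert logic.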

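02.Batch query optimization
    a.Parallel queries
        a.Description
            Running queries in parallel raises throughput, since Milvus can execute multiple queries concurrently. A thread pool or process pool can send the requests in parallel. Keep the degree of concurrency bounded so the server is not overloaded, and tune it to the server's capacity. Parallel queries suit latency-sensitive workloads. Combining many IDs into a single `in` query reduces the number of network round trips and is often cheaper than parallelizing single-ID lookups.
        b.Code example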
            ---
            from pymilvus import Collection
            import concurrent.futures
            import time
            
            collection = Collection("documents")
            collection.load()
            
            # 单个查询函数
            def query_by_id(collection, doc_id):
                """按ID查询"""
                results = collection.query(
                    expr=f"id == {doc_id}",
                    output_fields=["id", "title", "category"]
                )
                return results
            
            # 串行查询
            def serial_query(collection, doc_ids):
                """串行查询"""
                start = time.time()
                results = []
                
                for doc_id in doc_ids:
                    result = query_by_id(collection, doc_id)
                    results.extend(result)
                
                elapsed = time.time() - start
                return results, elapsed
            
            # 并行查询
            def parallel_query(collection, doc_ids, max_workers=10):
                """并行查询"""
                start = time.time()
                results = []
                
                with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
                    futures = [executor.submit(query_by_id, collection, doc_id) for doc_id in doc_ids]
                    
                    for future in concurrent.futures.as_completed(futures):
                        result = future.result()
                        results.extend(result)
                
                elapsed = time.time() - start
                return results, elapsed
            
            # 性能对比
            test_ids = list(range(1, 101))
            
            results_serial, time_serial = serial_query(collection, test_ids)
            print(f"串行查询: {len(results_serial)} 条, 耗时: {time_serial:.2f}s")
            
            results_parallel, time_parallel = parallel_query(collection, test_ids, max_workers=10)
            print(f"并行查询: {len(results_parallel)} 条, 耗时: {time_parallel:.2f}s")
            print(f"加速比: {time_serial / time_parallel:.2f}x")
            
            # 批量IN查询
            def batch_in_query(collection, doc_ids, batch_size=100):
                """批量IN查询"""
                results = []
                
                for i in range(0, len(doc_ids), batch_size):
                    batch = doc_ids[i:i+batch_size]
                    
                    batch_results = collection.query(
                        expr=f"id in {batch}",
                        output_fields=["id", "title", "category"]
                    )
                    results.extend(batch_results)
                
                return results
            
            # 批量查询（更高效）
            results_batch = batch_in_query(collection, test_ids, batch_size=50)
            print(f"批量查询: {len(results_batch)} 条")
            
            # 混合策略：批量+并行
            def hybrid_query(collection, doc_ids, batch_size=50, max_workers=5):
                """混合查询策略"""
                # 分批
                batches = [doc_ids[i:i+batch_size] for i in range(0, len(doc_ids), batch_size)]
                
                results = []
                
                # 并行执行批次查询
                with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
                    futures = [
                        executor.submit(
                            collection.query,
                            expr=f"id in {batch}",
                            output_fields=["id", "title", "category"]
                        )
                        for batch in batches
                    ]
                    
                    for future in concurrent.futures.as_completed(futures):
                        batch_results = future.result()
                        results.extend(batch_results)
                
                return results
            
            start = time.time()
            results_hybrid = hybrid_query(collection, test_ids, batch_size=20, max_workers=5)
            time_hybrid = time.time() - start
            print(f"混合查询: {len(results_hybrid)} 条, 耗时: {time_hybrid:.2f}s")
            ---
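            One detail of the thread-pool examples above is that `as_completed` yields results in completion order, so the output no longer lines up with the input ID list. When the caller needs results in input order, `executor.map` is a simple alternative. The sketch below uses a hypothetical `fake_query` function in place of `collection.query`, with an artificial delay so that later IDs tend to finish first.

```python
import concurrent.futures
import time


def fake_query(doc_id):
    """Hypothetical stand-in for collection.query; later IDs finish sooner."""
    time.sleep(0.01 * (5 - doc_id))
    return {"id": doc_id}


doc_ids = [1, 2, 3, 4]

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # as_completed: results arrive as each task finishes
    futures = [executor.submit(fake_query, d) for d in doc_ids]
    completion_order = [f.result()["id"] for f in concurrent.futures.as_completed(futures)]

    # map: results are returned in the order of the inputs
    input_order = [r["id"] for r in executor.map(fake_query, doc_ids)]

print(completion_order)  # depends on timing
print(input_order)       # always matches doc_ids
```

            `as_completed` remains the better choice when results can be processed independently as they arrive; `map` trades that for a stable ordering.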
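    b.Result aggregation
        a.Description
            After a batch query the results usually need to be aggregated: deduplicated, sorted, and paginated. Complex aggregation logic can live in the application layer. Watch memory usage and process large result sets in batches; returning results from a generator reduces memory pressure. Avoid O(n²) algorithms in the aggregation step, for example by deduplicating through a dictionary instead of nested loops. Libraries such as Pandas simplify most of these operations.
        b.Code example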
            ---
            from pymilvus import Collection
            import pandas as pd
            from collections import defaultdict
            
            collection = Collection("documents")
            collection.load()
            
            # 批量查询并聚合
            def query_and_aggregate(collection, categories):
                """按类别查询并聚合统计"""
                results_by_category = defaultdict(list)
                
                for category in categories:
                    results = collection.query(
                        expr=f'category == "{category}"',
                        output_fields=["id", "title", "category", "timestamp"],
                        limit=1000
                    )
                    results_by_category[category].extend(results)
                
                # Count the rows fetched per category (capped by limit=1000 above, so not a true total)
                stats = {cat: len(results) for cat, results in results_by_category.items()}
                
                return results_by_category, stats
            
            categories = ["AI", "ML", "DL"]
            results, stats = query_and_aggregate(collection, categories)
            
            print("类别统计:")
            for cat, count in stats.items():
                print(f"  {cat}: {count} 条")
            
            # 使用Pandas聚合
            def query_to_dataframe(collection, expr, limit=10000):
                """查询结果转DataFrame"""
                results = collection.query(
                    expr=expr,
                    output_fields=["*"],
                    limit=limit
                )
                
                if not results:
                    return pd.DataFrame()
                
                df = pd.DataFrame(results)
                return df
            
            # 查询并分析
            df = query_to_dataframe(collection, "id > 0", limit=10000)
            
            if not df.empty:
                # 按类别统计
                category_counts = df["category"].value_counts()
                print("\n类别分布:")
                print(category_counts)
                
                # 时间范围
                if "timestamp" in df.columns:
                    df["datetime"] = pd.to_datetime(df["timestamp"], unit="s")
                    print(f"\n时间范围: {df['datetime'].min()} 到 {df['datetime'].max()}")
                
                # 导出结果
                df.to_csv("query_results.csv", index=False)
                print("\n结果已导出到 query_results.csv")
            
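            # Paginated aggregation
            def paginated_query(collection, expr, page_size=100):
                """Paginated query (generator).
                
                Milvus restricts offset + limit to 16384, so offset-based paging stops
                working on large result sets. query_iterator (pymilvus 2.3 or later)
                pages through the results without that restriction.
                """
                iterator = collection.query_iterator(
                    batch_size=page_size,
                    expr=expr,
                    output_fields=["*"]
                )
                
                try:
                    while True:
                        results = iterator.next()
                        if not results:
                            break
                        yield results
                finally:
                    iterator.close()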
            
            # 使用分页查询
            total_count = 0
            for page in paginated_query(collection, "id > 0", page_size=1000):
                total_count += len(page)
                print(f"处理了 {len(page)} 条记录，累计: {total_count}")
            
            # 多条件聚合
            def multi_condition_aggregate(collection):
                """多条件聚合查询"""
                conditions = [
                    ('category == "AI"', "AI类别"),
                    ('category == "ML" and timestamp > 1700000000', "ML类别且时间>阈值"),
                    ('category == "DL" or category == "NLP"', "DL或NLP类别")
                ]
                
                results = {}
                
                for expr, desc in conditions:
                    query_results = collection.query(
                        expr=expr,
                        output_fields=["id", "title", "category"],
                        limit=1000
                    )
                    results[desc] = query_results
                    print(f"{desc}: {len(query_results)} 条")
                
                return results
            
            aggregated = multi_condition_aggregate(collection)
            ---
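            The description above names deduplication, sorting, and pagination and warns against O(n²) approaches, while the example focuses on counting. The following sketch shows one way to merge overlapping result sets in linear time and then page through them. The helper names `merge_results` and `paginate` and the sample rows are hypothetical; the rows mimic the list-of-dicts shape that `collection.query` returns.

```python
from itertools import islice


def merge_results(result_sets, key="id"):
    """Merge several result lists, dropping duplicates by primary key in O(n)."""
    seen = {}
    for results in result_sets:
        for row in results:
            seen.setdefault(row[key], row)   # keep the first occurrence
    return list(seen.values())


def paginate(rows, page_size):
    """Yield successive pages from an already-sorted list."""
    it = iter(rows)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page


# Two overlapping result sets, e.g. from two filter expressions
set_a = [{"id": 1, "timestamp": 30}, {"id": 2, "timestamp": 10}]
set_b = [{"id": 2, "timestamp": 10}, {"id": 3, "timestamp": 20}]

merged = merge_results([set_a, set_b])
merged.sort(key=lambda row: row["timestamp"], reverse=True)   # newest first
pages = list(paginate(merged, page_size=2))
print([[row["id"] for row in page] for page in pages])
```

            The dictionary lookup replaces the nested "is this row already in the list" scan, which is where quadratic behaviour usually creeps in.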

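5 Index system

5.1 Vector index types

01.Index categories
    a.Exact index
        a.Description
            The exact index (FLAT) compares the query against every stored vector by brute force, which guarantees 100% recall. It suits small datasets (below the million-vector range) and workloads with very strict recall requirements. There is no training step, so the index builds quickly. Because every query computes the distance to every vector, query time grows linearly with data volume, and memory use is proportional to data volume as well. FLAT is the usual baseline when benchmarking other indexes, and it is a good fit for prototypes and small applications.
        b.Code example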
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            
            # 创建Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields)
            collection = Collection("flat_index_demo", schema=schema)
            
            # 插入测试数据
            ids = list(range(10000))
            embeddings = np.random.rand(10000, 128).astype(np.float32).tolist()
            data = [ids, embeddings]
            collection.insert(data)
            collection.flush()
            
            # 创建FLAT索引
            index_params = {
                "index_type": "FLAT",
                "metric_type": "L2",
                "params": {}
            }
            
            collection.create_index(
                field_name="embedding",
                index_params=index_params
            )
            
            print("FLAT索引创建完成")
            
            # 加载并搜索
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2"},
                limit=10
            )
            
            print(f"搜索结果: {len(results[0])} 条")
            for hit in results[0]:
                print(f"  ID: {hit.id}, 距离: {hit.distance:.4f}")
            ---
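            The brute-force scan that FLAT performs can be sketched without a Milvus server. This is a minimal illustration in plain Python, not Milvus's implementation; the helper names `squared_l2` and `flat_search` are hypothetical and exist only for this example.

```python
import heapq
import random

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_search(vectors, query, top_k):
    """Exhaustive scan: score every stored vector and keep the top_k closest.

    Returns (distance, id) pairs sorted by ascending distance.
    """
    scored = ((squared_l2(vec, query), idx) for idx, vec in enumerate(vectors))
    return heapq.nsmallest(top_k, scored)

random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(1000)]
query = vectors[42]  # a stored vector is always its own nearest neighbor
hits = flat_search(vectors, query, top_k=3)
for distance, idx in hits:
    print(f"id={idx}, squared L2={distance:.4f}")
```

            Because every vector is scored, the work per query is proportional to the number of vectors times the dimension, which is why latency grows linearly with the collection size.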
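    b.Approximate index
        a.Description
            Approximate (ANN) indexes trade a small amount of recall for much faster queries. Several algorithm families exist, including IVF and HNSW (ANNOY was offered in older Milvus releases; check whether your version still supports it). IVF-based indexes need a training (clustering) step, while HNSW builds its graph incrementally; in both cases the build takes noticeably longer than FLAT. Query cost grows sublinearly with data volume, which makes these indexes suitable for large datasets. Memory usage and recall can both be tuned through index and search parameters. Recall in the 95%-99% range is a common target and satisfies most applications, but the actual value depends on the data and the parameters chosen. Each algorithm has its own performance profile, so the choice should follow the workload.
        b.Code example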
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            import time
            
            # 创建Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields)
            collection = Collection("approx_index_demo", schema=schema)
            
            # 插入大规模数据
            batch_size = 10000
            total_count = 100000
            
            for i in range(0, total_count, batch_size):
                ids = list(range(i, i + batch_size))
                embeddings = [[np.random.random() for _ in range(128)] for _ in range(batch_size)]
                data = [ids, embeddings]
                collection.insert(data)
                print(f"已插入: {i + batch_size}/{total_count}")
            
            collection.flush()
            
            # 创建IVF_FLAT索引（近似索引）
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}  # 聚类中心数量
            }
            
            print("开始构建索引...")
            start = time.time()
            collection.create_index(
                field_name="embedding",
                index_params=index_params
            )
            elapsed = time.time() - start
            print(f"索引构建完成，耗时: {elapsed:.2f}s")
            
            # 加载并搜索
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # 搜索参数（控制召回率和性能）
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}  # 搜索的聚类数量
            }
            
            start = time.time()
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10
            )
            elapsed = time.time() - start
            
            print(f"搜索完成，耗时: {elapsed*1000:.2f}ms")
            print(f"结果数量: {len(results[0])}")
            ---

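02.Index algorithms
    a.IVF family
        a.Description
            IVF (Inverted File Index) is a clustering-based index. It partitions the vector space into clusters (Voronoi cells) and, at query time, searches only the few clusters nearest to the query. IVF_FLAT stores the raw vectors, IVF_SQ8 compresses them with scalar quantization, and IVF_PQ compresses them with product quantization. The nlist parameter sets the number of clusters; a common rule of thumb is between sqrt(N) and 4*sqrt(N). The nprobe parameter sets how many clusters are searched per query: larger values raise recall but slow the query down. IVF suits medium to large datasets (millions to hundreds of millions of vectors).
        b.Code example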
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # IVF_FLAT: 精确距离计算
            ivf_flat_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {
                    "nlist": 1024  # 聚类中心数量
                }
            }
            
            # IVF_SQ8: scalar quantization (float32 -> 8 bits per dimension, roughly 75% less vector memory)
            ivf_sq8_params = {
                "index_type": "IVF_SQ8",
                "metric_type": "L2",
                "params": {
                    "nlist": 1024
                }
            }
            
            # IVF_PQ: product quantization (often 90%+ less vector memory; the ratio depends on m and nbits)
            ivf_pq_params = {
                "index_type": "IVF_PQ",
                "metric_type": "L2",
                "params": {
                    "nlist": 1024,
                    "m": 8,  # 子向量数量（必须能整除dim）
                    "nbits": 8  # 每个子向量的编码位数
                }
            }
            
            # 创建索引
            collection.create_index(
                field_name="embedding",
                index_params=ivf_flat_params
            )
            
            collection.load()
            
            # 搜索参数
            search_params_low = {"metric_type": "L2", "params": {"nprobe": 8}}  # 低召回率，高性能
            search_params_mid = {"metric_type": "L2", "params": {"nprobe": 16}}  # 平衡
            search_params_high = {"metric_type": "L2", "params": {"nprobe": 32}}  # 高召回率，低性能
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # 对比不同nprobe的性能
            import time
            
            for params in [search_params_low, search_params_mid, search_params_high]:
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=params,
                    limit=10
                )
                elapsed = time.time() - start
                
                nprobe = params["params"]["nprobe"]
                print(f"nprobe={nprobe}: 耗时 {elapsed*1000:.2f}ms")
            ---
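            The roles of nlist and nprobe can be made concrete with a toy IVF index. This is a minimal sketch in plain Python under simplifying assumptions (plain Lloyd's k-means, tiny random data); the function names are hypothetical and it is not how Milvus implements IVF internally.

```python
import random

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, vec):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda c: squared_l2(centroids[c], vec))

def train_centroids(vectors, nlist, iterations=10, seed=0):
    """Plain k-means (Lloyd's algorithm) producing nlist centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iterations):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            buckets[nearest(centroids, v)].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_ivf(vectors, centroids):
    """Inverted lists: for each centroid, the ids of the vectors assigned to it."""
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        lists[nearest(centroids, v)].append(idx)
    return lists

def ivf_search(vectors, centroids, lists, query, nprobe, top_k):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: squared_l2(centroids[c], query))
    candidates = [idx for c in order[:nprobe] for idx in lists[c]]
    scored = sorted((squared_l2(vectors[idx], query), idx) for idx in candidates)
    return scored[:top_k]

random.seed(1)
vectors = [[random.random() for _ in range(4)] for _ in range(500)]
centroids = train_centroids(vectors, nlist=16)
lists = build_ivf(vectors, centroids)
query = [0.5, 0.5, 0.5, 0.5]

full = ivf_search(vectors, centroids, lists, query, nprobe=16, top_k=5)   # probes every cluster
partial = ivf_search(vectors, centroids, lists, query, nprobe=2, top_k=5)  # probes two clusters
print("nprobe=16:", [idx for _, idx in full])
print("nprobe=2: ", [idx for _, idx in partial])
```

            With nprobe equal to nlist every cluster is scanned, so the result matches a brute-force search; with a smaller nprobe fewer vectors are scored and some true neighbors may be missed.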
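    b.Graph index
        a.Description
            HNSW (Hierarchical Navigable Small World) builds a multi-layer navigation graph and finds nearest neighbors by traversing it. It is among the best-performing approximate indexes available today. Query latency is generally low and stable, although, as with any ANN index, recall still depends on the data and on the parameters. The graph is held in memory alongside the vectors, so memory usage is higher than with IVF_FLAT. M controls the graph connectivity (maximum edges per node), efConstruction controls build quality, and ef controls search quality; ef must be at least as large as the requested limit (top-k). HNSW suits latency-sensitive workloads: it takes longer to build but answers queries quickly.
        b.Code example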
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # HNSW索引参数
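            hnsw_params = {
                "index_type": "HNSW",
                "metric_type": "L2",
                "params": {
                    "M": 16,  # max edges per node (typical values 4-64; the valid range depends on the Milvus version)
                    "efConstruction": 200  # candidate list size during build (typical values 100-500)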
                }
            }
            
            print("开始构建HNSW索引...")
            start = time.time()
            collection.create_index(
                field_name="embedding",
                index_params=hnsw_params
            )
            elapsed = time.time() - start
            print(f"索引构建完成，耗时: {elapsed:.2f}s")
            
            collection.load()
            
            # 搜索参数
            search_params_fast = {"metric_type": "L2", "params": {"ef": 64}}  # 快速搜索
            search_params_balanced = {"metric_type": "L2", "params": {"ef": 128}}  # 平衡
            search_params_accurate = {"metric_type": "L2", "params": {"ef": 256}}  # 高精度
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # 对比不同ef的性能
            for params in [search_params_fast, search_params_balanced, search_params_accurate]:
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=params,
                    limit=10
                )
                elapsed = time.time() - start
                
                ef = params["params"]["ef"]
                print(f"ef={ef}: 耗时 {elapsed*1000:.2f}ms")
            
            # HNSW vs IVF性能对比
            # 重建为IVF索引
            collection.release()
            collection.drop_index()
            
            ivf_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
            
            collection.create_index(field_name="embedding", index_params=ivf_params)
            collection.load()
            
            # IVF搜索
            start = time.time()
            results_ivf = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"nprobe": 16}},
                limit=10
            )
            time_ivf = time.time() - start
            
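            print(f"\nIVF_FLAT: {time_ivf*1000:.2f}ms")
            print("HNSW is often faster than IVF at the same recall but uses more memory; "
                  "the actual ratio depends on the data and parameters, so average many queries before comparing")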
            ---
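            The effect of ef can be illustrated with a heavily simplified, single-layer version of the graph traversal. This sketch is an assumption-laden toy: it builds a brute-force k-nearest-neighbor graph with no hierarchy, whereas real HNSW builds several layers incrementally and selects neighbors heuristically. The names `build_knn_graph` and `graph_search` are hypothetical.

```python
import heapq

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_knn_graph(points, k):
    """Link every point to its k nearest neighbors (brute force, single layer)."""
    graph = {}
    for i, p in enumerate(points):
        ranked = sorted((squared_l2(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in ranked[:k]]
    return graph

def graph_search(points, graph, query, entry, ef, top_k):
    """Best-first traversal that keeps at most ef candidates, in the spirit of HNSW's ef."""
    start = squared_l2(points[entry], query)
    visited = {entry}
    frontier = [(start, entry)]  # min-heap: closest unexpanded node first
    best = [(-start, entry)]     # max-heap via negation: the ef closest nodes found so far
    while frontier:
        dist, node = heapq.heappop(frontier)
        if len(best) >= ef and dist > -best[0][0]:
            break  # nothing left on the frontier can improve the result set
        for neighbor in graph[node]:
            if neighbor in visited:
                continue
            visited.add(neighbor)
            d = squared_l2(points[neighbor], query)
            if len(best) < ef or d < -best[0][0]:
                heapq.heappush(frontier, (d, neighbor))
                heapq.heappush(best, (-d, neighbor))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst candidate
    return sorted((-neg, idx) for neg, idx in best)[:top_k]

# Ten points on a line; each is linked to its two nearest neighbors.
points = [[float(i)] for i in range(10)]
graph = build_knn_graph(points, k=2)
hits = graph_search(points, graph, query=[7.2], entry=0, ef=3, top_k=3)
print([idx for _, idx in hits])
```

            The search walks from the entry point toward the query and stops once the closest unexpanded node is farther than the worst of the ef candidates kept so far. A larger ef explores more of the graph, which is why it raises recall at the cost of latency, and why ef cannot be smaller than the number of results requested.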

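03.Distance metrics
    a.Euclidean distance
        a.Description
            Euclidean distance (L2) is one of the most commonly used vector distance metrics. It measures the straight-line distance between two vectors; a smaller distance means higher similarity. It works with both normalized and unnormalized vectors. Computing one distance costs O(d), where d is the vector dimension, and Milvus accelerates the computation with SIMD instructions. Note that, according to the Milvus metric documentation, the value returned for L2 is the squared distance (the final square root is skipped); this does not change the ranking. L2 suits similarity over continuous features such as image and audio embeddings.
        b.Code example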
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # 创建L2索引
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",  # 欧氏距离
                "params": {"nlist": 1024}
            }
            
            collection.create_index(
                field_name="embedding",
                index_params=index_params
            )
            
            collection.load()
            
            # L2搜索
            query_vector = [[np.random.random() for _ in range(128)]]
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2"},
                limit=10
            )
            
            print("L2距离搜索结果:")
            for hit in results[0]:
                print(f"  ID: {hit.id}, L2距离: {hit.distance:.4f}")
            
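            # Recompute the distance manually to verify the result.
            # Assumption (per the Milvus metric documentation): for L2, Milvus reports the
            # squared distance, i.e. it skips the final square root.
            def l2_distance(vec1, vec2):
                """Squared L2 distance, matching the value Milvus is expected to return."""
                vec1 = np.array(vec1)
                vec2 = np.array(vec2)
                return np.sum((vec1 - vec2) ** 2)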
            
            # 验证第一个结果
            first_id = results[0][0].id
            result_vec = collection.query(
                expr=f"id == {first_id}",
                output_fields=["embedding"]
            )[0]["embedding"]
            
            manual_distance = l2_distance(query_vector[0], result_vec)
            milvus_distance = results[0][0].distance
            
            print(f"\n验证:")
            print(f"  Milvus距离: {milvus_distance:.4f}")
            print(f"  手动计算: {manual_distance:.4f}")
            print(f"  误差: {abs(milvus_distance - manual_distance):.6f}")
            ---
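            Whether a system reports the true Euclidean distance or its square, the ranking of neighbors is the same, because the square root is monotonic. The following plain-Python sketch checks this on a handful of hand-picked vectors; the data and helper names are illustrative only.

```python
import math

def squared_l2(a, b):
    """Squared Euclidean distance (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def l2(a, b):
    """True Euclidean distance."""
    return math.sqrt(squared_l2(a, b))

query = [1.0, 2.0]
vectors = [[4.0, 6.0], [1.0, 3.0], [2.0, 2.0], [1.0, 2.5]]

# Rank the stored vectors by each distance variant.
by_squared = sorted(range(len(vectors)), key=lambda i: squared_l2(vectors[i], query))
by_true = sorted(range(len(vectors)), key=lambda i: l2(vectors[i], query))

print("ranking by squared L2:", by_squared)
print("ranking by true L2:   ", by_true)
print("vector 0:", squared_l2(vectors[0], query), "squared,", l2(vectors[0], query), "true")
```

            When comparing a returned distance against a manual calculation, it therefore matters which of the two variants the system reports, even though the order of results is unaffected.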
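    b.Inner product and cosine
        a.Description
            Inner product (IP) is the dot product of two vectors; a larger value means higher similarity. Cosine similarity (COSINE) is the cosine of the angle between two vectors and lies in [-1, 1]. For L2-normalized vectors the two are equivalent. Both are common for text embeddings and recommender systems. With COSINE, Milvus normalizes internally, so raw vectors can be inserted and queried as they are. IP is the better choice when vectors are already normalized, because it avoids the redundant normalization. With unnormalized vectors, IP also rewards vector magnitude, so both the stored vectors and the query should be normalized if cosine-like behavior is intended. IP can be slightly cheaper to compute than L2, though the difference is rarely decisive.
        b.Code example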
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # 创建IP索引
            index_params_ip = {
                "index_type": "IVF_FLAT",
                "metric_type": "IP",  # 内积
                "params": {"nlist": 1024}
            }
            
            collection.create_index(
                field_name="embedding",
                index_params=index_params_ip
            )
            
            collection.load()
            
            # Normalize the query vector (for IP to behave like cosine, the stored vectors must be normalized as well)
            query_vector = np.random.random(128)
            query_vector = query_vector / np.linalg.norm(query_vector)  # L2归一化
            query_vector = [query_vector.tolist()]
            
            # IP搜索
            results_ip = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "IP"},
                limit=10
            )
            
            print("内积搜索结果:")
            for hit in results_ip[0]:
                print(f"  ID: {hit.id}, 内积: {hit.distance:.4f}")
            
            # 使用COSINE
            collection.release()
            collection.drop_index()
            
            index_params_cosine = {
                "index_type": "IVF_FLAT",
                "metric_type": "COSINE",  # 余弦相似度
                "params": {"nlist": 1024}
            }
            
            collection.create_index(
                field_name="embedding",
                index_params=index_params_cosine
            )
            
            collection.load()
            
            # COSINE搜索（自动归一化）
            query_vector_raw = [[np.random.random() for _ in range(128)]]  # 未归一化
            
            results_cosine = collection.search(
                data=query_vector_raw,
                anns_field="embedding",
                param={"metric_type": "COSINE"},
                limit=10
            )
            
            print("\n余弦相似度搜索结果:")
            for hit in results_cosine[0]:
                print(f"  ID: {hit.id}, 余弦相似度: {hit.distance:.4f}")
            
            # 手动计算余弦相似度
            def cosine_similarity(vec1, vec2):
                """计算余弦相似度"""
                vec1 = np.array(vec1)
                vec2 = np.array(vec2)
                return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
            
            # 验证
            first_id = results_cosine[0][0].id
            result_vec = collection.query(
                expr=f"id == {first_id}",
                output_fields=["embedding"]
            )[0]["embedding"]
            
            manual_cosine = cosine_similarity(query_vector_raw[0], result_vec)
            milvus_cosine = results_cosine[0][0].distance
            
            print(f"\n验证:")
            print(f"  Milvus余弦: {milvus_cosine:.4f}")
            print(f"  手动计算: {manual_cosine:.4f}")
            
            # IP vs COSINE对比
            print("\nIP vs COSINE:")
            print("  归一化向量: IP == COSINE")
            print("  未归一化向量: COSINE会自动归一化，IP不会")
            print("  性能: IP略快（避免归一化开销）")
            print("  适用场景: 文本向量通常使用COSINE，图像向量可以使用L2或IP")
            ---
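            The relationship between IP, COSINE and L2 on normalized vectors can be verified numerically. This is a small plain-Python check with hand-picked vectors; it also shows the identity squared L2 = 2 - 2*cos for unit vectors, which is why all three metrics produce the same ranking once vectors are normalized.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean (L2) norm."""
    return math.sqrt(dot(a, a))

def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / (norm(a) * norm(b))

a = [3.0, 4.0]
b = [4.0, 3.0]
ua, ub = normalize(a), normalize(b)

ip_raw = dot(a, b)        # depends on vector magnitude
cos_ab = cosine(a, b)     # magnitude-independent
ip_unit = dot(ua, ub)     # IP on unit vectors
sq_l2_unit = sum((x - y) ** 2 for x, y in zip(ua, ub))

print(f"IP on raw vectors:       {ip_raw:.4f}")
print(f"cosine similarity:       {cos_ab:.4f}")
print(f"IP on unit vectors:      {ip_unit:.4f}")
print(f"squared L2 on unit vecs: {sq_l2_unit:.4f} (2 - 2*cos = {2 - 2 * cos_ab:.4f})")
```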

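5.2 FLAT Index

01.Basic characteristics
    a.Exact search
        a.Description
            FLAT guarantees 100% recall through brute-force computation; among the float-vector index types it is the only exact one. A search computes the distance between the query and every stored vector, then returns the top-K results. There is no training step, so index creation is nearly instantaneous. Memory usage equals the size of the raw vector data. Query time complexity is O(N*d), where N is the number of vectors and d the dimension. It suits collections of up to about one million vectors, with latency most comfortable toward the low end of that range. It is commonly used as the performance and recall baseline for other indexes.
        b.Code example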
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            import time
            
            # 创建测试Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields, description="FLAT索引测试")
            collection = Collection("flat_test", schema=schema)
            
            # 插入测试数据
            data_sizes = [1000, 10000, 100000]
            
            for size in data_sizes:
                # 清空collection
                collection.drop()
                collection = Collection("flat_test", schema=schema)
                
                # 插入数据
                ids = list(range(size))
                titles = [f"文档{i}" for i in range(size)]
                embeddings = [[np.random.random() for _ in range(128)] for _ in range(size)]
                
                data = [ids, titles, embeddings]
                collection.insert(data)
                collection.flush()
                
                # 创建FLAT索引
                index_params = {
                    "index_type": "FLAT",
                    "metric_type": "L2",
                    "params": {}
                }
                
                start = time.time()
                collection.create_index(field_name="embedding", index_params=index_params)
                index_time = time.time() - start
                
                collection.load()
                
                # 测试查询性能
                query_vector = [[np.random.random() for _ in range(128)]]
                
                # 预热
                collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param={"metric_type": "L2"},
                    limit=10
                )
                
                # 正式测试
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param={"metric_type": "L2"},
                    limit=10
                )
                query_time = time.time() - start
                
                print(f"\n数据量: {size:,}")
                print(f"  索引构建时间: {index_time*1000:.2f}ms")
                print(f"  查询时间: {query_time*1000:.2f}ms")
                print(f"  召回率: 100% (精确搜索)")
            ---
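    b.Use cases
        a.Description
            FLAT fits small datasets, prototyping, workloads that need exact results, and recall benchmarking. Latency is usually acceptable below roughly 100,000 vectors. It suits applications with strict recall requirements, for example in medical or financial domains. It can serve as the control group for validating the recall of approximate indexes, and using it early in development lets you verify functionality quickly. It is not suited to large-scale production unless the dataset really is small.
        b.Code example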
            ---
            from pymilvus import Collection
            import numpy as np
            
            # 场景1: 小规模精确搜索
            def small_scale_exact_search():
                """小规模数据的精确搜索"""
                collection = Collection("medical_images")  # 假设医疗图像库
                
                # FLAT索引保证精确结果
                index_params = {
                    "index_type": "FLAT",
                    "metric_type": "L2",
                    "params": {}
                }
                
                collection.create_index(field_name="embedding", index_params=index_params)
                collection.load()
                
                # 查询最相似的病例
                query_vector = [[0.1] * 128]  # 患者图像向量
                
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param={"metric_type": "L2"},
                    limit=5,
                    output_fields=["id", "title"]
                )
                
                print("最相似的5个病例（100%精确）:")
                for hit in results[0]:
                    print(f"  Case ID: {hit.id}, L2 distance (smaller is more similar): {hit.distance:.4f}")
            
            # 场景2: 召回率基准测试
            def recall_benchmark():
                """使用FLAT作为召回率基准"""
                collection = Collection("documents")
                
                query_vector = [[np.random.random() for _ in range(128)]]
                
                # FLAT索引（精确结果）
                collection.release()
                collection.drop_index()
                
                flat_params = {
                    "index_type": "FLAT",
                    "metric_type": "L2",
                    "params": {}
                }
                collection.create_index(field_name="embedding", index_params=flat_params)
                collection.load()
                
                flat_results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param={"metric_type": "L2"},
                    limit=100
                )
                
                flat_ids = set([hit.id for hit in flat_results[0]])
                
                # IVF索引（近似结果）
                collection.release()
                collection.drop_index()
                
                ivf_params = {
                    "index_type": "IVF_FLAT",
                    "metric_type": "L2",
                    "params": {"nlist": 1024}
                }
                collection.create_index(field_name="embedding", index_params=ivf_params)
                collection.load()
                
                ivf_results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param={"metric_type": "L2", "params": {"nprobe": 16}},
                    limit=100
                )
                
                ivf_ids = set([hit.id for hit in ivf_results[0]])
                
                # 计算召回率
                recall = len(flat_ids & ivf_ids) / len(flat_ids)
                print(f"IVF索引召回率: {recall*100:.2f}%")
            
            # 场景3: 原型开发
            def prototype_development():
                """原型开发阶段使用FLAT索引"""
                collection = Collection("prototype_collection")
                
                # 快速创建索引，无需调参
                index_params = {
                    "index_type": "FLAT",
                    "metric_type": "L2",
                    "params": {}
                }
                
                collection.create_index(field_name="embedding", index_params=index_params)
                collection.load()
                
                print("原型开发建议:")
                print("  1. 使用FLAT索引快速验证功能")
                print("  2. 数据量控制在10万以内")
                print("  3. 功能稳定后再切换到近似索引")
                print("  4. 保留FLAT索引作为召回率基准")
            
            small_scale_exact_search()
            recall_benchmark()
            prototype_development()
            ---
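            The recall calculation used in the benchmark scenario can be factored into a small reusable helper and averaged over many queries, since a single query gives a noisy estimate. This is a sketch with hypothetical function names and made-up id lists; in practice the ids would come from FLAT and ANN search results.

```python
def recall_at_k(exact_ids, approx_ids, k=None):
    """Fraction of the exact top-k ids that also appear in the approximate top-k."""
    exact = list(exact_ids)[:k]
    approx = list(approx_ids)[:k]
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(set(exact) & set(approx)) / len(exact)

def mean_recall(exact_batches, approx_batches, k=None):
    """Average recall@k over several queries."""
    scores = [recall_at_k(e, a, k) for e, a in zip(exact_batches, approx_batches)]
    return sum(scores) / len(scores)

# Made-up ids standing in for FLAT (exact) and IVF (approximate) results.
exact_batches = [[1, 2, 3, 4], [10, 11, 12, 13]]
approx_batches = [[1, 2, 5, 6], [10, 11, 12, 13]]

print(f"query 0 recall@4: {recall_at_k(exact_batches[0], approx_batches[0]):.2f}")
print(f"query 1 recall@4: {recall_at_k(exact_batches[1], approx_batches[1]):.2f}")
print(f"mean recall@4:    {mean_recall(exact_batches, approx_batches):.2f}")
```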

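02.Performance characteristics
    a.Time complexity
        a.Description
            FLAT has no training step, so building the index takes negligible time beyond storing the vectors. Query time complexity is O(N*d), where N is the number of vectors and d the dimension, so query time grows linearly with data volume. Distance computations are accelerated with SIMD instructions, batching multiple queries improves throughput, and GPU acceleration can help substantially where it is available. For a fixed data volume, query time is fairly stable; because every vector is always scanned, it does not depend on the data distribution and is easy to predict.
        b.Code example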
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            import time
            import matplotlib.pyplot as plt
            
            # 测试不同数据量的查询时间
            def test_query_time_scaling():
                """测试查询时间随数据量的变化"""
                data_sizes = [1000, 5000, 10000, 50000, 100000]
                query_times = []
                
                for size in data_sizes:
                    # 创建collection
                    fields = [
                        FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                        FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
                    ]
                    schema = CollectionSchema(fields=fields)
                    collection = Collection(f"flat_scale_test_{size}", schema=schema)
                    
                    # 插入数据
                    ids = list(range(size))
                    embeddings = [[np.random.random() for _ in range(128)] for _ in range(size)]
                    data = [ids, embeddings]
                    collection.insert(data)
                    collection.flush()
                    
                    # 创建索引
                    index_params = {
                        "index_type": "FLAT",
                        "metric_type": "L2",
                        "params": {}
                    }
                    collection.create_index(field_name="embedding", index_params=index_params)
                    collection.load()
                    
                    # 测试查询时间
                    query_vector = [[np.random.random() for _ in range(128)]]
                    
                    # 多次查询取平均
                    times = []
                    for _ in range(10):
                        start = time.time()
                        collection.search(
                            data=query_vector,
                            anns_field="embedding",
                            param={"metric_type": "L2"},
                            limit=10
                        )
                        times.append(time.time() - start)
                    
                    avg_time = np.mean(times) * 1000  # 转换为ms
                    query_times.append(avg_time)
                    
                    print(f"数据量: {size:6d}, 平均查询时间: {avg_time:.2f}ms")
                    
                    # 清理
                    collection.drop()
                
                # 绘制曲线
                plt.figure(figsize=(10, 6))
                plt.plot(data_sizes, query_times, marker='o')
                plt.xlabel('数据量')
                plt.ylabel('查询时间 (ms)')
                plt.title('FLAT索引查询时间随数据量的变化')
                plt.grid(True)
                plt.savefig('flat_scaling.png')
                print("\n性能曲线已保存到 flat_scaling.png")
            
            test_query_time_scaling()
            
            # 测试不同维度的影响
            def test_dimension_impact():
                """测试向量维度对查询时间的影响"""
                dimensions = [64, 128, 256, 512, 1024]
                query_times = []
                
                data_size = 10000
                
                for dim in dimensions:
                    fields = [
                        FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                        FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=dim)
                    ]
                    schema = CollectionSchema(fields=fields)
                    collection = Collection(f"flat_dim_test_{dim}", schema=schema)
                    
                    # 插入数据
                    ids = list(range(data_size))
                    embeddings = [[np.random.random() for _ in range(dim)] for _ in range(data_size)]
                    data = [ids, embeddings]
                    collection.insert(data)
                    collection.flush()
                    
                    # 创建索引
                    index_params = {
                        "index_type": "FLAT",
                        "metric_type": "L2",
                        "params": {}
                    }
                    collection.create_index(field_name="embedding", index_params=index_params)
                    collection.load()
                    
                    # 测试查询时间
                    query_vector = [[np.random.random() for _ in range(dim)]]
                    
                    times = []
                    for _ in range(10):
                        start = time.time()
                        collection.search(
                            data=query_vector,
                            anns_field="embedding",
                            param={"metric_type": "L2"},
                            limit=10
                        )
                        times.append(time.time() - start)
                    
                    avg_time = np.mean(times) * 1000
                    query_times.append(avg_time)
                    
                    print(f"维度: {dim:4d}, 平均查询时间: {avg_time:.2f}ms")
                    
                    collection.drop()
                
                print("\nExpected: for a fixed data volume, query time grows roughly in proportion to the dimension")
            
            test_dimension_impact()
            ---
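            The O(N*d) cost of FLAT can be put next to a rough cost model for IVF to see where the speedup comes from. This is back-of-the-envelope arithmetic, assuming perfectly balanced clusters and counting only per-dimension distance operations; real latency also depends on memory bandwidth, SIMD, and network overhead.

```python
def flat_distance_ops(n, dim):
    """FLAT scans every vector: n distances of dim operations each."""
    return n * dim

def ivf_distance_ops(n, dim, nlist, nprobe):
    """IVF compares the query with nlist centroids, then scans nprobe clusters.

    Assumes balanced clusters of n / nlist vectors each.
    """
    centroid_ops = nlist * dim
    scanned_vectors = nprobe * (n / nlist)
    return centroid_ops + scanned_vectors * dim

n, dim = 1_000_000, 128
flat_ops = flat_distance_ops(n, dim)
ivf_ops = ivf_distance_ops(n, dim, nlist=1024, nprobe=16)

print(f"FLAT: {flat_ops:,} operations per query")
print(f"IVF:  {ivf_ops:,.0f} operations per query (nlist=1024, nprobe=16)")
print(f"Estimated reduction: {flat_ops / ivf_ops:.1f}x")
```

            Doubling nprobe roughly doubles the scanned portion, which is the trade-off between recall and latency described for IVF.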
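    b.Space complexity
        a.Description
            FLAT stores N*d*4 bytes for float32 vectors, i.e. O(N*d) space, where N is the number of vectors and d the dimension. Nothing is compressed; the raw vectors are stored in full. A 128-dimensional float32 vector takes 512 bytes, so one million vectors take about 512 MB (512,000,000 bytes, roughly 488 MiB). Memory usage is predictable and does not depend on index parameters. Compared with compressed indexes such as IVF_SQ8 and IVF_PQ it uses the most memory per vector, so it fits deployments where memory is plentiful.
        b.Code example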
            ---
            from pymilvus import Collection, utility
            import numpy as np
            
            # 计算内存占用
            def calculate_memory_usage(num_vectors, dim):
                """计算FLAT索引的内存占用"""
                bytes_per_vector = dim * 4  # float32
                total_bytes = num_vectors * bytes_per_vector
                total_mb = total_bytes / 1024 / 1024  # 1024-based (MiB): 512,000,000 bytes prints as 488.28
                total_gb = total_mb / 1024
                
                return {
                    "vectors": num_vectors,
                    "dimension": dim,
                    "bytes_per_vector": bytes_per_vector,
                    "total_mb": total_mb,
                    "total_gb": total_gb
                }
            
            # 常见规模的内存占用
            scenarios = [
                (10000, 128, "小规模应用"),
                (100000, 128, "中等规模应用"),
                (1000000, 128, "大规模应用"),
                (1000000, 768, "大模型embedding"),
                (10000000, 128, "超大规模应用")
            ]
            
            print("FLAT索引内存占用估算:\n")
            for num_vectors, dim, desc in scenarios:
                usage = calculate_memory_usage(num_vectors, dim)
                print(f"{desc}:")
                print(f"  向量数量: {usage['vectors']:,}")
                print(f"  向量维度: {usage['dimension']}")
                print(f"  单向量大小: {usage['bytes_per_vector']} 字节")
                print(f"  总内存: {usage['total_mb']:.2f} MB ({usage['total_gb']:.2f} GB)")
                print()
            
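            # Measure against a real collection
            def measure_actual_memory():
                """Insert data into a FLAT-indexed collection and compare with the estimate."""
                from pymilvus import CollectionSchema, FieldSchema, DataType

                # A schema is required when the collection does not exist yet
                fields = [
                    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
                ]
                collection = Collection("memory_test", schema=CollectionSchema(fields=fields))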
                
                # 插入数据
                size = 100000
                dim = 128
                
                ids = list(range(size))
                embeddings = [[np.random.random() for _ in range(dim)] for _ in range(size)]
                data = [ids, embeddings]
                collection.insert(data)
                collection.flush()
                
                # 创建索引
                index_params = {
                    "index_type": "FLAT",
                    "metric_type": "L2",
                    "params": {}
                }
                collection.create_index(field_name="embedding", index_params=index_params)
                
                # Row count after flush. The ORM Collection exposes num_entities;
                # it does not report memory usage directly.
                print(f"Rows in collection: {collection.num_entities}")
                
                # 理论内存占用
                theoretical_mb = calculate_memory_usage(size, dim)["total_mb"]
                print(f"\n理论内存占用: {theoretical_mb:.2f} MB")
                print("实际占用略高于理论值（包含元数据和索引结构）")
            
            measure_actual_memory()
            
            # 内存占用对比
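            def compare_index_memory():
                """Rough, parameter-dependent memory estimates for different index types."""
                print("\nApproximate memory usage by index type (1M vectors, 128 dims; "
                      "actual values depend on parameters such as m, nbits and M):\n")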
                
                comparisons = [
                    ("FLAT", 1.0, "512 MB", "无压缩，精确搜索"),
                    ("IVF_FLAT", 1.0, "512 MB", "无压缩，近似搜索"),
                    ("IVF_SQ8", 0.25, "128 MB", "标量量化，节省75%"),
                    ("IVF_PQ", 0.05, "26 MB", "乘积量化，节省95%"),
                    ("HNSW", 1.5, "768 MB", "图索引，额外图结构")
                ]
                
                for index_type, ratio, memory, description in comparisons:
                    print(f"{index_type:12s}: {memory:8s} (相对FLAT: {ratio*100:5.1f}%) - {description}")
                
                print("\n建议:")
                print("  - 内存充足: 使用FLAT或HNSW")
                print("  - 内存紧张: 使用IVF_SQ8或IVF_PQ")
                print("  - 平衡选择: 使用IVF_FLAT")
            
            compare_index_memory()
            ---
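            The per-vector storage behind these estimates follows from simple arithmetic. The sketch below counts only the encoded vector payload and deliberately ignores centroids, PQ codebooks, ids and graph links, so real usage will be somewhat higher; the function names are hypothetical.

```python
def flat_bytes_per_vector(dim):
    """Raw float32 storage: 4 bytes per dimension."""
    return dim * 4

def sq8_bytes_per_vector(dim):
    """Scalar quantization to 8 bits: 1 byte per dimension."""
    return dim

def pq_bytes_per_vector(m, nbits):
    """Product quantization: m sub-vector codes of nbits each, rounded up to whole bytes."""
    return (m * nbits + 7) // 8

def to_mib(num_bytes):
    """Convert bytes to MiB (1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)

n, dim = 1_000_000, 128
estimates = {
    "FLAT / IVF_FLAT": n * flat_bytes_per_vector(dim),
    "IVF_SQ8": n * sq8_bytes_per_vector(dim),
    "IVF_PQ (m=8, nbits=8)": n * pq_bytes_per_vector(8, 8),
}

flat_total = estimates["FLAT / IVF_FLAT"]
for name, size in estimates.items():
    print(f"{name:24s} {size:>12,} bytes = {to_mib(size):8.2f} MiB "
          f"({size / flat_total * 100:5.1f}% of FLAT)")
```

            With m=8 and nbits=8, the values used in the IVF_PQ parameters earlier in this chapter, each vector is encoded in 8 bytes; a larger m gives better accuracy at the cost of more memory, which is why PQ savings are quoted as a range rather than a fixed figure.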

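5.3 IVF Index Family

01.How IVF works
    a.Cluster partitioning
        a.Description
            IVF (Inverted File Index) uses K-means clustering to partition the vector space into Voronoi cells. Each cell is represented by a cluster center (centroid), and every vector is assigned to its nearest centroid. A query first finds the nearest few centroids and then searches only inside those clusters. The nlist parameter sets the number of clusters and is usually chosen between sqrt(N) and 4*sqrt(N), where N is the total number of vectors. Clustering requires a training step that runs K-means iterations on a sample of the data; training time grows with both nlist and the amount of training data.
        b.Code example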
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            import time
            
            # 创建测试Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields)
            collection = Collection("ivf_demo", schema=schema)
            
            # 插入数据
            data_size = 100000
            ids = list(range(data_size))
            embeddings = [[np.random.random() for _ in range(128)] for _ in range(data_size)]
            data = [ids, embeddings]
            collection.insert(data)
            collection.flush()
            
            # 测试不同nlist值
            nlist_values = [128, 256, 512, 1024, 2048]
            
            for nlist in nlist_values:
                # 创建IVF索引
                index_params = {
                    "index_type": "IVF_FLAT",
                    "metric_type": "L2",
                    "params": {"nlist": nlist}
                }
                
                print(f"\nnlist = {nlist}")
                
                # 测量构建时间
                start = time.time()
                collection.create_index(field_name="embedding", index_params=index_params)
                build_time = time.time() - start
                print(f"  构建时间: {build_time:.2f}s")
                
                collection.load()
                
                # 测试查询性能
                query_vector = [[np.random.random() for _ in range(128)]]
                
                # 不同nprobe值
                for nprobe in [1, 8, 16, 32]:
                    search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": nprobe}
                    }
                    
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                    query_time = time.time() - start
                    
                    print(f"  nprobe={nprobe:2d}: {query_time*1000:.2f}ms")
                
                # 清理索引
                collection.release()
                collection.drop_index()
            
            # nlist选择建议
            def recommend_nlist(num_vectors):
                """推荐nlist值"""
                sqrt_n = int(np.sqrt(num_vectors))
                
                recommendations = {
                    "conservative": sqrt_n,
                    "balanced": 2 * sqrt_n,
                    "aggressive": 4 * sqrt_n
                }
                
                return recommendations
            
            print(f"\n对于 {data_size:,} 个向量:")
            recs = recommend_nlist(data_size)
            for strategy, value in recs.items():
                print(f"  {strategy}: nlist = {value}")
            ---
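            The partitioning step described in this subsection can be sketched without Milvus. The code below is a toy version of Lloyd's K-means in pure Python that produces centroids and inverted lists; it is not the implementation Milvus uses. The function names, the tiny dataset, the dimension and the iteration count are all assumptions made to keep the example short.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))


def assign(vectors, centroids):
    """Inverted lists: for each centroid, the ids of the vectors assigned to it."""
    lists = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        lists[nearest_centroid(vec, centroids)].append(idx)
    return lists


def train_ivf(vectors, nlist, iterations=3, seed=0):
    """Toy Lloyd's K-means; returns (centroids, inverted_lists)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    dim = len(vectors[0])
    for _ in range(iterations):
        for c, members in enumerate(assign(vectors, centroids)):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [
                    sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    return centroids, assign(vectors, centroids)


rng = random.Random(42)
vectors = [[rng.random() for _ in range(4)] for _ in range(1000)]
nlist = int(math.sqrt(len(vectors)))  # sqrt(N) rule of thumb
centroids, lists = train_ivf(vectors, nlist)
sizes = [len(members) for members in lists]
print(f"nlist={nlist}, list sizes min/avg/max = {min(sizes)}/{sum(sizes) / nlist:.1f}/{max(sizes)}")
```

            The spread between the smallest and largest list shows why the scan cost per probed cluster is only an average.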
    b.搜索策略
        a.功能说明
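            IVF search has two stages: a coarse search and a fine search. The coarse stage computes the distance from the query vector to every centroid and picks the nprobe nearest clusters. The fine stage computes distances to the vectors inside those clusters and returns the Top-K results. nprobe sets how many clusters are searched and is the main knob for trading recall against speed: a larger nprobe raises recall and lowers throughput. With IVF_FLAT, nprobe=nlist scans every vector and therefore returns the same results as FLAT, with the extra cost of the coarse stage. The best nprobe should be determined experimentally, ideally averaged over many queries rather than a single one.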
        b.代码示例
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("ivf_demo")
            
            # IVF index parameters (the index itself is built after the FLAT baseline below)
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
            
            # Query vector used for both the recall and the latency comparison
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # Build a FLAT index first to get exact results as the baseline.
            # Assumes the collection has no index yet and is not loaded.
            
            flat_params = {
                "index_type": "FLAT",
                "metric_type": "L2",
                "params": {}
            }
            collection.create_index(field_name="embedding", index_params=flat_params)
            collection.load()
            
            flat_results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2"},
                limit=100
            )
            flat_ids = set([hit.id for hit in flat_results[0]])
            
            # 切换回IVF索引
            collection.release()
            collection.drop_index()
            collection.create_index(field_name="embedding", index_params=index_params)
            collection.load()
            
            # 测试不同nprobe
            print("nprobe性能和召回率对比:\n")
            print(f"{'nprobe':>8s} {'查询时间':>10s} {'召回率':>8s}")
            print("-" * 30)
            
            nprobe_values = [1, 2, 4, 8, 16, 32, 64, 128]
            
            for nprobe in nprobe_values:
                search_params = {
                    "metric_type": "L2",
                    "params": {"nprobe": nprobe}
                }
                
                # 测量查询时间
                times = []
                for _ in range(10):
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=100
                    )
                    times.append(time.time() - start)
                
                avg_time = np.mean(times) * 1000
                
                # 计算召回率
                ivf_ids = set([hit.id for hit in results[0]])
                recall = len(flat_ids & ivf_ids) / len(flat_ids)
                
                print(f"{nprobe:8d} {avg_time:9.2f}ms {recall*100:7.2f}%")
            
            # 自动选择nprobe
            def auto_select_nprobe(collection, query_vector, target_recall=0.95, max_nprobe=128):
                """自动选择满足目标召回率的最小nprobe"""
                # 获取精确结果
                collection.release()
                collection.drop_index()
                
                flat_params = {"index_type": "FLAT", "metric_type": "L2", "params": {}}
                collection.create_index(field_name="embedding", index_params=flat_params)
                collection.load()
                
                flat_results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param={"metric_type": "L2"},
                    limit=100
                )
                flat_ids = set([hit.id for hit in flat_results[0]])
                
                # 恢复IVF索引
                collection.release()
                collection.drop_index()
                
                ivf_params = {
                    "index_type": "IVF_FLAT",
                    "metric_type": "L2",
                    "params": {"nlist": 1024}
                }
                collection.create_index(field_name="embedding", index_params=ivf_params)
                collection.load()
                
                # 二分查找最优nprobe
                left, right = 1, max_nprobe
                best_nprobe = max_nprobe
                
                while left <= right:
                    mid = (left + right) // 2
                    
                    search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": mid}
                    }
                    
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=100
                    )
                    
                    ivf_ids = set([hit.id for hit in results[0]])
                    recall = len(flat_ids & ivf_ids) / len(flat_ids)
                    
                    if recall >= target_recall:
                        best_nprobe = mid
                        right = mid - 1
                    else:
                        left = mid + 1
                
                return best_nprobe
            
            optimal_nprobe = auto_select_nprobe(collection, query_vector, target_recall=0.95)
            print(f"\n推荐nprobe值（95%召回率）: {optimal_nprobe}")
            ---
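            The coarse and fine stages described in this subsection can be reproduced in a few lines of pure Python. This is a minimal sketch under simplifying assumptions: the centroids are a random sample of the data instead of trained K-means centers, and all names (`build_lists`, `ivf_search`, `brute_force`) are made up for illustration. It shows that probing every list gives the same result as an exhaustive scan.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_lists(vectors, centroids):
    """Assign every vector id to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        lists[nearest].append(idx)
    return lists


def ivf_search(query, vectors, centroids, lists, nprobe, k):
    # Coarse stage: rank all centroids, keep the nprobe closest ones.
    probed = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))[:nprobe]
    # Fine stage: exact distances, but only inside the probed lists.
    candidates = [i for c in probed for i in lists[c]]
    return sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))[:k]


def brute_force(query, vectors, k):
    """Exhaustive scan, used as the ground truth."""
    return sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))[:k]


rng = random.Random(7)
vectors = [[rng.random() for _ in range(8)] for _ in range(2000)]
centroids = rng.sample(vectors, 32)  # untrained centroids, for brevity only
lists = build_lists(vectors, centroids)
query = [rng.random() for _ in range(8)]
exact = brute_force(query, vectors, k=10)

for nprobe in (1, 4, 16, 32):
    approx = ivf_search(query, vectors, centroids, lists, nprobe, k=10)
    recall = len(set(exact) & set(approx)) / len(exact)
    print(f"nprobe={nprobe:2d} recall@10={recall:.2f}")
```

            A single query is enough to show the mechanism, but a recall figure worth acting on should be averaged over many queries.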

02.IVF变体
    a.IVF_FLAT
        a.功能说明
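            IVF_FLAT is the most basic IVF index: it stores the original vectors without compression. Distances inside the probed clusters are exact, so recall depends only on nprobe. Memory usage is about the same as FLAT (plus the centroids), while queries are much faster because only nprobe of the nlist clusters are scanned. It suits workloads with enough memory and high recall requirements. At equal nlist and nprobe it has the highest recall in the IVF family because nothing is quantized, and it skips the encoding step that the compressed variants need at build time. It is a reasonable first choice within the IVF family unless memory is constrained.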
        b.代码示例
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # IVF_FLAT索引配置
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {
                    "nlist": 1024  # 聚类数量
                }
            }
            
            print("开始构建IVF_FLAT索引...")
            start = time.time()
            collection.create_index(field_name="embedding", index_params=index_params)
            build_time = time.time() - start
            print(f"构建完成，耗时: {build_time:.2f}s")
            
            collection.load()
            
            # 性能测试
            query_vectors = [[np.random.random() for _ in range(128)] for _ in range(100)]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 单次查询
            start = time.time()
            results = collection.search(
                data=[query_vectors[0]],
                anns_field="embedding",
                param=search_params,
                limit=10
            )
            single_time = time.time() - start
            print(f"单次查询: {single_time*1000:.2f}ms")
            
            # 批量查询
            start = time.time()
            results = collection.search(
                data=query_vectors,
                anns_field="embedding",
                param=search_params,
                limit=10
            )
            batch_time = time.time() - start
            print(f"批量查询(100): {batch_time*1000:.2f}ms")
            print(f"平均每次: {batch_time/100*1000:.2f}ms")
            
            # 内存占用估算
            num_vectors = collection.num_entities
            dim = 128
            memory_mb = num_vectors * dim * 4 / 1024 / 1024
            print(f"\n内存占用估算: {memory_mb:.2f} MB")
            
            # 性能调优建议
            print("\nIVF_FLAT调优建议:")
            print("  1. nlist = sqrt(N) ~ 4*sqrt(N)")
            print("  2. nprobe = 8~64 (根据召回率要求)")
            print("  3. 批量查询可提升吞吐量")
            print("  4. 适合内存充足的场景")
            ---
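            The timings in these examples come from single calls measured with `time.time()`, which makes them sensitive to warm-up effects and outliers. The helper below is one possible way to get steadier numbers; the name `benchmark` and its defaults are assumptions, and the workload is a stand-in so that the sketch runs without a Milvus server.

```python
import statistics
import time


def benchmark(fn, warmup=3, repeats=20):
    """Call fn() repeatedly and return latency statistics in milliseconds."""
    for _ in range(warmup):          # warm caches and connections; not recorded
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # high-resolution clock meant for short intervals
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


# Stand-in workload. With Milvus this would be a callable wrapping the search, e.g.
# lambda: collection.search(data=query, anns_field="embedding", param=search_params, limit=10)
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
print(stats)
```

            Reporting the median together with a tail percentile makes it easier to compare index settings than a single measurement.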
    b.IVF_SQ8
        a.功能说明
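            IVF_SQ8 compresses vectors with 8-bit scalar quantization, turning each float32 component into a uint8. This cuts vector storage by about 75% at the cost of some precision. Quantization maps the value range of each dimension onto 0-255. Distance computation has to decode the quantized values, which adds a small computational cost. It suits memory-constrained deployments that do not need the highest precision. Recall is slightly lower than IVF_FLAT; it often stays high, but the exact figure depends on the data and should be measured rather than assumed. It is a common choice for reducing memory on large datasets.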
        b.代码示例
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # IVF_SQ8索引配置
            index_params = {
                "index_type": "IVF_SQ8",
                "metric_type": "L2",
                "params": {
                    "nlist": 1024
                }
            }
            
            print("开始构建IVF_SQ8索引...")
            start = time.time()
            collection.create_index(field_name="embedding", index_params=index_params)
            build_time = time.time() - start
            print(f"构建完成，耗时: {build_time:.2f}s")
            
            collection.load()
            
            # 性能测试
            query_vector = [[np.random.random() for _ in range(128)]]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            start = time.time()
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10
            )
            query_time = time.time() - start
            print(f"查询时间: {query_time*1000:.2f}ms")
            
            # 内存节省
            num_vectors = collection.num_entities
            dim = 128
            
            flat_memory = num_vectors * dim * 4 / 1024 / 1024  # float32
            sq8_memory = num_vectors * dim * 1 / 1024 / 1024   # uint8
            savings = (1 - sq8_memory / flat_memory) * 100
            
            print(f"\n内存对比:")
            print(f"  FLAT: {flat_memory:.2f} MB")
            print(f"  SQ8:  {sq8_memory:.2f} MB")
            print(f"  节省: {savings:.1f}%")
            
            # 精度对比
            print("\nIVF_SQ8特点:")
            print("  优点: 节省75%内存，查询速度接近IVF_FLAT")
            print("  缺点: 精度略有损失（通常<2%）")
            print("  适用: 大规模数据集，内存受限场景")
            
            # 量化原理示例
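            def quantize_vector(vector):
                """Demonstrate scalar quantization on a single vector.
                
                Simplification: the range is taken from this one vector. A real SQ8
                index typically learns the range per dimension from training data.
                """
                vector = np.asarray(vector, dtype=np.float32)
                
                # Value range; fall back to 1.0 when all values are equal (avoids 0/0)
                vmin, vmax = vector.min(), vector.max()
                scale = (vmax - vmin) or 1.0
                
                # Map to 0-255, rounding to the nearest level instead of truncating
                quantized = np.round((vector - vmin) / scale * 255).astype(np.uint8)
                
                # Dequantize
                dequantized = quantized.astype(np.float32) / 255 * scale + vmin
                
                # Mean absolute error
                error = np.abs(vector - dequantized).mean()
                
                return quantized, dequantized, error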
            
            test_vector = [np.random.random() for _ in range(128)]
            quantized, dequantized, error = quantize_vector(test_vector)
            
            print(f"\n量化示例:")
            print(f"  原始范围: [{min(test_vector):.4f}, {max(test_vector):.4f}]")
            print(f"  量化范围: [0, 255]")
            print(f"  平均误差: {error:.6f}")
            ---
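            The description above says that quantization maps the values of each dimension onto 0-255. A minimal sketch of that per-dimension variant follows, assuming the range of every dimension is learned from a training set. The names `train_sq8`, `encode` and `decode` are hypothetical and this is not the Milvus implementation. With rounding to the nearest level, the reconstruction error per component is at most half a quantization step.

```python
import random


def train_sq8(vectors):
    """Learn a (min, step) pair per dimension from the training vectors."""
    dim = len(vectors[0])
    params = []
    for d in range(dim):
        column = [v[d] for v in vectors]
        lo, hi = min(column), max(column)
        step = (hi - lo) / 255 or 1.0      # constant dimension: avoid dividing by zero
        params.append((lo, step))
    return params


def encode(vec, params):
    """Quantize a float vector to one byte per dimension."""
    codes = []
    for x, (lo, step) in zip(vec, params):
        q = round((x - lo) / step)
        codes.append(max(0, min(255, q)))  # clamp values outside the trained range
    return bytes(codes)


def decode(codes, params):
    """Reconstruct approximate float values from the byte codes."""
    return [lo + c * step for c, (lo, step) in zip(codes, params)]


rng = random.Random(1)
train = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(500)]
params = train_sq8(train)
code = encode(train[0], params)
restored = decode(code, params)
max_err = max(abs(a - b) for a, b in zip(train[0], restored))
print(f"{len(code)} bytes instead of {4 * len(train[0])}, max abs error {max_err:.5f}")
```

            Vectors that fall outside the trained range are clamped, which is one reason the training sample should be representative of the data.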

03.参数调优
    a.nlist选择
        a.功能说明
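            nlist is the most important build-time parameter of an IVF index: it sets the number of clusters. If nlist is too small, each cluster holds too many vectors and the fine search gets slow. If nlist is too large, clusters become too fine-grained, the coarse search over centroids gets expensive, and recall at a fixed nprobe drops. Recommended range: sqrt(N) to 4*sqrt(N), where N is the number of vectors; for 1 million vectors that is nlist=1000-4000. Powers of two such as 1024 or 4096 are a common convention, but nlist does not have to be a power of two. Tune it against the data distribution and query pattern. Build time grows roughly in proportion to nlist.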
        b.代码示例
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 测试不同nlist值的性能
            num_vectors = collection.num_entities
            sqrt_n = int(np.sqrt(num_vectors))
            
            nlist_candidates = [
                sqrt_n,
                2 * sqrt_n,
                4 * sqrt_n,
                1024,  # 常用值
                2048,
                4096
            ]
            
            print(f"向量数量: {num_vectors:,}")
            print(f"sqrt(N): {sqrt_n}\n")
            
            results_summary = []
            
            for nlist in nlist_candidates:
                # 创建索引
                index_params = {
                    "index_type": "IVF_FLAT",
                    "metric_type": "L2",
                    "params": {"nlist": nlist}
                }
                
                start = time.time()
                collection.create_index(field_name="embedding", index_params=index_params)
                build_time = time.time() - start
                
                collection.load()
                
                # 测试查询性能（nprobe=16）
                query_vector = [[np.random.random() for _ in range(128)]]
                search_params = {
                    "metric_type": "L2",
                    "params": {"nprobe": 16}
                }
                
                times = []
                for _ in range(10):
                    start = time.time()
                    collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                    times.append(time.time() - start)
                
                avg_query_time = np.mean(times) * 1000
                
                results_summary.append({
                    "nlist": nlist,
                    "build_time": build_time,
                    "query_time": avg_query_time
                })
                
                print(f"nlist={nlist:5d}: 构建 {build_time:5.2f}s, 查询 {avg_query_time:6.2f}ms")
                
                # 清理
                collection.release()
                collection.drop_index()
            
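            # Fastest nlist at nprobe=16. Caveat: with nprobe fixed, a larger nlist scans
            # fewer vectors and therefore also lowers recall; check recall before adopting it.
            best = min(results_summary, key=lambda x: x["query_time"])
            print(f"\nFastest nlist at nprobe=16: {best['nlist']} (recall not measured)")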
            
            # nlist选择策略
            def recommend_nlist_strategy(num_vectors):
                """推荐nlist选择策略"""
                sqrt_n = int(np.sqrt(num_vectors))
                
                strategies = {
                    "快速构建": sqrt_n,
                    "平衡性能": 2 * sqrt_n,
                    "高性能": 4 * sqrt_n
                }
                
                # Clamp to a reasonable range
                for key in strategies:
                    strategies[key] = max(64, min(65536, strategies[key]))
                    # Round up to a power of two (a convention only, not a requirement)
                    strategies[key] = 2 ** int(np.ceil(np.log2(strategies[key])))
                
                return strategies
            
            strategies = recommend_nlist_strategy(num_vectors)
            print("\nnlist选择策略:")
            for strategy, value in strategies.items():
                print(f"  {strategy}: {value}")
            ---
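            The sqrt(N) rule of thumb follows from a simple cost model. Assuming perfectly balanced clusters, a query computes about nlist distances in the coarse stage and nprobe * N / nlist in the fine stage. The sketch below evaluates that model; the function name is made up, and the model ignores recall, which falls as nlist grows at a fixed nprobe.

```python
import math


def expected_distance_computations(num_vectors, nlist, nprobe):
    """Coarse stage scans nlist centroids; fine stage scans nprobe lists of ~N/nlist vectors."""
    nprobe = min(nprobe, nlist)
    return nlist + nprobe * num_vectors / nlist


num_vectors, nprobe = 1_000_000, 16
for nlist in (250, 1000, 4000, 16000, 64000):
    cost = expected_distance_computations(num_vectors, nlist, nprobe)
    print(f"nlist={nlist:6d}  ~{cost:10.0f} distance computations per query")

# Setting the derivative of nlist + nprobe*N/nlist to zero gives nlist = sqrt(nprobe*N).
print("cost-minimising nlist:", round(math.sqrt(nprobe * num_vectors)))
```

            For N = 1,000,000 and nprobe = 16 the model bottoms out at nlist = 4000, which sits at the 4*sqrt(N) end of the recommended range.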
    b.nprobe调优
        a.功能说明
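            nprobe sets how many clusters are probed per query and is the balance point between recall and speed. nprobe=1 is the fastest setting with the lowest recall. For IVF_FLAT, nprobe=nlist is equivalent to FLAT: 100% recall but the slowest setting. A typical starting range is 8-64, adjusted to the recall requirement. nprobe should be much smaller than nlist, usually 1%-10% of it. The best value can be determined through A/B testing on representative queries. nprobe is a search-time parameter, so different queries can use different values: a small nprobe for real-time queries and a larger one for offline analysis.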
        b.代码示例
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # IVF index parameters (the index itself is built after the FLAT baseline below)
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
            
            # Build a FLAT index first to get exact results as the baseline.
            # Assumes the collection has no index yet and is not loaded.
            
            flat_params = {"index_type": "FLAT", "metric_type": "L2", "params": {}}
            collection.create_index(field_name="embedding", index_params=flat_params)
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            flat_results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2"},
                limit=100
            )
            flat_ids = set([hit.id for hit in flat_results[0]])
            
            # 恢复IVF索引
            collection.release()
            collection.drop_index()
            collection.create_index(field_name="embedding", index_params=index_params)
            collection.load()
            
            # 测试不同nprobe
            print("nprobe调优分析:\n")
            print(f"{'nprobe':>8s} {'查询时间':>12s} {'召回率':>10s} {'性价比':>10s}")
            print("-" * 45)
            
            nprobe_range = [1, 2, 4, 8, 16, 32, 64, 128, 256]
            nlist = index_params["params"]["nlist"]
            
            for nprobe in nprobe_range:
                if nprobe > nlist:  # nprobe cannot exceed nlist
                    continue
                
                search_params = {
                    "metric_type": "L2",
                    "params": {"nprobe": nprobe}
                }
                
                # 测量查询时间
                times = []
                for _ in range(10):
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=100
                    )
                    times.append(time.time() - start)
                
                avg_time = np.mean(times) * 1000
                
                # 计算召回率
                ivf_ids = set([hit.id for hit in results[0]])
                recall = len(flat_ids & ivf_ids) / len(flat_ids)
                
                # 性价比 = 召回率 / 查询时间
                efficiency = recall / avg_time if avg_time > 0 else 0
                
                print(f"{nprobe:8d} {avg_time:10.2f}ms {recall*100:9.2f}% {efficiency:10.4f}")
            
            # 自动推荐nprobe
            def recommend_nprobe(target_recall=0.95, max_latency_ms=10):
                """根据召回率和延迟要求推荐nprobe"""
                recommendations = []
                
                for nprobe in [1, 2, 4, 8, 16, 32, 64, 128]:
                    search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": nprobe}
                    }
                    
                    # 测试
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=100
                    )
                    query_time = (time.time() - start) * 1000
                    
                    ivf_ids = set([hit.id for hit in results[0]])
                    recall = len(flat_ids & ivf_ids) / len(flat_ids)
                    
                    if recall >= target_recall and query_time <= max_latency_ms:
                        recommendations.append({
                            "nprobe": nprobe,
                            "recall": recall,
                            "latency": query_time
                        })
                
                return recommendations
            
            print("\n推荐配置（召回率≥95%, 延迟≤10ms）:")
            recs = recommend_nprobe(target_recall=0.95, max_latency_ms=10)
            
            if recs:
                best = min(recs, key=lambda x: x["nprobe"])
                print(f"  推荐nprobe: {best['nprobe']}")
                print(f"  召回率: {best['recall']*100:.2f}%")
                print(f"  延迟: {best['latency']:.2f}ms")
            else:
                print("  无满足条件的配置，建议放宽要求或增加nlist")
            ---
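            The tuning loop above judges recall from one random query. A steadier approach is to average recall over a batch of queries and then pick the smallest nprobe that meets both targets. The helpers below sketch that selection logic without touching Milvus; their names and the ids used to exercise them are made up for illustration.

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact) if exact else 0.0


def mean_recall(exact_batches, approx_batches):
    """Average recall over several queries (one id list per query)."""
    pairs = list(zip(exact_batches, approx_batches))
    return sum(recall_at_k(e, a) for e, a in pairs) / len(pairs)


def pick_nprobe(measurements, target_recall, max_latency_ms):
    """Smallest nprobe meeting both targets, or None.

    measurements: list of dicts with keys "nprobe", "recall", "latency_ms".
    """
    ok = [m for m in measurements
          if m["recall"] >= target_recall and m["latency_ms"] <= max_latency_ms]
    return min(ok, key=lambda m: m["nprobe"]) if ok else None


# Made-up ids for three queries, only to exercise the helpers.
exact = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
approx = [[1, 2, 3, 9], [5, 6, 7, 8], [9, 10, 0, 0]]
print(f"mean recall@4 = {mean_recall(exact, approx):.3f}")
```

            In a real run, `exact` would hold the FLAT results and `approx` the IVF results for the same batch of query vectors.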

5.4 HNSW索引

01.HNSW原理
    a.分层图结构
        a.功能说明
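            HNSW (Hierarchical Navigable Small World) builds a multi-layer navigation graph in which every layer is a small-world graph. The bottom layer contains all vectors as nodes, and each layer above it is progressively sparser. A query starts at the top layer and works downward: on each layer it moves to a local optimum and then continues from that node on the layer below. Nodes are connected by edges that link vectors close to each other. M sets the maximum number of connections per node on each layer and affects both graph connectivity and memory usage. efConstruction sets the search width used while building and affects graph quality. HNSW query latency is usually stable across datasets, although recall at a given ef still depends on the data.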
        b.代码示例
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            import time
            
            # 创建测试Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields)
            collection = Collection("hnsw_demo", schema=schema)
            
            # 插入数据
            data_size = 100000
            ids = list(range(data_size))
            # Vectorized generation: same uniform [0, 1) values, far fewer Python-level calls
            embeddings = np.random.random((data_size, 128)).tolist()
            data = [ids, embeddings]
            collection.insert(data)
            collection.flush()
            
            # 测试不同M值
            m_values = [4, 8, 16, 32, 64]
            
            print("HNSW参数M的影响:\n")
            print(f"{'M':>4s} {'构建时间':>12s} {'查询时间':>12s} {'内存估算':>12s}")
            print("-" * 45)
            
            for m in m_values:
                # 创建HNSW索引
                index_params = {
                    "index_type": "HNSW",
                    "metric_type": "L2",
                    "params": {
                        "M": m,
                        "efConstruction": 200
                    }
                }
                
                # 构建时间
                start = time.time()
                collection.create_index(field_name="embedding", index_params=index_params)
                build_time = time.time() - start
                
                collection.load()
                
                # 查询时间
                query_vector = [[np.random.random() for _ in range(128)]]
                search_params = {
                    "metric_type": "L2",
                    "params": {"ef": 128}
                }
                
                times = []
                for _ in range(10):
                    start = time.time()
                    collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                    times.append(time.time() - start)
                
                avg_time = np.mean(times) * 1000
                
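                # Memory estimate: raw vector plus up to 2*M links on the bottom layer.
                # Assumes 4-byte neighbor ids; upper layers and metadata are ignored.
                memory_per_vector = 128 * 4 + m * 2 * 4  # vector + links
                total_memory_mb = data_size * memory_per_vector / 1024 / 1024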
                
                print(f"{m:4d} {build_time:10.2f}s {avg_time:10.2f}ms {total_memory_mb:10.2f}MB")
                
                collection.release()
                collection.drop_index()
            
            print("\nM参数选择建议:")
            print("  M=4-8:   低内存，适合大规模数据")
            print("  M=16:    平衡选择（推荐）")
            print("  M=32-64: 高精度，内存占用高")
            ---
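            The statement that upper layers become progressively sparser can be illustrated with the level-assignment rule from the original HNSW description: each node draws a top layer of floor(-ln(U) * mL) with mL = 1/ln(M). Whether Milvus uses exactly this rule internally is an assumption here, and the function names are made up for the sketch.

```python
import math
import random


def assign_levels(num_nodes, m, seed=0):
    """Draw a top layer per node: floor(-ln(U) * mL) with mL = 1/ln(M)."""
    rng = random.Random(seed)
    m_l = 1.0 / math.log(m)
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    return [int(-math.log(1.0 - rng.random()) * m_l) for _ in range(num_nodes)]


def nodes_per_layer(levels):
    """A node whose top layer is L is present on every layer 0..L."""
    top = max(levels)
    return [sum(1 for lv in levels if lv >= layer) for layer in range(top + 1)]


levels = assign_levels(100_000, m=16)
for layer, count in enumerate(nodes_per_layer(levels)):
    print(f"layer {layer}: {count} nodes")
```

            With this rule each layer holds roughly 1/M of the nodes of the layer below, so a larger M gives fewer layers.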
    b.搜索过程
        a.功能说明
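            HNSW search starts from the entry node on the top layer and greedily moves to the locally best node on that layer. It then drops to the next layer and continues from that node. On the bottom layer it runs a wider search that maintains a candidate set. ef sets the size of that candidate set: a larger ef searches more thoroughly but more slowly. ef must be greater than or equal to limit (the number of results returned). A typical range is ef=64-512, adjusted to the precision requirement. Query time grows roughly logarithmically with the number of vectors, which is why HNSW scales well.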
        b.代码示例
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("hnsw_demo")
            
            # HNSW index parameters (the index itself is built after the FLAT baseline below)
            index_params = {
                "index_type": "HNSW",
                "metric_type": "L2",
                "params": {
                    "M": 16,
                    "efConstruction": 200
                }
            }
            
            # Query vector used for both the baseline and the ef comparison
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # FLAT baseline. Assumes the collection has no index yet and is not loaded.
            
            flat_params = {"index_type": "FLAT", "metric_type": "L2", "params": {}}
            collection.create_index(field_name="embedding", index_params=flat_params)
            collection.load()
            
            flat_results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2"},
                limit=100
            )
            flat_ids = set([hit.id for hit in flat_results[0]])
            
            # 恢复HNSW
            collection.release()
            collection.drop_index()
            collection.create_index(field_name="embedding", index_params=index_params)
            collection.load()
            
            # 测试ef参数
            print("HNSW ef参数影响:\n")
            print(f"{'ef':>6s} {'查询时间':>12s} {'召回率':>10s}")
            print("-" * 32)
            
            ef_values = [100, 128, 256, 512]  # ef must be >= limit (100 here)
            
            for ef in ef_values:
                search_params = {
                    "metric_type": "L2",
                    "params": {"ef": ef}
                }
                
                times = []
                for _ in range(10):
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=100
                    )
                    times.append(time.time() - start)
                
                avg_time = np.mean(times) * 1000
                
                hnsw_ids = set([hit.id for hit in results[0]])
                recall = len(flat_ids & hnsw_ids) / len(flat_ids)
                
                print(f"{ef:6d} {avg_time:10.2f}ms {recall*100:9.2f}%")
            
            # 搜索过程可视化（概念）
            print("\nHNSW搜索过程:")
            print("  1. 从顶层入口节点开始")
            print("  2. 在当前层贪心搜索局部最优")
            print("  3. 进入下一层，以上层最优为起点")
            print("  4. 重复直到底层")
            print("  5. 在底层维护ef大小的候选集")
            print("  6. 返回Top-K结果")
            
            # ef选择建议
            def recommend_ef(target_recall=0.95):
                """推荐ef值"""
                for ef in [100, 128, 256, 512]:  # ef must be >= limit (100 here)
                    search_params = {
                        "metric_type": "L2",
                        "params": {"ef": ef}
                    }
                    
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=100
                    )
                    
                    hnsw_ids = set([hit.id for hit in results[0]])
                    recall = len(flat_ids & hnsw_ids) / len(flat_ids)
                    
                    if recall >= target_recall:
                        return ef, recall
                
                return 512, recall
            
            recommended_ef, recall = recommend_ef(0.95)
            print(f"\n推荐ef值（召回率≥95%）: {recommended_ef}")
            print(f"实际召回率: {recall*100:.2f}%")
            ---
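            The bottom-layer search with an ef-sized candidate set can be sketched on a toy graph. The code below is a simplified, single-layer illustration and not the Milvus implementation: the graph is built by brute force (each point linked to its `m` nearest neighbors plus a ring edge so that it stays connected), which is not how HNSW constructs its graph. All function names are hypothetical.

```python
import heapq
import math
import random


def l2(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_toy_graph(points, m):
    """Brute-force neighbor graph (illustration only; HNSW builds its graph incrementally)."""
    n = len(points)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: l2(points[i], points[j]))[:m]
        for j in nearest + [(i + 1) % n]:   # ring edge keeps the graph connected
            graph[i].add(j)
            graph[j].add(i)
    return graph


def search_layer(points, graph, query, entry, ef):
    """Best-first search that keeps at most ef results, sorted by distance."""
    visited = {entry}
    d0 = l2(query, points[entry])
    candidates = [(d0, entry)]   # min-heap: closest unexplored node first
    results = [(-d0, entry)]     # max-heap via negation: worst kept result on top
    while candidates:
        dist, node = heapq.heappop(candidates)
        if len(results) >= ef and dist > -results[0][0]:
            break                # nothing left that could improve the result set
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = l2(query, points[nb])
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(results, (-d, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return sorted((-neg, node) for neg, node in results)


def search(points, graph, query, k, ef, entry=0):
    """Return the k closest (distance, id) pairs found; ef is raised to k if smaller."""
    return search_layer(points, graph, query, entry, max(ef, k))[:k]


rng = random.Random(0)
points = [[rng.random(), rng.random()] for _ in range(200)]
graph = build_toy_graph(points, m=4)
query = [0.5, 0.5]
exact = sorted(range(len(points)), key=lambda i: l2(query, points[i]))[:5]

for ef in (5, 20, 200):
    found = [node for _, node in search(points, graph, query, k=5, ef=ef)]
    print(f"ef={ef:3d} recall@5={len(set(found) & set(exact)) / 5:.2f}")
```

            Raising ef widens the candidate set, so the search visits more nodes before it stops. When ef equals the number of points, this toy search visits the whole connected graph and matches the exhaustive result.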

02.性能优化
    a.构建优化
        a.功能说明
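            Long build time is the main drawback of HNSW. efConstruction controls build quality: larger values build more slowly but produce a better graph. A typical range is efConstruction=100-500, with 200 as a common setting. Construction can be parallelized across CPU cores. HNSW has no training step, and the graph is built by inserting vectors one at a time, so the algorithm itself supports adding new vectors; deletions are the harder case. In Milvus, indexes are built per segment, so newly inserted data is indexed when its segment is sealed rather than by rebuilding the index for the whole collection. Bulk-loading data before calling create_index still avoids repeated build work. Building needs noticeably more memory than the raw vectors, so provision accordingly.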
        b.代码示例
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 测试不同efConstruction值
            ef_construction_values = [100, 200, 400]
            
            print("efConstruction参数影响:\n")
            print(f"{'efConstruction':>16s} {'构建时间':>12s} {'查询时间':>12s} {'召回率':>10s}")
            print("-" * 55)
            
            for ef_const in ef_construction_values:
                # 创建索引
                index_params = {
                    "index_type": "HNSW",
                    "metric_type": "L2",
                    "params": {
                        "M": 16,
                        "efConstruction": ef_const
                    }
                }
                
                start = time.time()
                collection.create_index(field_name="embedding", index_params=index_params)
                build_time = time.time() - start
                
                collection.load()
                
                # 测试查询性能
                query_vector = [[np.random.random() for _ in range(128)]]
                search_params = {
                    "metric_type": "L2",
                    "params": {"ef": 128}
                }
                
                times = []
                for _ in range(10):
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=100
                    )
                    times.append(time.time() - start)
                
                avg_time = np.mean(times) * 1000
                
                # Recall is not measured here: that needs a FLAT baseline for the same query.
                # Print a placeholder instead of a made-up figure.
                recall_text = "n/a"
                
                print(f"{ef_const:16d} {build_time:10.2f}s {avg_time:10.2f}ms {recall_text:>10s}")
                
                collection.release()
                collection.drop_index()
            
            print("\nefConstruction选择建议:")
            print("  100-200: 快速构建，适合原型开发")
            print("  200-400: 平衡选择（推荐）")
            print("  400+:    高质量图，构建时间长")
            
            # 批量构建策略
            def batch_build_hnsw(data_batches):
                """批量构建HNSW索引"""
                # 先插入所有数据
                for batch in data_batches:
                    collection.insert(batch)
                
                collection.flush()
                
                # 一次性构建索引
                index_params = {
                    "index_type": "HNSW",
                    "metric_type": "L2",
                    "params": {
                        "M": 16,
                        "efConstruction": 200
                    }
                }
                
                print("开始批量构建HNSW索引...")
                start = time.time()
                collection.create_index(field_name="embedding", index_params=index_params)
                build_time = time.time() - start
                
                print(f"构建完成，耗时: {build_time:.2f}s")
            
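            # Notes on incremental data
            print("\nNotes on incremental data:")
            print("  - Milvus builds indexes per sealed segment, so new data does not force a full rebuild")
            print("  - Growing segments are searched without the HNSW graph until they are sealed and indexed")
            print("  - HNSW builds are comparatively expensive, so frequent small inserts add indexing and compaction load")
            print("  - Prefer bulk inserts followed by a single flush and index build")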
            ---
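            Recall for an approximate index is measured by comparing its hits with the exact top-k for the same query, for example from a FLAT index. A minimal sketch of that comparison follows; `recall_at_k` and the two id lists are hypothetical and stand in for the `hit.id` values returned by two searches.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k ids that the approximate search also returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    found = exact_top.intersection(approx_ids[:k])
    return len(found) / len(exact_top)


# Hypothetical id lists; in practice they would come from hit.id values
# of a FLAT (exact) search and an HNSW (approximate) search.
exact = [7, 3, 9, 1, 5]
approx = [7, 9, 3, 8, 2]
print(f"recall@5 = {recall_at_k(approx, exact, 5):.2f}")
```

            Averaging this value over many queries gives a more stable figure than a single query.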
    b.查询优化
        a.功能说明
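            Fast queries are the main strength of HNSW. Search cost grows roughly logarithmically with the number of vectors, so it scales well. Sending several query vectors in one search call raises throughput by amortizing per-request overhead (network round trip, scheduling); each query still traverses the graph on its own. The search parameter ef is the main lever: Milvus documents it as ranging from top_k (limit) upward, and larger values raise recall at the cost of latency, so tune it against the latency budget and use different values for different query scenarios where needed. Independent queries parallelize well across CPU cores. Graph traversal accesses memory randomly, so performance is best when the whole index is resident in RAM. HNSW fits low-latency, high-throughput workloads.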
        b.代码示例
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            import concurrent.futures
            
            collection = Collection("documents")
            
            # 创建HNSW索引
            index_params = {
                "index_type": "HNSW",
                "metric_type": "L2",
                "params": {
                    "M": 16,
                    "efConstruction": 200
                }
            }
            collection.create_index(field_name="embedding", index_params=index_params)
            collection.load()
            
            # 单次查询性能
            def test_single_query():
                """测试单次查询性能"""
                query_vector = [[np.random.random() for _ in range(128)]]
                
                search_params = {
                    "metric_type": "L2",
                    "params": {"ef": 128}
                }
                
                times = []
                for _ in range(100):
                    start = time.time()
                    collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                    times.append(time.time() - start)
                
                avg_time = np.mean(times) * 1000
                p50 = np.percentile(times, 50) * 1000
                p95 = np.percentile(times, 95) * 1000
                p99 = np.percentile(times, 99) * 1000
                
                print("单次查询性能:")
                print(f"  平均: {avg_time:.2f}ms")
                print(f"  P50:  {p50:.2f}ms")
                print(f"  P95:  {p95:.2f}ms")
                print(f"  P99:  {p99:.2f}ms")
            
            test_single_query()
            
            # 批量查询性能
            def test_batch_query():
                """测试批量查询性能"""
                batch_sizes = [1, 10, 50, 100]
                
                print("\n批量查询性能:")
                print(f"{'批量大小':>8s} {'总时间':>10s} {'平均每次':>12s} {'QPS':>10s}")
                print("-" * 45)
                
                for batch_size in batch_sizes:
                    query_vectors = [[np.random.random() for _ in range(128)] for _ in range(batch_size)]
                    
                    search_params = {
                        "metric_type": "L2",
                        "params": {"ef": 128}
                    }
                    
                    start = time.time()
                    collection.search(
                        data=query_vectors,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                    total_time = time.time() - start
                    
                    avg_time = total_time / batch_size * 1000
                    qps = batch_size / total_time
                    
                    print(f"{batch_size:8d} {total_time*1000:9.2f}ms {avg_time:10.2f}ms {qps:9.2f}")
            
            test_batch_query()
            
            # 并发查询性能
            def test_concurrent_query():
                """测试并发查询性能"""
                def single_query():
                    query_vector = [[np.random.random() for _ in range(128)]]
                    search_params = {
                        "metric_type": "L2",
                        "params": {"ef": 128}
                    }
                    collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                
                print("\n并发查询性能:")
                print(f"{'并发数':>8s} {'总时间':>10s} {'QPS':>10s}")
                print("-" * 32)
                
                for num_workers in [1, 2, 4, 8, 16]:
                    num_queries = 100
                    
                    start = time.time()
                    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
                        futures = [executor.submit(single_query) for _ in range(num_queries)]
                        for future in concurrent.futures.as_completed(futures):
                            future.result()
                    
                    total_time = time.time() - start
                    qps = num_queries / total_time
                    
                    print(f"{num_workers:8d} {total_time:9.2f}s {qps:9.2f}")
            
            test_concurrent_query()
            
            # 动态ef调整
            class AdaptiveHNSWSearch:
                def __init__(self, collection):
                    self.collection = collection
                    self.ef_map = {
                        "fast": 64,
                        "balanced": 128,
                        "accurate": 256
                    }
                
                def search(self, query_vector, mode="balanced", limit=10):
                    """根据模式动态调整ef"""
                    ef = self.ef_map.get(mode, 128)
                    
                    search_params = {
                        "metric_type": "L2",
                        "params": {"ef": ef}
                    }
                    
                    return self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=limit
                    )
            
            adaptive_search = AdaptiveHNSWSearch(collection)
            
            # 不同模式的查询
            query_vector = [np.random.random() for _ in range(128)]
            
            print("\n自适应查询:")
            for mode in ["fast", "balanced", "accurate"]:
                start = time.time()
                results = adaptive_search.search(query_vector, mode=mode)
                elapsed = time.time() - start
                print(f"  {mode:10s}: {elapsed*1000:.2f}ms")
            ---
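            One way to turn latency measurements into an ef setting is to keep the value with the best recall that still fits the latency budget. The sketch below assumes a table of (ef, p95 latency, recall) tuples has already been measured; `pick_ef` and the numbers are hypothetical and are not benchmark results.

```python
def pick_ef(measurements, latency_budget_ms, min_recall=0.0):
    """Return the ef with the highest recall whose p95 latency fits the budget.

    measurements: iterable of (ef, p95_latency_ms, recall) tuples.
    Returns None when no setting satisfies both constraints.
    """
    candidates = [
        (recall, -latency, ef)
        for ef, latency, recall in measurements
        if latency <= latency_budget_ms and recall >= min_recall
    ]
    if not candidates:
        return None
    # Highest recall wins; ties are broken by lower latency.
    return max(candidates)[2]


# Made-up numbers for illustration only.
table = [(64, 2.1, 0.93), (128, 3.8, 0.97), (256, 7.5, 0.99)]
print(pick_ef(table, latency_budget_ms=5.0))
print(pick_ef(table, latency_budget_ms=1.0))
```

            Returning None instead of a fallback value makes it explicit that no measured setting meets the budget.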

03.使用建议
    a.适用场景
        a.功能说明
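            HNSW suits latency-sensitive workloads such as real-time recommendation and online search, and large datasets that change infrequently. When memory is sufficient it is usually a strong default. It is less attractive under heavy write churn: Milvus indexes sealed segments individually, so new data does not force a full rebuild, but building the graph is expensive and frequent inserts, deletes, and compaction trigger repeated builds. HNSW in Milvus is a CPU index; GPU deployments use dedicated GPU index types instead. It also handles high-dimensional vectors (for example 512 dimensions and above) well. For these reasons it is a common first choice in production, subject to benchmarking on the actual data.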
        b.代码示例
            ---
            from pymilvus import Collection
            
            # 场景1: 实时推荐系统
            def realtime_recommendation():
                """实时推荐场景"""
                collection = Collection("product_embeddings")
                
                # HNSW配置（低延迟）
                index_params = {
                    "index_type": "HNSW",
                    "metric_type": "IP",  # 内积，适合推荐
                    "params": {
                        "M": 16,
                        "efConstruction": 200
                    }
                }
                
                collection.create_index(field_name="embedding", index_params=index_params)
                collection.load()
                
                # 快速查询（ef=64）
                user_vector = [[0.1] * 128]
                search_params = {
                    "metric_type": "IP",
                    "params": {"ef": 64}
                }
                
                results = collection.search(
                    data=user_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=20,
                    output_fields=["id", "title"]
                )
                
                print("推荐商品:")
                for hit in results[0]:
                    print(f"  {hit.entity.get('title')}: {hit.distance:.4f}")
            
            # 场景2: 图像搜索
            def image_search():
                """图像搜索场景"""
                collection = Collection("image_vectors")
                
                # HNSW配置（高维向量）
                index_params = {
                    "index_type": "HNSW",
                    "metric_type": "L2",
                    "params": {
                        "M": 32,  # 高维向量用更大的M
                        "efConstruction": 400
                    }
                }
                
                collection.create_index(field_name="embedding", index_params=index_params)
                collection.load()
                
                # 精确查询（ef=256）
                query_image_vector = [[0.1] * 512]  # 512维
                search_params = {
                    "metric_type": "L2",
                    "params": {"ef": 256}
                }
                
                results = collection.search(
                    data=query_image_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
                
                print("相似图像:")
                for hit in results[0]:
                    print(f"  ID: {hit.id}, 距离: {hit.distance:.4f}")
            
            # 场景3: 文本语义搜索
            def semantic_search():
                """文本语义搜索"""
                collection = Collection("document_embeddings")
                
                # HNSW配置（平衡）
                index_params = {
                    "index_type": "HNSW",
                    "metric_type": "COSINE",
                    "params": {
                        "M": 16,
                        "efConstruction": 200
                    }
                }
                
                collection.create_index(field_name="embedding", index_params=index_params)
                collection.load()
                
                # 语义查询
                query_text_vector = [[0.1] * 768]  # BERT embedding
                search_params = {
                    "metric_type": "COSINE",
                    "params": {"ef": 128}
                }
                
                results = collection.search(
                    data=query_text_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10,
                    output_fields=["title", "content"]
                )
                
                print("相关文档:")
                for hit in results[0]:
                    print(f"  {hit.entity.get('title')}: {hit.distance:.4f}")
            
            realtime_recommendation()
            image_search()
            semantic_search()
            ---
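            The scenarios above use three metrics (IP, L2, COSINE). For unit-length vectors they produce the same ranking, because the squared L2 distance equals 2 minus twice the inner product. The sketch below checks this identity on random data in plain Python; the helper names are illustrative only and nothing here calls Milvus.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def inner(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


random.seed(0)
dim = 32
base = [normalize([random.random() for _ in range(dim)]) for _ in range(200)]
query = normalize([random.random() for _ in range(dim)])

ip_scores = [inner(v, query) for v in base]
l2_scores = [l2_squared(v, query) for v in base]

# For unit vectors: ||a - b||^2 = 2 - 2 * <a, b>
max_gap = max(abs(d - (2.0 - 2.0 * s)) for s, d in zip(ip_scores, l2_scores))

# Larger inner product is better; smaller L2 distance is better.
top_by_ip = sorted(range(len(base)), key=lambda i: -ip_scores[i])[:10]
top_by_l2 = sorted(range(len(base)), key=lambda i: l2_scores[i])[:10]

print(f"max deviation from identity: {max_gap:.2e}")
print("same top-10:", top_by_ip == top_by_l2)
```

            If the embeddings are not normalized, the three metrics can rank results differently, so the metric should match how the embedding model was trained.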
    b.对比总结
        a.功能说明
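            HNSW vs IVF: at the same recall HNSW usually answers queries faster but uses more memory, because it stores graph links in addition to the vectors; IVF uses less memory but is typically slower at high recall. HNSW also builds more slowly than IVF. In Milvus both are built per sealed segment, so neither requires a full rebuild when data is added, but the higher build cost makes HNSW better suited to mostly static data and IVF more tolerant of frequent changes. HNSW vs FLAT: HNSW is an approximate index, FLAT is exact (brute force). On large collections HNSW is far faster than FLAT at slightly lower recall. Rule of thumb: HNSW for low latency, IVF for low memory, FLAT for 100% recall.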
        b.代码示例
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 性能对比测试
            def compare_indexes():
                """对比不同索引的性能"""
                indexes = [
                    ("FLAT", {"index_type": "FLAT", "metric_type": "L2", "params": {}}, {"metric_type": "L2"}),
                    ("IVF_FLAT", {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 1024}}, {"metric_type": "L2", "params": {"nprobe": 16}}),
                    ("HNSW", {"index_type": "HNSW", "metric_type": "L2", "params": {"M": 16, "efConstruction": 200}}, {"metric_type": "L2", "params": {"ef": 128}})
                ]
                
                print("索引性能对比:\n")
                print(f"{'索引类型':>12s} {'构建时间':>12s} {'查询时间':>12s} {'内存占用':>12s}")
                print("-" * 52)
                
                query_vector = [[np.random.random() for _ in range(128)]]
                
                for index_name, index_params, search_params in indexes:
                    # 构建索引
                    start = time.time()
                    collection.create_index(field_name="embedding", index_params=index_params)
                    build_time = time.time() - start
                    
                    collection.load()
                    
                    # 查询性能
                    times = []
                    for _ in range(10):
                        start = time.time()
                        collection.search(
                            data=query_vector,
                            anns_field="embedding",
                            param=search_params,
                            limit=10
                        )
                        times.append(time.time() - start)
                    
                    avg_time = np.mean(times) * 1000
                    
                    # 内存估算
                    num_vectors = collection.num_entities
                    dim = 128
                    
                    if index_name == "FLAT":
                        memory_mb = num_vectors * dim * 4 / 1024 / 1024
                    elif index_name == "IVF_FLAT":
                        memory_mb = num_vectors * dim * 4 / 1024 / 1024
                    else:  # HNSW
                        memory_mb = num_vectors * (dim * 4 + 16 * 2 * 8) / 1024 / 1024
                    
                    print(f"{index_name:>12s} {build_time:10.2f}s {avg_time:10.2f}ms {memory_mb:10.2f}MB")
                    
                    collection.release()
                    collection.drop_index()
                
                print("\n选择建议:")
                print("  FLAT:     数据量<10万，需要100%召回率")
                print("  IVF_FLAT: 数据量10万-1000万，内存受限")
                print("  HNSW:     数据量>10万，低延迟要求，内存充足")
            
            compare_indexes()
            
            # 决策树
            def recommend_index(num_vectors, memory_limit_gb, latency_requirement_ms, update_frequency):
                """推荐索引类型"""
                print("\n索引推荐决策:")
                print(f"  数据量: {num_vectors:,}")
                print(f"  内存限制: {memory_limit_gb}GB")
                print(f"  延迟要求: {latency_requirement_ms}ms")
                print(f"  更新频率: {update_frequency}")
                
                if num_vectors < 100000:
                    return "FLAT"
                
                dim = 128
                hnsw_memory_gb = num_vectors * (dim * 4 + 16 * 2 * 8) / 1024 / 1024 / 1024
                
                if hnsw_memory_gb <= memory_limit_gb and latency_requirement_ms < 10:
                    if update_frequency == "low":
                        return "HNSW"
                    else:
                        return "IVF_FLAT (frequent updates make repeated HNSW builds costly)"
                else:
                    return "IVF_FLAT"
            
            recommendation = recommend_index(
                num_vectors=1000000,
                memory_limit_gb=4,
                latency_requirement_ms=5,
                update_frequency="low"
            )
            
            print(f"\n推荐索引: {recommendation}")
            ---
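            The memory comparison can be made explicit with a small estimator. This is a rough sketch under stated assumptions, not Milvus's actual accounting: vectors are float32, IVF_FLAT adds only its nlist centroids, and HNSW adds up to 2*M neighbor links per vector on the bottom layer, with upper layers and metadata ignored. The `link_bytes` parameter is an assumption as well (4 bytes per neighbor id here; pass 8 for a more conservative figure).

```python
def estimate_index_memory_mb(num_vectors, dim, index_type, M=16, nlist=1024, link_bytes=4):
    """Rough memory estimate in MiB for float32 vectors; ignores per-index metadata."""
    raw = num_vectors * dim * 4  # 4 bytes per float32 component
    if index_type == "FLAT":
        total = raw
    elif index_type == "IVF_FLAT":
        total = raw + nlist * dim * 4  # raw vectors plus cluster centroids
    elif index_type == "HNSW":
        total = raw + num_vectors * 2 * M * link_bytes  # raw vectors plus layer-0 links
    else:
        raise ValueError(f"unsupported index type: {index_type}")
    return total / (1024 * 1024)


for name in ("FLAT", "IVF_FLAT", "HNSW"):
    size = estimate_index_memory_mb(1_000_000, 128, name)
    print(f"{name:>8s}: {size:8.2f} MiB")
```

            The estimate shows why HNSW needs more memory than IVF_FLAT for the same data, and how the gap grows with M.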

5.5 标量索引

01.标量索引类型
    a.INVERTED索引
        a.功能说明
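            The INVERTED index accelerates equality and range filters on scalar fields such as VARCHAR and numeric types by mapping each value to the IDs of the rows that contain it. It works best on fields with many distinct values, such as user IDs or product IDs; on low-cardinality fields (gender, category) the gain is smaller because each value matches a large share of the rows. It can be combined with a vector index for hybrid (filtered) search. Scalar indexes are generally much smaller and faster to build than vector indexes. String prefix matching and numeric range queries are supported. The index type has to be requested explicitly by setting index_type to INVERTED; creating a scalar index without index_params lets Milvus pick a default type.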
        b.代码示例
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            import time
            
            # 创建带标量字段的Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=50),
                FieldSchema(name="price", dtype=DataType.FLOAT),
                FieldSchema(name="timestamp", dtype=DataType.INT64),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields)
            collection = Collection("scalar_index_demo", schema=schema)
            
            # 插入测试数据
            data_size = 100000
            ids = list(range(data_size))
            titles = [f"商品{i}" for i in range(data_size)]
            categories = ["电子", "服装", "食品", "图书"] * (data_size // 4)
            prices = [np.random.uniform(10, 1000) for _ in range(data_size)]
            timestamps = [1700000000 + i for i in range(data_size)]
            embeddings = np.random.random((data_size, 128)).tolist()
            
            data = [ids, titles, categories, prices, timestamps, embeddings]
            collection.insert(data)
            collection.flush()
            
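            # Create scalar indexes, explicitly requesting INVERTED
            # (assumes a Milvus version that supports it, 2.4 or later)
            scalar_index_params = {"index_type": "INVERTED"}
            
            collection.create_index(
                field_name="category",
                index_params=scalar_index_params,
                index_name="category_index"
            )
            
            collection.create_index(
                field_name="price",
                index_params=scalar_index_params,
                index_name="price_index"
            )
            
            collection.create_index(
                field_name="timestamp",
                index_params=scalar_index_params,
                index_name="timestamp_index"
            )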
            
            print("标量索引创建完成")
            
            # 创建向量索引
            vector_index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
            collection.create_index(field_name="embedding", index_params=vector_index_params)
            
            collection.load()
            
            # 测试标量过滤性能
            expr = 'category == "电子" and price > 500'
            
            start = time.time()
            results = collection.query(
                expr=expr,
                output_fields=["id", "title", "category", "price"],
                limit=100
            )
            elapsed = time.time() - start
            
            print(f"\n标量查询: {len(results)} 条结果，耗时 {elapsed*1000:.2f}ms")
            
            # 混合查询（向量+标量）
            query_vector = [[np.random.random() for _ in range(128)]]
            
            start = time.time()
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"nprobe": 16}},
                limit=10,
                expr='category == "电子" and price > 500',
                output_fields=["id", "title", "category", "price"]
            )
            elapsed = time.time() - start
            
            print(f"混合查询: {len(results[0])} 条结果，耗时 {elapsed*1000:.2f}ms")
            ---
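            The idea behind an inverted index, a mapping from each value to the rows that contain it, can be shown with a toy model in plain Python. This is only a conceptual sketch with made-up rows; it is not how Milvus implements the index internally.

```python
from collections import defaultdict


def build_inverted_index(rows, field):
    """Map each distinct value of `field` to the set of row ids holding it."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        index[row[field]].add(row_id)
    return index


rows = [
    {"category": "electronics", "price": 899.0},
    {"category": "clothing", "price": 120.0},
    {"category": "electronics", "price": 45.0},
    {"category": "books", "price": 30.0},
]
category_index = build_inverted_index(rows, "category")

# Equality filter: one dictionary lookup instead of scanning every row.
electronics = category_index.get("electronics", set())

# The remaining predicate is checked only on the candidate rows.
matches = sorted(i for i in electronics if rows[i]["price"] > 500)
print(matches)
```

            The lookup cost depends on the number of matching rows rather than on the size of the collection, which is why the benefit shrinks when a value matches most rows.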
    b.AUTO_INDEX
        a.功能说明
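            AUTO_INDEX lets Milvus choose the index type for a field from its data type, so the index type does not have to be specified by hand. This simplifies index creation and is a reasonable default when the best index type is unknown; for most scalar fields it gives good performance. The concrete index that gets built depends on the field type and the Milvus version, so check the documentation for your version if you need to know what is used internally.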
        b.代码示例
            ---
            from pymilvus import Collection
            
            collection = Collection("documents")
            
            # 使用AUTO_INDEX
            collection.create_index(
                field_name="category",
                index_params={"index_type": "AUTO_INDEX"}
            )
            
            collection.create_index(
                field_name="timestamp",
                index_params={"index_type": "AUTO_INDEX"}
            )
            
            print("AUTO_INDEX创建完成")
            
            collection.load()
            
            # 测试查询
            results = collection.query(
                expr='category == "技术" and timestamp > 1700000000',
                output_fields=["id", "title"],
                limit=100
            )
            
            print(f"查询结果: {len(results)} 条")
            
            # AUTO_INDEX建议
            print("\nAUTO_INDEX使用建议:")
            print("  优点: 自动优化，无需调参")
            print("  缺点: 缺乏控制，可能不是最优")
            print("  适用: 快速开发，不确定最佳索引类型")
            ---

02.标量过滤优化
    a.过滤表达式
        a.功能说明
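            Filter expressions support equality, range, and logical operators. Filtering on indexed fields can speed up evaluation considerably, so conditions should use indexed fields where possible; complex expressions may not be able to use an index fully. Milvus applies the scalar filter before the vector search, so only rows that pass the filter are candidates. A more selective filter reduces the number of candidates, but it does not always make the search faster: with graph indexes such as HNSW a very restrictive filter can make traversal less efficient, so measure on real data. Putting the most selective condition first is a harmless habit, although the evaluation order is decided by the engine.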
        b.代码示例
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            # 测试不同过滤条件的性能
            test_cases = [
                ('category == "技术"', "单条件等值"),
                ('price > 100 and price < 500', "范围查询"),
                ('category == "技术" and price > 100', "组合条件"),
                ('category in ["技术", "新闻", "博客"]', "IN查询"),
                ('category == "技术" or category == "新闻"', "OR条件")
            ]
            
            print("过滤表达式性能测试:\n")
            
            for expr, desc in test_cases:
                start = time.time()
                results = collection.query(
                    expr=expr,
                    output_fields=["id"],
                    limit=1000
                )
                elapsed = time.time() - start
                
                print(f"{desc:15s}: {len(results):5d} 条结果, {elapsed*1000:6.2f}ms")
            
            # 混合查询优化
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # 策略1: 宽松过滤（过滤后数据多）
            expr_loose = 'category == "技术"'
            
            start = time.time()
            results_loose = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"nprobe": 16}},
                limit=10,
                expr=expr_loose
            )
            time_loose = time.time() - start
            
            # 策略2: 严格过滤（过滤后数据少）
            expr_strict = 'category == "技术" and price > 500 and timestamp > 1700000000'
            
            start = time.time()
            results_strict = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"nprobe": 16}},
                limit=10,
                expr=expr_strict
            )
            time_strict = time.time() - start
            
            print(f"\n混合查询优化:")
            print(f"  宽松过滤: {time_loose*1000:.2f}ms")
            print(f"  严格过滤: {time_strict*1000:.2f}ms")
            print("  Note: timings depend on index type and filter selectivity; a stricter filter is not always faster")
            
            # 表达式优化建议
            print("\n表达式优化建议:")
            print("  1. 使用索引字段")
            print("  2. 高选择性条件在前")
            print("  3. 避免复杂嵌套")
            print("  4. 使用IN代替多个OR")
            print("  5. 范围查询使用索引")
            ---
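            The advice to use IN instead of chained OR conditions is easier to follow when the expression is generated from a list. A minimal sketch, assuming that JSON-style quoting of string values is accepted by the filter parser (the IN example in the test cases above uses the same form); `build_in_expr` is a hypothetical helper, not part of pymilvus.

```python
import json


def build_in_expr(field, values):
    """Build `field in [...]` for a list of string or numeric values."""
    if not values:
        raise ValueError("values must not be empty")
    # json.dumps adds double quotes around strings and leaves numbers bare.
    literals = ", ".join(json.dumps(v, ensure_ascii=False) for v in values)
    return f"{field} in [{literals}]"


expr = build_in_expr("category", ["技术", "新闻", "博客"])
print(expr)
```

            The resulting string can be passed as the expr argument of query or search. Field names are inserted as-is, so they should come from trusted code rather than user input.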
    b.索引选择
        a.功能说明
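            Not every scalar field needs an index. Fields with many distinct values (IDs, e-mail addresses) benefit most from a general-purpose scalar index; for fields with few distinct values (gender, status) such an index helps less. Fields that appear often in filters are the main candidates. Every index adds memory and indexing work on insert, so weigh query speed against resource cost, and use query analysis to decide which fields to index.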
        b.代码示例
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            collection.load()  # query() requires a loaded collection
            
            # 分析字段基数
            def analyze_cardinality(collection, field_name):
                """分析字段的基数（唯一值数量）"""
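                # Fetch a sample of up to 16384 rows (not the whole collection).
                # Assumes the primary key "id" is a non-negative integer.
                results = collection.query(
                    expr="id >= 0",
                    output_fields=[field_name],
                    limit=16384
                )
                
                # Count distinct values in the sample
                unique_values = {r[field_name] for r in results}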
                cardinality = len(unique_values)
                total_count = len(results)
                
                cardinality_ratio = cardinality / total_count if total_count > 0 else 0
                
                return {
                    "field": field_name,
                    "total": total_count,
                    "unique": cardinality,
                    "ratio": cardinality_ratio
                }
            
            # 分析多个字段
            fields_to_analyze = ["category", "timestamp", "id"]
            
            print("字段基数分析:\n")
            print(f"{'字段':>12s} {'总数':>8s} {'唯一值':>8s} {'基数比':>8s} {'建议':>12s}")
            print("-" * 55)
            
            for field in fields_to_analyze:
                stats = analyze_cardinality(collection, field)
                
                # 索引建议
                if stats["ratio"] > 0.5:
                    recommendation = "建议索引"
                elif stats["ratio"] > 0.1:
                    recommendation = "可选索引"
                else:
                    recommendation = "不建议"
                
                print(f"{stats['field']:>12s} {stats['total']:>8d} {stats['unique']:>8d} {stats['ratio']:>8.2%} {recommendation:>12s}")
            
            # 索引决策树
            def should_create_index(field_name, cardinality_ratio, query_frequency):
                """决定是否创建索引"""
                if cardinality_ratio > 0.5 and query_frequency == "high":
                    return True, "高基数+高频查询"
                elif cardinality_ratio > 0.1 and query_frequency == "high":
                    return True, "中基数+高频查询"
                elif cardinality_ratio > 0.5 and query_frequency == "medium":
                    return True, "高基数+中频查询"
                else:
                    return False, "不建议索引"
            
            # 示例决策
            decisions = [
                ("user_id", 0.9, "high"),
                ("category", 0.01, "high"),
                ("timestamp", 0.8, "medium"),
                ("status", 0.001, "low")
            ]
            
            print("\n索引决策示例:")
            for field, ratio, freq in decisions:
                should_index, reason = should_create_index(field, ratio, freq)
                print(f"  {field:12s}: {'创建' if should_index else '跳过':4s} ({reason})")
            
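            # Index cost (qualitative rules of thumb, not measurements)
            print("\nIndex cost (rules of thumb; measure on your own data):")
            print("  Memory:  each index adds overhead on top of the field data")
            print("  Inserts: indexed fields add build work after data is flushed")
            print("  Queries: selective filters on indexed fields can be much faster")
            print("  Advice:  index only high-cardinality fields that are filtered often")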
            ---

5.6 索引参数

01.参数配置
    a.构建参数
        a.功能说明
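            Build parameters determine index quality and build time, and each index type has its own: nlist sets the number of clusters for the IVF family, while M and efConstruction shape the graph for HNSW. Build parameters cannot be changed on an existing index; changing them means dropping and rebuilding it. Choose them from the data size and performance targets, ideally after a small-scale test, because they affect both index size and query performance.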
        b.代码示例
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            import time
            
            # 创建测试Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields)
            collection = Collection("index_params_test", schema=schema)
            
            # 插入数据
            data_size = 100000
            ids = list(range(data_size))
            embeddings = np.random.random((data_size, 128)).tolist()
            data = [ids, embeddings]
            collection.insert(data)
            collection.flush()
            
            # IVF_FLAT参数配置
            ivf_configs = [
                {"nlist": 512},
                {"nlist": 1024},
                {"nlist": 2048}
            ]
            
            print("IVF_FLAT构建参数测试:\n")
            print(f"{'nlist':>8s} {'构建时间':>12s} {'索引大小':>12s}")
            print("-" * 36)
            
            for params in ivf_configs:
                index_params = {
                    "index_type": "IVF_FLAT",
                    "metric_type": "L2",
                    "params": params
                }
                
                start = time.time()
                collection.create_index(field_name="embedding", index_params=index_params)
                build_time = time.time() - start
                
                # Estimate index size: raw float32 vectors plus nlist centroids
                index_size_mb = (data_size + params["nlist"]) * 128 * 4 / 1024 / 1024
                
                print(f"{params['nlist']:8d} {build_time:10.2f}s {index_size_mb:10.2f}MB")
                
                collection.drop_index()
            
            # HNSW参数配置
            hnsw_configs = [
                {"M": 8, "efConstruction": 100},
                {"M": 16, "efConstruction": 200},
                {"M": 32, "efConstruction": 400}
            ]
            
            print("\nHNSW构建参数测试:\n")
            print(f"{'M':>4s} {'efConstruction':>16s} {'构建时间':>12s}")
            print("-" * 36)
            
            for params in hnsw_configs:
                index_params = {
                    "index_type": "HNSW",
                    "metric_type": "L2",
                    "params": params
                }
                
                start = time.time()
                collection.create_index(field_name="embedding", index_params=index_params)
                build_time = time.time() - start
                
                print(f"{params['M']:4d} {params['efConstruction']:16d} {build_time:10.2f}s")
                
                collection.drop_index()
            
            # 参数推荐函数
            def recommend_build_params(num_vectors, index_type):
                """推荐构建参数"""
                if index_type == "IVF_FLAT":
                    sqrt_n = int(np.sqrt(num_vectors))
                    return {
                        "conservative": {"nlist": sqrt_n},
                        "balanced": {"nlist": 2 * sqrt_n},
                        "aggressive": {"nlist": 4 * sqrt_n}
                    }
                elif index_type == "HNSW":
                    return {
                        "fast_build": {"M": 8, "efConstruction": 100},
                        "balanced": {"M": 16, "efConstruction": 200},
                        "high_quality": {"M": 32, "efConstruction": 400}
                    }
                else:
                    return {}
            
            print(f"\n推荐参数（{data_size:,}个向量）:")
            
            for index_type in ["IVF_FLAT", "HNSW"]:
                print(f"\n{index_type}:")
                recs = recommend_build_params(data_size, index_type)
                for strategy, params in recs.items():
                    print(f"  {strategy:15s}: {params}")
            ---
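            The effect of nlist is easier to judge with a back-of-the-envelope estimate of how many vectors an IVF search compares. The sketch below assumes evenly sized clusters, which real data only approximates; `ivf_scan_estimate` is a hypothetical helper and nprobe is the search-time number of clusters to visit.

```python
def ivf_scan_estimate(num_vectors, nlist, nprobe):
    """Estimate IVF search work, assuming all clusters hold the same number of vectors.

    Returns (vectors per cluster, vectors scanned, fraction of the collection scanned).
    """
    if not 1 <= nprobe <= nlist:
        raise ValueError("nprobe must be between 1 and nlist")
    per_cluster = num_vectors / nlist
    scanned = per_cluster * nprobe
    return per_cluster, scanned, scanned / num_vectors


for nlist in (512, 1024, 2048):
    per_cluster, scanned, fraction = ivf_scan_estimate(100_000, nlist, nprobe=16)
    print(f"nlist={nlist:5d}: {per_cluster:7.1f} per cluster, "
          f"{scanned:7.1f} scanned ({fraction:.2%})")
```

            A larger nlist means smaller clusters and less work per probe, but the same nprobe then covers a smaller share of the data, so recall usually drops unless nprobe is raised as well.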
    b.搜索参数
        a.功能说明
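            Search parameters control the trade-off between speed and recall at query time and can be changed per request without rebuilding the index. nprobe sets how many clusters an IVF search scans; ef sets the size of the candidate list in an HNSW search. Larger values raise recall and lower throughput. Choose values per application, use different values for different queries if needed, and confirm them by measuring recall and latency, for example with an A/B test.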
        b.代码示例
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("index_params_test")
            
            # 创建IVF索引
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
            collection.create_index(field_name="embedding", index_params=index_params)
            collection.load()
            
            # 测试不同搜索参数
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # 获取FLAT基准
            collection.release()
            collection.drop_index()
            
            flat_params = {"index_type": "FLAT", "metric_type": "L2", "params": {}}
            collection.create_index(field_name="embedding", index_params=flat_params)
            collection.load()
            
            flat_results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2"},
                limit=100
            )
            flat_ids = set([hit.id for hit in flat_results[0]])
            
            # 恢复IVF索引
            collection.release()
            collection.drop_index()
            collection.create_index(field_name="embedding", index_params=index_params)
            collection.load()
            
            # 搜索参数测试
            print("IVF搜索参数测试:\n")
            print(f"{'nprobe':>8s} {'查询时间':>12s} {'召回率':>10s} {'QPS':>10s}")
            print("-" * 45)
            
            nprobe_values = [1, 4, 8, 16, 32, 64]
            
            for nprobe in nprobe_values:
                search_params = {
                    "metric_type": "L2",
                    "params": {"nprobe": nprobe}
                }
                
                # 测量性能
                times = []
                for _ in range(10):
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=100
                    )
                    times.append(time.time() - start)
                
                avg_time = np.mean(times) * 1000
                qps = 1000 / avg_time if avg_time > 0 else 0
                
                # 计算召回率
                ivf_ids = set([hit.id for hit in results[0]])
                recall = len(flat_ids & ivf_ids) / len(flat_ids)
                
                print(f"{nprobe:8d} {avg_time:10.2f}ms {recall*100:9.2f}% {qps:9.2f}")
            
            # 动态参数调整
            class DynamicSearchParams:
                def __init__(self):
                    self.params_map = {
                        "fast": {"nprobe": 4},
                        "balanced": {"nprobe": 16},
                        "accurate": {"nprobe": 64}
                    }
                
                def get_params(self, mode="balanced"):
                    """根据模式获取搜索参数"""
                    return {
                        "metric_type": "L2",
                        "params": self.params_map.get(mode, self.params_map["balanced"])
                    }
                
                def auto_adjust(self, latency_ms, target_latency_ms=10):
                    """根据延迟自动调整参数"""
                    if latency_ms > target_latency_ms * 1.5:
                        return "fast"
                    elif latency_ms < target_latency_ms * 0.5:
                        return "accurate"
                    else:
                        return "balanced"
            
            dynamic_params = DynamicSearchParams()
            
            # 自适应查询
            print("\n自适应搜索参数:")
            
            for mode in ["fast", "balanced", "accurate"]:
                params = dynamic_params.get_params(mode)
                
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=params,
                    limit=10
                )
                latency = (time.time() - start) * 1000
                
                print(f"  {mode:10s}: {latency:.2f}ms (nprobe={params['params']['nprobe']})")
            ---
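The auto_adjust method above maps a single latency reading to a mode. A slightly more stable variant reacts to a rolling average instead. The sketch below is only an illustration and not part of pymilvus: the NprobeController class, its ladder of nprobe values, the window size, and the 1.5x / 0.5x thresholds are all assumptions chosen for the example.

```python
from collections import deque


class NprobeController:
    """Step nprobe up or down a fixed ladder based on a rolling mean latency."""

    def __init__(self, ladder=(4, 8, 16, 32, 64), target_ms=10.0, window=5):
        self.ladder = list(ladder)
        self.target_ms = target_ms
        self.samples = deque(maxlen=window)
        self.level = len(self.ladder) // 2  # start in the middle of the ladder

    @property
    def nprobe(self):
        return self.ladder[self.level]

    def record(self, latency_ms):
        """Record one observed latency and return the nprobe to use next."""
        self.samples.append(latency_ms)
        if len(self.samples) < self.samples.maxlen:
            return self.nprobe  # not enough samples yet
        mean = sum(self.samples) / len(self.samples)
        if mean > self.target_ms * 1.5 and self.level > 0:
            self.level -= 1       # too slow: probe fewer clusters
            self.samples.clear()
        elif mean < self.target_ms * 0.5 and self.level < len(self.ladder) - 1:
            self.level += 1       # headroom: probe more clusters for recall
            self.samples.clear()
        return self.nprobe

    def search_params(self, metric_type="L2"):
        """Build the param dict in the shape collection.search() expects."""
        return {"metric_type": metric_type, "params": {"nprobe": self.nprobe}}


controller = NprobeController(target_ms=10.0)
for latency in [30.0] * 5 + [2.0] * 5:  # simulated latencies in milliseconds
    controller.record(latency)
print(controller.search_params())
```

Clearing the window after each step avoids reacting twice to the same measurements; whether that damping is appropriate depends on the workload.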

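02.Parameter tuning
    a.Performance testing
        a.Description
            Parameter tuning relies on performance tests to find the best configuration. Tests should cover different data sizes and query patterns, and track build time, query latency, recall, and memory usage. Run them on real data and real queries so the chosen parameters are not overfitted to a synthetic benchmark. Grid search or Bayesian optimization can be used to explore the parameter space. The metrics trade off against one another, so there is no absolute optimum. A repeatable tuning process and supporting tools are worth setting up.
        b.Code example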
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            from itertools import product
            
            collection = Collection("documents")
            
            # 网格搜索最优参数
            def grid_search_ivf_params(collection, query_vectors, target_recall=0.95):
                """网格搜索IVF最优参数"""
                # 参数网格
                nlist_values = [512, 1024, 2048]
                nprobe_values = [8, 16, 32, 64]
                
                # 获取FLAT基准
                collection.release()
                collection.drop_index()
                
                flat_params = {"index_type": "FLAT", "metric_type": "L2", "params": {}}
                collection.create_index(field_name="embedding", index_params=flat_params)
                collection.load()
                
                flat_results_list = []
                for qv in query_vectors:
                    results = collection.search(
                        data=[qv],
                        anns_field="embedding",
                        param={"metric_type": "L2"},
                        limit=100
                    )
                    flat_results_list.append(set([hit.id for hit in results[0]]))
                
                # 测试所有参数组合
                best_config = None
                best_score = float('inf')
                
                results_table = []
                
                for nlist in nlist_values:
                    # 构建索引
                    collection.release()
                    collection.drop_index()
                    
                    index_params = {
                        "index_type": "IVF_FLAT",
                        "metric_type": "L2",
                        "params": {"nlist": nlist}
                    }
                    
                    start = time.time()
                    collection.create_index(field_name="embedding", index_params=index_params)
                    build_time = time.time() - start
                    
                    collection.load()
                    
                    for nprobe in nprobe_values:
                        search_params = {
                            "metric_type": "L2",
                            "params": {"nprobe": nprobe}
                        }
                        
                        # 测试查询
                        total_time = 0
                        total_recall = 0
                        
                        for i, qv in enumerate(query_vectors):
                            start = time.time()
                            results = collection.search(
                                data=[qv],
                                anns_field="embedding",
                                param=search_params,
                                limit=100
                            )
                            total_time += time.time() - start
                            
                            ivf_ids = set([hit.id for hit in results[0]])
                            recall = len(flat_results_list[i] & ivf_ids) / len(flat_results_list[i])
                            total_recall += recall
                        
                        avg_time = total_time / len(query_vectors) * 1000
                        avg_recall = total_recall / len(query_vectors)
                        
                        # 评分：满足召回率要求的最快配置
                        if avg_recall >= target_recall:
                            score = avg_time
                            if score < best_score:
                                best_score = score
                                best_config = {
                                    "nlist": nlist,
                                    "nprobe": nprobe,
                                    "build_time": build_time,
                                    "query_time": avg_time,
                                    "recall": avg_recall
                                }
                        
                        results_table.append({
                            "nlist": nlist,
                            "nprobe": nprobe,
                            "build_time": build_time,
                            "query_time": avg_time,
                            "recall": avg_recall
                        })
                
                # 打印结果
                print("参数网格搜索结果:\n")
                print(f"{'nlist':>8s} {'nprobe':>8s} {'构建时间':>12s} {'查询时间':>12s} {'召回率':>10s}")
                print("-" * 55)
                
                for r in results_table:
                    print(f"{r['nlist']:8d} {r['nprobe']:8d} {r['build_time']:10.2f}s {r['query_time']:10.2f}ms {r['recall']*100:9.2f}%")
                
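                if best_config:
                    print(f"\nBest configuration (recall >= {target_recall*100:.0f}%):")
                    print(f"  nlist: {best_config['nlist']}")
                    print(f"  nprobe: {best_config['nprobe']}")
                    print(f"  query time: {best_config['query_time']:.2f}ms")
                    print(f"  recall: {best_config['recall']*100:.2f}%")
                else:
                    # Without this branch the function returns None silently
                    print(f"\nNo configuration reached the target recall of {target_recall*100:.0f}%")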
                
                return best_config
            
            # 生成测试查询
            test_queries = [[np.random.random() for _ in range(128)] for _ in range(10)]
            
            # 执行网格搜索
            best_config = grid_search_ivf_params(collection, test_queries, target_recall=0.95)
            ---
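Since there is no single best configuration, it can help to look at the whole latency/recall trade-off rather than one winner. A minimal sketch, assuming rows shaped like the results_table entries above: pareto_front is a hypothetical helper, and the sample numbers are made up for illustration, not measurements.

```python
def pareto_front(rows):
    """Return the rows that no other row beats on both query_time and recall.

    A row is dominated when another row is at least as fast and at least as
    accurate, and strictly better on one of the two.
    """
    front = []
    for r in rows:
        dominated = any(
            o["query_time"] <= r["query_time"]
            and o["recall"] >= r["recall"]
            and (o["query_time"] < r["query_time"] or o["recall"] > r["recall"])
            for o in rows
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r["query_time"])


# Illustrative values only (query_time in ms)
sample_rows = [
    {"nlist": 512, "nprobe": 8, "query_time": 2.0, "recall": 0.90},
    {"nlist": 512, "nprobe": 16, "query_time": 3.0, "recall": 0.95},
    {"nlist": 1024, "nprobe": 16, "query_time": 2.5, "recall": 0.93},
    {"nlist": 1024, "nprobe": 32, "query_time": 4.0, "recall": 0.94},
    {"nlist": 2048, "nprobe": 64, "query_time": 6.0, "recall": 0.99},
]

for row in pareto_front(sample_rows):
    print(row)
```

Every row on the front is a defensible choice; which one to deploy depends on the latency and recall targets.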
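    b.Tuning strategy
        a.Description
            Parameter tuning should follow a systematic strategy. First, fix the performance targets (latency, recall, throughput). Second, choose a suitable index type. Third, determine the build parameters through testing. Finally, adjust the search parameters until the targets are met. Test under realistic load, including concurrent queries, then monitor production performance and keep tuning. A configuration management system for index parameters is recommended.
        b.Code example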
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            # 参数调优流程
            class IndexTuner:
                def __init__(self, collection):
                    self.collection = collection
                    self.test_queries = [[np.random.random() for _ in range(128)] for _ in range(20)]
                
                def step1_select_index_type(self, num_vectors, memory_limit_gb, latency_requirement_ms):
                    """步骤1: 选择索引类型"""
                    print("步骤1: 选择索引类型\n")
                    
                    if num_vectors < 100000:
                        recommendation = "FLAT"
                        reason = "数据量小，使用精确索引"
                    else:
                        dim = 128
                        hnsw_memory = num_vectors * (dim * 4 + 16 * 2 * 8) / 1024 / 1024 / 1024
                        
                        if hnsw_memory <= memory_limit_gb and latency_requirement_ms < 10:
                            recommendation = "HNSW"
                            reason = "低延迟要求，内存充足"
                        else:
                            recommendation = "IVF_FLAT"
                            reason = "平衡性能和内存"
                    
                    print(f"推荐索引: {recommendation}")
                    print(f"原因: {reason}\n")
                    
                    return recommendation
                
                def step2_tune_build_params(self, index_type):
                    """步骤2: 调优构建参数"""
                    print("步骤2: 调优构建参数\n")
                    
                    num_vectors = self.collection.num_entities
                    
                    if index_type == "IVF_FLAT":
                        sqrt_n = int(np.sqrt(num_vectors))
                        candidates = [sqrt_n, 2*sqrt_n, 4*sqrt_n]
                        
                        print(f"测试nlist值: {candidates}")
                        
                        best_nlist = 2 * sqrt_n  # 简化，实际应测试
                        build_params = {"nlist": best_nlist}
                        
                    elif index_type == "HNSW":
                        candidates = [
                            {"M": 8, "efConstruction": 100},
                            {"M": 16, "efConstruction": 200},
                            {"M": 32, "efConstruction": 400}
                        ]
                        
                        print(f"测试M和efConstruction组合")
                        
                        build_params = {"M": 16, "efConstruction": 200}  # 简化
                    
                    else:
                        build_params = {}
                    
                    print(f"选择构建参数: {build_params}\n")
                    
                    return build_params
                
                def step3_tune_search_params(self, index_type, target_recall=0.95, target_latency_ms=10):
                    """步骤3: 调优搜索参数"""
                    print("步骤3: 调优搜索参数\n")
                    print(f"目标召回率: {target_recall*100:.0f}%")
                    print(f"目标延迟: {target_latency_ms}ms\n")
                    
                    if index_type == "IVF_FLAT":
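                        # Placeholder: a full implementation would binary-search nprobe
                        # within [left, right] against measured recall and latency
                        left, right = 1, 128
                        best_nprobe = 16  # fixed value used by this simplified example
                        
                        print("Searching for the best nprobe...")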
                        
                        search_params = {"nprobe": best_nprobe}
                        
                    elif index_type == "HNSW":
                        # 测试不同ef值
                        best_ef = 128
                        
                        print(f"搜索最优ef...")
                        
                        search_params = {"ef": best_ef}
                    
                    else:
                        search_params = {}
                    
                    print(f"选择搜索参数: {search_params}\n")
                    
                    return search_params
                
                def step4_validate(self, index_type, build_params, search_params):
                    """步骤4: 验证配置"""
                    print("步骤4: 验证配置\n")
                    
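                    # Build the index. Release the collection and drop the current
                    # index first (this assumes an index already exists, as in the
                    # earlier examples); Milvus rejects a second, different index
                    # on the same field.
                    self.collection.release()
                    self.collection.drop_index()
                    
                    index_params = {
                        "index_type": index_type,
                        "metric_type": "L2",
                        "params": build_params
                    }
                    
                    start = time.time()
                    self.collection.create_index(field_name="embedding", index_params=index_params)
                    build_time = time.time() - start
                    
                    self.collection.load()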
                    
                    # 测试查询
                    full_search_params = {
                        "metric_type": "L2",
                        "params": search_params
                    }
                    
                    times = []
                    for qv in self.test_queries:
                        start = time.time()
                        self.collection.search(
                            data=[qv],
                            anns_field="embedding",
                            param=full_search_params,
                            limit=10
                        )
                        times.append(time.time() - start)
                    
                    avg_time = np.mean(times) * 1000
                    p95_time = np.percentile(times, 95) * 1000
                    
                    print(f"构建时间: {build_time:.2f}s")
                    print(f"平均查询时间: {avg_time:.2f}ms")
                    print(f"P95查询时间: {p95_time:.2f}ms")
                    
                    return {
                        "build_time": build_time,
                        "avg_latency": avg_time,
                        "p95_latency": p95_time
                    }
                
                def tune(self, num_vectors, memory_limit_gb, latency_requirement_ms, target_recall=0.95):
                    """完整调优流程"""
                    print("=" * 60)
                    print("索引参数调优流程")
                    print("=" * 60 + "\n")
                    
                    # 步骤1: 选择索引类型
                    index_type = self.step1_select_index_type(num_vectors, memory_limit_gb, latency_requirement_ms)
                    
                    # 步骤2: 调优构建参数
                    build_params = self.step2_tune_build_params(index_type)
                    
                    # 步骤3: 调优搜索参数
                    search_params = self.step3_tune_search_params(index_type, target_recall, latency_requirement_ms)
                    
                    # 步骤4: 验证配置
                    metrics = self.step4_validate(index_type, build_params, search_params)
                    
                    print("\n" + "=" * 60)
                    print("调优完成")
                    print("=" * 60)
                    
                    return {
                        "index_type": index_type,
                        "build_params": build_params,
                        "search_params": search_params,
                        "metrics": metrics
                    }
            
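            # Use the tuner. `collection` was never defined in this example;
            # "documents" is the collection name used in the other examples.
            collection = Collection("documents")
            tuner = IndexTuner(collection)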
            
            optimal_config = tuner.tune(
                num_vectors=100000,
                memory_limit_gb=4,
                latency_requirement_ms=10,
                target_recall=0.95
            )
            
            print(f"\n最优配置:")
            print(f"  索引类型: {optimal_config['index_type']}")
            print(f"  构建参数: {optimal_config['build_params']}")
            print(f"  搜索参数: {optimal_config['search_params']}")
            ---
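Step 3 above mentions a binary search over nprobe but returns a fixed value. The search itself can be sketched as follows, assuming recall does not decrease as nprobe grows. smallest_nprobe and measure_recall are hypothetical names: in practice measure_recall would run the test queries and compare against a FLAT baseline, while here a synthetic function stands in for it.

```python
def smallest_nprobe(measure_recall, target_recall, low=1, high=128):
    """Binary-search the smallest nprobe in [low, high] that meets target_recall.

    Assumes measure_recall(nprobe) is non-decreasing in nprobe.
    Returns None when even `high` falls short of the target.
    """
    if measure_recall(high) < target_recall:
        return None
    while low < high:
        mid = (low + high) // 2
        if measure_recall(mid) >= target_recall:
            high = mid       # mid is good enough; try smaller values
        else:
            low = mid + 1    # mid is too small
    return low


def synthetic_recall(nprobe):
    """Stand-in for a real measurement: recall rises linearly, capped at 1.0."""
    return min(1.0, 0.5 + 0.02 * nprobe)


best = smallest_nprobe(synthetic_recall, target_recall=0.95)
print(f"smallest nprobe meeting the target: {best}")
```

Each probe of measure_recall costs a full batch of test queries, so the logarithmic number of evaluations matters more here than it would for an in-memory lookup.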

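6 Search and Query

6.1 Similarity Search

01.Basic search
    a.Vector search
        a.Description
            Vector search is the core feature of Milvus: it computes the similarity between a query vector and the stored vectors and returns the Top-K results. Supported metrics include L2 (Euclidean distance), IP (inner product), and COSINE (cosine similarity). A search must specify anns_field (the vector field name), limit (the number of results), and the search parameters. Scalar fields can be returned with each hit by listing them in output_fields. Results are sorted from most to least similar: for L2 a smaller distance means more similar, while for IP and COSINE a larger value means more similar. Batch search is supported, so several query vectors can be submitted in one call.
        b.Code example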
            ---
            from pymilvus import Collection, connections
            import numpy as np
            
            # 连接Milvus
            connections.connect(host="localhost", port="19530")
            
            # 获取Collection
            collection = Collection("documents")
            collection.load()
            
            # 单个向量搜索
            query_vector = [[np.random.random() for _ in range(128)]]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                output_fields=["id", "title", "content"]
            )
            
            print("搜索结果:")
            for hit in results[0]:
                print(f"  ID: {hit.id}")
                print(f"  标题: {hit.entity.get('title')}")
                print(f"  距离: {hit.distance:.4f}")
                print()
            
            # 批量向量搜索
            query_vectors = [[np.random.random() for _ in range(128)] for _ in range(5)]
            
            results = collection.search(
                data=query_vectors,
                anns_field="embedding",
                param=search_params,
                limit=10
            )
            
            print(f"批量搜索: {len(results)} 个查询")
            for i, hits in enumerate(results):
                print(f"\n查询 {i+1}:")
                for hit in hits[:3]:  # 只显示前3个结果
                    print(f"  ID: {hit.id}, 距离: {hit.distance:.4f}")
            
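            # Compare metrics. The search metric must match the metric the index
            # was built with, so only one of these calls is expected to succeed;
            # the others are caught and reported.
            metrics = ["L2", "IP", "COSINE"]
            
            print("\nMetric comparison:")
            for metric in metrics:
                search_params = {
                    "metric_type": metric,
                    "params": {"nprobe": 16}
                }
                
                try:
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=5
                    )
                except Exception as e:
                    print(f"\n{metric}: search failed ({e})")
                    continue
                
                print(f"\n{metric}:")
                for hit in results[0]:
                    print(f"  ID: {hit.id}, distance: {hit.distance:.4f}")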
            ---
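To make the sort direction of each metric concrete, here is a brute-force reference in plain Python. It is a sketch for building intuition, not how Milvus computes results: brute_force_top_k is a hypothetical helper, and L2 is computed as the squared distance, which ranks results the same way as the distance with the square root applied.

```python
import math


def score(a, b, metric):
    """Score two vectors under the given metric name."""
    if metric == "L2":
        return sum((x - y) ** 2 for x, y in zip(a, b))  # squared distance
    dot = sum(x * y for x, y in zip(a, b))
    if metric == "IP":
        return dot
    if metric == "COSINE":
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)
    raise ValueError(f"unknown metric: {metric}")


def brute_force_top_k(query, vectors, k, metric="L2"):
    """Return the k best (index, score) pairs by exhaustive comparison."""
    scored = [(i, score(query, v, metric)) for i, v in enumerate(vectors)]
    # L2: smaller is better. IP and COSINE: larger is better.
    scored.sort(key=lambda pair: pair[1], reverse=(metric != "L2"))
    return scored[:k]


vectors = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [0.9, 0.1]]
query = [1.0, 0.0]

for metric in ("L2", "IP", "COSINE"):
    print(metric, brute_force_top_k(query, vectors, k=2, metric=metric))
```

In this toy data the long vector [3.0, 3.0] ranks first under IP but not under L2 or COSINE, which shows how vector length influences inner-product ranking.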
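    b.Distance metrics
        a.Description
            Milvus supports several distance metrics for different scenarios. L2 (Euclidean distance) suits general-purpose vector search; smaller values mean more similar. IP (inner product) suits recommendation systems; larger values mean more similar. COSINE (cosine similarity) suits semantic text search; on normalized vectors it is equivalent to IP. JACCARD and HAMMING are for binary vectors. Choosing the right metric improves search quality. The metric is specified when the index is created, and every search must use that same metric.
        b.Code example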
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # L2距离（欧氏距离）
            def l2_search(query_vector):
                """L2距离搜索，值越小越相似"""
                search_params = {
                    "metric_type": "L2",
                    "params": {"nprobe": 16}
                }
                
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
                
                print("L2距离搜索:")
                for hit in results[0]:
                    print(f"  ID: {hit.id}, L2距离: {hit.distance:.4f}")
                
                return results
            
            # IP距离（内积）
            def ip_search(query_vector):
                """内积搜索，值越大越相似"""
                search_params = {
                    "metric_type": "IP",
                    "params": {"nprobe": 16}
                }
                
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
                
                print("\nIP内积搜索:")
                for hit in results[0]:
                    print(f"  ID: {hit.id}, 内积: {hit.distance:.4f}")
                
                return results
            
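            # COSINE (cosine similarity)
            def cosine_search(query_vector):
                """Cosine similarity search; larger values mean more similar."""
                # Normalize the query vector. Convert to an ndarray first, because
                # dividing a plain Python list by a float raises TypeError.
                # (Cosine similarity is scale-invariant, so this step is optional.)
                vec = np.asarray(query_vector, dtype=np.float32)
                normalized_vector = (vec / np.linalg.norm(vec)).tolist()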
                
                search_params = {
                    "metric_type": "COSINE",
                    "params": {"nprobe": 16}
                }
                
                results = collection.search(
                    data=[normalized_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
                
                print("\nCOSINE余弦相似度搜索:")
                for hit in results[0]:
                    print(f"  ID: {hit.id}, 余弦相似度: {hit.distance:.4f}")
                
                return results
            
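            # Try each metric. Only the function whose metric_type matches the
            # metric the index was built with is expected to succeed.
            query_vector = [np.random.random() for _ in range(128)]
            
            for search_fn in (l2_search, ip_search, cosine_search):
                try:
                    search_fn(query_vector)
                except Exception as e:
                    print(f"\n{search_fn.__name__} failed: {e}")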
            
            # 距离度量选择建议
            print("\n距离度量选择建议:")
            print("  L2:     通用向量搜索，适合图像、音频等")
            print("  IP:     推荐系统，用户-物品匹配")
            print("  COSINE: 文本语义搜索，归一化向量")
            print("  JACCARD: 集合相似度，标签匹配")
            print("  HAMMING: 二值向量，哈希检索")
            
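            # Distance conversion (valid only for unit-length vectors)
            def convert_distance(distance, from_metric, to_metric):
                """Convert a distance value between metrics, assuming normalized vectors."""
                if from_metric == "L2" and to_metric == "COSINE":
                    # For unit vectors, squared L2 = 2 - 2*cos, so cos = 1 - d/2.
                    # `distance` must therefore be the squared L2 distance.
                    return 1 - distance / 2
                elif from_metric == "IP" and to_metric == "COSINE":
                    # For unit vectors the inner product equals the cosine similarity
                    return distance
                else:
                    raise ValueError(f"unsupported conversion: {from_metric} -> {to_metric}")
            
            print("\nDistance conversion example:")
            print(f"  squared L2 distance 0.5 = cosine similarity {convert_distance(0.5, 'L2', 'COSINE'):.4f}")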
            ---
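The L2-to-cosine conversion rests on an identity for unit-length vectors: the squared Euclidean distance equals 2 - 2*cos. The short check below verifies it numerically on random vectors; the helper names are illustrative only.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_from_squared_l2(d2):
    """For unit vectors: ||a - b||^2 = 2 - 2*cos, so cos = 1 - d2 / 2."""
    return 1.0 - d2 / 2.0


random.seed(0)
a = normalize([random.random() for _ in range(128)])
b = normalize([random.random() for _ in range(128)])

cos_ab = dot(a, b)          # on unit vectors, inner product equals cosine
d2 = squared_l2(a, b)
print(f"cosine: {cos_ab:.6f}, recovered from squared L2: {cosine_from_squared_l2(d2):.6f}")
```

The identity does not hold for vectors that are not normalized, so the conversion should only be applied when both the stored and the query vectors have unit length.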

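02.Search parameters
    a.The limit parameter
        a.Description
            limit sets how many results are returned, i.e. the K in Top-K. It must be greater than 0; 1-1000 is the recommended range. Query time grows with limit, though not linearly. For pagination, combine limit with the offset parameter. limit only caps the number of results returned and is not a recall setting. Fewer than limit results may come back when there are not enough matches. Set limit according to what the application actually needs.
        b.Code example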
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 测试不同limit值的性能
            limit_values = [1, 10, 50, 100, 500, 1000]
            
            print("limit参数性能测试:\n")
            print(f"{'limit':>8s} {'查询时间':>12s} {'结果数':>8s}")
            print("-" * 32)
            
            for limit in limit_values:
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=limit
                )
                elapsed = time.time() - start
                
                actual_count = len(results[0])
                
                print(f"{limit:8d} {elapsed*1000:10.2f}ms {actual_count:8d}")
            
            # 分页查询
            def paginated_search(query_vector, page_size=10, page_num=1):
                """分页查询"""
                offset = (page_num - 1) * page_size
                
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=page_size,
                    offset=offset,
                    output_fields=["id", "title"]
                )
                
                return results[0]
            
            # Fetch pages 1-3 (paginated_search expects a flat, un-nested vector)
            query_vector = [np.random.random() for _ in range(128)]
            
            print("\n分页查询示例:")
            for page in range(1, 4):
                results = paginated_search(query_vector, page_size=10, page_num=page)
                print(f"\n第{page}页:")
                for hit in results:
                    print(f"  ID: {hit.id}, 距离: {hit.distance:.4f}")
            
            # limit选择建议
            print("\nlimit选择建议:")
            print("  实时推荐: limit=10-20")
            print("  搜索结果: limit=20-50")
            print("  批量处理: limit=100-1000")
            print("  注意: limit过大会影响性能和内存")
            ---
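    b.The offset parameter
        a.Description
            offset skips the first N results, which enables pagination. It starts at 0, and offset=0 skips nothing. offset + limit should not exceed 16384 (a Milvus limit). Larger offsets make queries slower, so large offsets are not recommended for deep pagination; use a cursor-based or timestamp-based approach instead. offset is applied after the results are ranked, so it does not affect the recall stage.
        b.Code example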
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 测试offset性能
            offset_values = [0, 10, 50, 100, 500, 1000]
            
            print("offset参数性能测试:\n")
            print(f"{'offset':>8s} {'查询时间':>12s}")
            print("-" * 24)
            
            for offset in offset_values:
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10,
                    offset=offset
                )
                elapsed = time.time() - start
                
                print(f"{offset:8d} {elapsed*1000:10.2f}ms")
            
            # 分页实现
            class Paginator:
                def __init__(self, collection, query_vector, page_size=10):
                    self.collection = collection
                    self.query_vector = query_vector
                    self.page_size = page_size
                    self.search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": 16}
                    }
                
                def get_page(self, page_num):
                    """获取指定页"""
                    if page_num < 1:
                        raise ValueError("page_num must be >= 1")
                    
                    offset = (page_num - 1) * self.page_size
                    
                    # 检查offset限制
                    if offset + self.page_size > 16384:
                        raise ValueError("offset + limit exceeds 16384")
                    
                    results = self.collection.search(
                        data=[self.query_vector],
                        anns_field="embedding",
                        param=self.search_params,
                        limit=self.page_size,
                        offset=offset,
                        output_fields=["id", "title"]
                    )
                    
                    return results[0]
                
                def iterate_pages(self, max_pages=10):
                    """迭代多页"""
                    for page_num in range(1, max_pages + 1):
                        try:
                            results = self.get_page(page_num)
                            if len(results) == 0:
                                break
                            yield page_num, results
                        except ValueError as e:
                            print(f"停止迭代: {e}")
                            break
            
            # 使用分页器
            query_vector = [np.random.random() for _ in range(128)]
            paginator = Paginator(collection, query_vector, page_size=10)
            
            print("\n分页迭代示例:")
            for page_num, results in paginator.iterate_pages(max_pages=3):
                print(f"\n第{page_num}页: {len(results)}条结果")
                for hit in results[:3]:  # 只显示前3条
                    print(f"  ID: {hit.id}, 距离: {hit.distance:.4f}")
            
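            # Alternatives to deep pagination
            print("\nAlternatives to deep pagination:")
            print("  1. Use a cursor (based on the last ID of the previous page)")
            print("  2. Filter by timestamp range")
            print("  3. Cap the number of pages (e.g. allow only the first 100)")
            print("  4. Use a dedicated search engine such as Elasticsearch for deep result browsing")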
            ---
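The cursor idea from the list of alternatives can be illustrated in memory. The sketch below shows only the cursor logic itself: page_after is a hypothetical helper working on (distance, id) pairs that are already available, and how to push the cursor condition into a Milvus query is not shown. Including the ID in the cursor breaks ties between hits that share the same distance.

```python
def page_after(scored, cursor, page_size):
    """Return the next page of (distance, id) pairs after the cursor.

    scored: iterable of (distance, id) pairs
    cursor: the last (distance, id) pair of the previous page, or None
    """
    ordered = sorted(scored)  # by distance, then by id
    if cursor is not None:
        ordered = [item for item in ordered if item > cursor]
    return ordered[:page_size]


# Illustrative (distance, id) pairs; two hits share the distance 0.5
scored = [(0.5, 3), (0.1, 7), (0.5, 1), (0.9, 2), (0.3, 4)]

cursor = None
while True:
    page = page_after(scored, cursor, page_size=2)
    if not page:
        break
    print(page)
    cursor = page[-1]  # resume after the last item seen
```

Unlike offset, the cursor does not need to count how many results were skipped, so the cost of fetching a page does not grow with the page number in this scheme.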

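6.2 Range Query

01.Range search
    a.Distance range
        a.Description
            Range search returns every vector whose distance falls within a given range, rather than a Top-K list. With the L2 metric used in the examples, the radius parameter sets the maximum distance, so all vectors closer than radius are returned. The optional range_filter parameter sets the minimum distance, which turns the search into a distance-interval query. This suits cases that need all similar results, such as finding every similar product. The number of results is not fixed: it may be zero or very large, so choose radius carefully to avoid returning too many. Range search performance depends on how many results are returned.
        b.Code example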
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # 基本范围搜索
            search_params = {
                "metric_type": "L2",
                "params": {
                    "nprobe": 16,
                    "radius": 0.5  # 最大距离
                }
            }
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=1000,  # 最大返回数量
                output_fields=["id", "title"]
            )
            
            print(f"范围搜索结果: {len(results[0])} 条")
            for hit in results[0][:10]:  # 只显示前10条
                print(f"  ID: {hit.id}, 距离: {hit.distance:.4f}")
            
            # 距离区间搜索
            search_params_range = {
                "metric_type": "L2",
                "params": {
                    "nprobe": 16,
                    "radius": 1.0,        # 最大距离
                    "range_filter": 0.3   # 最小距离
                }
            }
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params_range,
                limit=1000,
                output_fields=["id", "title"]
            )
            
            print(f"\nDistance interval [0.3, 1.0) results: {len(results[0])}")
            
            # 不同距离范围对比
            radius_values = [0.3, 0.5, 1.0, 2.0]
            
            print("\n不同距离范围对比:")
            print(f"{'radius':>8s} {'结果数':>8s}")
            print("-" * 20)
            
            for radius in radius_values:
                search_params = {
                    "metric_type": "L2",
                    "params": {
                        "nprobe": 16,
                        "radius": radius
                    }
                }
                
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10000
                )
                
                print(f"{radius:8.1f} {len(results[0]):8d}")
            
            # 范围搜索应用场景
            def find_similar_products(product_vector, max_distance=0.5):
                """查找所有相似商品"""
                search_params = {
                    "metric_type": "L2",
                    "params": {
                        "nprobe": 16,
                        "radius": max_distance
                    }
                }
                
                results = collection.search(
                    data=[product_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=1000,
                    output_fields=["id", "title", "price"]
                )
                
                return results[0]
            
            product_vector = [np.random.random() for _ in range(128)]
            similar_products = find_similar_products(product_vector, max_distance=0.5)
            
            print(f"\n相似商品查找: {len(similar_products)} 个商品")
            ---
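            The `radius` / `range_filter` semantics described above can be checked without a running Milvus instance. The following is a minimal brute-force sketch in plain Python, assuming the L2 rule range_filter <= distance < radius. The helper names (`l2_distance`, `brute_force_range_search`) and the sample vectors are hypothetical and exist only for illustration. It uses the ordinary Euclidean distance; the Milvus metric documentation describes its L2 value as computed without the final square root, so confirm which scale your thresholds refer to before reusing numbers.

```python
import math


def l2_distance(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_range_search(query, vectors, radius, range_filter=0.0, limit=None):
    """Return (index, distance) pairs with range_filter <= distance < radius, nearest first."""
    hits = []
    for idx, vec in enumerate(vectors):
        d = l2_distance(query, vec)
        if range_filter <= d < radius:
            hits.append((idx, d))
    hits.sort(key=lambda h: h[1])
    # limit still caps the result, exactly like the limit argument of a search call
    return hits if limit is None else hits[:limit]


vectors = [[0.0, 0.0], [0.3, 0.0], [0.0, 0.6], [1.0, 1.0]]
query = [0.0, 0.0]

inside = brute_force_range_search(query, vectors, radius=0.5)
ring = brute_force_range_search(query, vectors, radius=1.0, range_filter=0.25)
capped = brute_force_range_search(query, vectors, radius=2.0, limit=2)

print([idx for idx, _ in inside])   # vectors closer than 0.5
print([idx for idx, _ in ring])     # vectors in the ring [0.25, 1.0)
print([idx for idx, _ in capped])   # all four qualify, but limit keeps two
```

            The third call shows why the hit count is "not fixed but bounded": every vector lies inside the radius, yet only `limit` of them come back.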
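    b.Range filtering
        a.Description
            Range filtering combines range conditions on scalar fields with a vector range search, so a distance range and scalar ranges apply in the same query. The scalar condition is passed through the `expr` parameter and can express numeric ranges, time ranges stored as integer timestamps, and similar conditions. The scalar filter is applied first and the vector range search runs only on the rows that pass, which reduces the work. It suits compound queries such as finding similar products within a price band. A scalar index on the filtered fields is optional but usually speeds up filtering.
        b.Code example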
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("products")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # 价格范围 + 向量范围
            search_params = {
                "metric_type": "L2",
                "params": {
                    "nprobe": 16,
                    "radius": 0.8
                }
            }
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=1000,
                expr='price >= 100 and price <= 500',
                output_fields=["id", "title", "price"]
            )
            
            print(f"价格范围 [100, 500] + 向量范围: {len(results[0])} 条结果")
            for hit in results[0][:5]:
                print(f"  {hit.entity.get('title')}: ¥{hit.entity.get('price'):.2f}, 距离: {hit.distance:.4f}")
            
            # 时间范围 + 向量范围
            import time
            current_time = int(time.time())
            one_week_ago = current_time - 7 * 24 * 3600
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=1000,
                expr=f'timestamp >= {one_week_ago} and timestamp <= {current_time}',
                output_fields=["id", "title", "timestamp"]
            )
            
            print(f"\n最近7天 + 向量范围: {len(results[0])} 条结果")
            
            # 多条件范围过滤
            complex_expr = '''
                category == "电子产品" and 
                price >= 100 and price <= 1000 and
                rating >= 4.0 and
                stock > 0
            '''
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=1000,
                expr=complex_expr,
                output_fields=["id", "title", "price", "rating"]
            )
            
            print(f"\n多条件范围过滤: {len(results[0])} 条结果")
            
            # 范围查询优化
            def optimized_range_search(query_vector, price_min, price_max, max_distance):
                """优化的范围查询"""
                # 策略1: 先用严格的标量过滤减少候选集
                expr = f'price >= {price_min} and price <= {price_max}'
                
                # 策略2: 使用合理的radius避免返回过多结果
                search_params = {
                    "metric_type": "L2",
                    "params": {
                        "nprobe": 16,
                        "radius": max_distance
                    }
                }
                
                # 策略3: 设置合理的limit上限
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=500,  # 限制最大返回数
                    expr=expr,
                    output_fields=["id", "title", "price"]
                )
                
                return results[0]
            
            results = optimized_range_search(
                query_vector=[np.random.random() for _ in range(128)],
                price_min=200,
                price_max=800,
                max_distance=0.6
            )
            
            print(f"\n优化范围查询: {len(results)} 条结果")
            ---
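            Range filters like the ones above are plain strings, so they are easy to get subtly wrong when assembled by hand. The sketch below shows one way to build them from parameters. `range_clause` and `combine_clauses` are hypothetical helpers, not part of pymilvus, and they assume the field names come from trusted code rather than user input.

```python
def range_clause(field, lower=None, upper=None):
    """Build 'field >= lower and field <= upper'; either bound may be omitted."""
    parts = []
    if lower is not None:
        parts.append(f"{field} >= {lower}")
    if upper is not None:
        parts.append(f"{field} <= {upper}")
    if not parts:
        raise ValueError("at least one bound is required")
    return " and ".join(parts)


def combine_clauses(*clauses):
    """AND together the non-empty clauses, parenthesising each one."""
    kept = [c for c in clauses if c]
    return " and ".join(f"({c})" for c in kept)


price_expr = range_clause("price", 100, 500)
expr = combine_clauses(
    price_expr,
    range_clause("rating", lower=4.0),
    'category == "电子产品"',
)

print(price_expr)
print(expr)
```

            The resulting string can be passed as the `expr` argument of a search call. Parenthesising each clause keeps the precedence explicit when `or` conditions are added later.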

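02.Range query optimization
    a.Performance optimization
        a.Description
            Range query cost is closely tied to how many results come back. Set `radius` conservatively so the result set stays small, and use scalar filtering to shrink the candidate set. A scalar index on the filtered fields speeds up filtering. For large result sets, fetch in pages or stream the results. Monitor query latency and tune the parameters accordingly. A range query is usually slower than a Top-K query with a small limit, so weigh completeness against latency.
        b.Code example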
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # 性能对比: Top-K vs 范围查询
            print("性能对比: Top-K vs 范围查询\n")
            
            # Top-K查询
            search_params_topk = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            start = time.time()
            results_topk = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params_topk,
                limit=100
            )
            time_topk = time.time() - start
            
            print(f"Top-K查询 (limit=100):")
            print(f"  查询时间: {time_topk*1000:.2f}ms")
            print(f"  结果数: {len(results_topk[0])}")
            
            # 范围查询
            search_params_range = {
                "metric_type": "L2",
                "params": {
                    "nprobe": 16,
                    "radius": 1.0
                }
            }
            
            start = time.time()
            results_range = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params_range,
                limit=10000
            )
            time_range = time.time() - start
            
            print(f"\n范围查询 (radius=1.0):")
            print(f"  查询时间: {time_range*1000:.2f}ms")
            print(f"  结果数: {len(results_range[0])}")
            print(f"  性能比: {time_range/time_topk:.2f}x")
            
            # 优化策略1: 使用标量过滤
            start = time.time()
            results_filtered = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params_range,
                limit=10000,
                expr='id % 10 == 0'  # 过滤90%数据
            )
            time_filtered = time.time() - start
            
            print(f"\n范围查询 + 标量过滤:")
            print(f"  查询时间: {time_filtered*1000:.2f}ms")
            print(f"  结果数: {len(results_filtered[0])}")
            print(f"  加速比: {time_range/time_filtered:.2f}x")
            
            # 优化策略2: 调整radius
            radius_values = [0.3, 0.5, 0.8, 1.0, 1.5]
            
            print("\n不同radius的性能:")
            print(f"{'radius':>8s} {'查询时间':>12s} {'结果数':>8s}")
            print("-" * 32)
            
            for radius in radius_values:
                search_params = {
                    "metric_type": "L2",
                    "params": {
                        "nprobe": 16,
                        "radius": radius
                    }
                }
                
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10000
                )
                elapsed = time.time() - start
                
                print(f"{radius:8.1f} {elapsed*1000:10.2f}ms {len(results[0]):8d}")
            
            # 优化策略3: 分批处理
            def batch_range_search(query_vector, radius, batch_size=1000):
                """分批处理范围查询结果"""
                search_params = {
                    "metric_type": "L2",
                    "params": {
                        "nprobe": 16,
                        "radius": radius
                    }
                }
                
                offset = 0
                all_results = []
                
                while True:
                    results = collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=batch_size,
                        offset=offset
                    )
                    
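                    hits = results[0]
                    all_results.extend(hits)
                    
                    # A short (or empty) page means there is nothing left to fetch
                    if len(hits) < batch_size:
                        break
                    
                    offset += batch_size
                    if offset >= 10000:  # Hard cap; Milvus also limits offset + limit per search
                        break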
                
                return all_results
            
            print("\n分批处理范围查询:")
            query_vec = [np.random.random() for _ in range(128)]
            batch_results = batch_range_search(query_vec, radius=0.8, batch_size=500)
            print(f"  总结果数: {len(batch_results)}")
            ---
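            The comparisons above time each query once with `time.time()`, which is sensitive to warm-up effects and noise. A slightly more robust pattern is to repeat the call and report the median. The sketch below assumes nothing about Milvus: `measure_ms` is a hypothetical helper and `fake_search` is a stand-in workload that would be replaced by a lambda wrapping `collection.search(...)`.

```python
import statistics
import time


def measure_ms(fn, repeats=5, warmup=1):
    """Call fn() several times and return (median latency in ms, last result)."""
    result = None
    for _ in range(warmup):
        result = fn()  # warm-up runs are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples), result


def fake_search():
    """Stand-in for a search call: returns the 10 largest numbers."""
    return sorted(range(10000), reverse=True)[:10]


median_ms, hits = measure_ms(fake_search, repeats=5)
print(f"median: {median_ms:.3f} ms, hits: {len(hits)}")
```

            `time.perf_counter()` is a monotonic high-resolution clock, which makes it a better fit for short intervals than `time.time()`.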
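    b.Usage recommendations
        a.Description
            Range queries suit scenarios that need every sufficiently similar result; they are a poor fit for latency-critical real-time queries. Calibrate `radius` on a small dataset first, because useful values depend on the embedding model and the metric. Monitor the number of returned results to avoid overload, and prefer a Top-K query when a fixed number of results is enough. Range queries work better combined with scalar filtering. Expect a trade-off between recall and performance.
        b.Code example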
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            collection.load()  # search requires a loaded collection
            
            # 场景1: 查找所有相似文档
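            def find_all_similar_docs(query_vector, similarity_threshold=0.7):
                """Find all similar documents (suited to offline analysis).
                Note: similarity_threshold is passed as the L2 radius, i.e. a maximum distance."""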
                search_params = {
                    "metric_type": "L2",
                    "params": {
                        "nprobe": 32,  # 更高的nprobe提升召回
                        "radius": similarity_threshold
                    }
                }
                
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=5000,
                    output_fields=["id", "title"]
                )
                
                print(f"找到 {len(results[0])} 个相似文档")
                return results[0]
            
            # 场景2: 去重检测
            def detect_duplicates(query_vector, duplicate_threshold=0.1):
                """检测重复文档（距离很小）"""
                search_params = {
                    "metric_type": "L2",
                    "params": {
                        "nprobe": 16,
                        "radius": duplicate_threshold
                    }
                }
                
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=100
                )
                
                duplicates = [hit for hit in results[0] if hit.distance < duplicate_threshold]
                
                print(f"检测到 {len(duplicates)} 个可能的重复")
                return duplicates
            
            # 场景3: 聚类分析
            def cluster_analysis(center_vector, cluster_radius=0.5):
                """基于中心点的聚类分析"""
                search_params = {
                    "metric_type": "L2",
                    "params": {
                        "nprobe": 16,
                        "radius": cluster_radius
                    }
                }
                
                results = collection.search(
                    data=[center_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=10000
                )
                
                cluster_members = results[0]
                
                # 统计聚类信息
                distances = [hit.distance for hit in cluster_members]
                avg_distance = sum(distances) / len(distances) if distances else 0
                
                print(f"聚类成员数: {len(cluster_members)}")
                print(f"平均距离: {avg_distance:.4f}")
                
                return cluster_members
            
            # 决策树: Top-K vs 范围查询
            def choose_search_method(scenario):
                """根据场景选择搜索方法"""
                recommendations = {
                    "实时推荐": "Top-K (limit=10-20)",
                    "搜索结果": "Top-K (limit=20-50)",
                    "相似内容": "范围查询 (radius=0.5-0.8)",
                    "去重检测": "范围查询 (radius=0.1-0.3)",
                    "聚类分析": "范围查询 (radius=0.5-1.0)",
                    "批量处理": "范围查询 + 分批"
                }
                
                return recommendations.get(scenario, "Top-K (默认)")
            
            print("\n搜索方法选择建议:")
            scenarios = ["实时推荐", "搜索结果", "相似内容", "去重检测", "聚类分析", "批量处理"]
            
            for scenario in scenarios:
                method = choose_search_method(scenario)
                print(f"  {scenario:12s}: {method}")
            
            # 使用示例
            query_vector = [np.random.random() for _ in range(128)]
            
            print("\n实际应用示例:")
            similar_docs = find_all_similar_docs(query_vector, similarity_threshold=0.7)
            duplicates = detect_duplicates(query_vector, duplicate_threshold=0.1)
            cluster = cluster_analysis(query_vector, cluster_radius=0.5)
            ---
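            The advice to test `radius` on a small dataset can be made concrete by looking at the distribution of distances from a few probe queries. The sketch below picks a radius as a quantile of sampled distances. `quantile` and `suggest_radius` are hypothetical helpers, and the sample values are invented for illustration; real values would come from Top-K hits on your own data.

```python
import math


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an ascending list, with 0 <= q <= 1."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def suggest_radius(sample_distances, q=0.9):
    """Pick a radius that would have admitted roughly a fraction q of the sampled neighbours."""
    return quantile(sorted(sample_distances), q)


# Invented distances, e.g. collected from Top-K hits of a few probe queries
sample = [0.12, 0.18, 0.25, 0.31, 0.40, 0.47, 0.55, 0.62, 0.80, 1.10, 1.50]

print(f"median distance: {suggest_radius(sample, 0.5):.2f}")
print(f"suggested radius (q=0.9): {suggest_radius(sample, 0.9):.2f}")
```

            A lower quantile gives a tighter radius and fewer hits. The fixed ranges in the recommendation table above are starting points at best, since distance scales differ between embedding models.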

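6.3 Hybrid Retrieval

01.Vector + scalar hybrid
    a.Basic hybrid query
        a.Description
            Hybrid retrieval combines vector similarity search with scalar field filtering for more precise queries. The filter is passed through the `expr` parameter and is applied before the vector search, so only rows that pass take part in distance computation. A selective filter can cut the amount of vector computation considerably, which makes it a key performance lever. Supported conditions include equality, ranges, and logical combinations. It suits queries that must satisfy both semantic similarity and business constraints. Creating a scalar index on the filtered fields gives the best performance.
        b.Code example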
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("products")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 纯向量搜索（基准）
            start = time.time()
            results_vector_only = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10
            )
            time_vector_only = time.time() - start
            
            print(f"纯向量搜索:")
            print(f"  查询时间: {time_vector_only*1000:.2f}ms")
            print(f"  结果数: {len(results_vector_only[0])}")
            
            # 混合查询: 向量 + 类别过滤
            start = time.time()
            results_hybrid = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr='category == "电子产品"',
                output_fields=["id", "title", "category", "price"]
            )
            time_hybrid = time.time() - start
            
            print(f"\n混合查询（向量 + 类别）:")
            print(f"  查询时间: {time_hybrid*1000:.2f}ms")
            print(f"  结果数: {len(results_hybrid[0])}")
            
            for hit in results_hybrid[0][:5]:
                print(f"  {hit.entity.get('title')}: {hit.entity.get('category')}, ¥{hit.entity.get('price'):.2f}")
            
            # 混合查询: 向量 + 价格范围
            results_price = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr='price >= 100 and price <= 500',
                output_fields=["id", "title", "price"]
            )
            
            print(f"\n混合查询（向量 + 价格范围）:")
            print(f"  结果数: {len(results_price[0])}")
            for hit in results_price[0][:5]:
                print(f"  {hit.entity.get('title')}: ¥{hit.entity.get('price'):.2f}, 距离: {hit.distance:.4f}")
            
            # 混合查询: 向量 + 多条件
            complex_expr = '''
                category == "电子产品" and
                price >= 100 and price <= 1000 and
                rating >= 4.0 and
                stock > 0
            '''
            
            results_complex = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr=complex_expr,
                output_fields=["id", "title", "category", "price", "rating", "stock"]
            )
            
            print(f"\n混合查询（向量 + 多条件）:")
            print(f"  结果数: {len(results_complex[0])}")
            
            # 性能对比
            print(f"\n性能对比:")
            print(f"  纯向量: {time_vector_only*1000:.2f}ms")
            print(f"  混合查询: {time_hybrid*1000:.2f}ms")
            print(f"  性能比: {time_hybrid/time_vector_only:.2f}x")
            print("  Note: filtering can shrink the vector workload but also adds overhead, so measure rather than assume a speedup")
            ---
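            The "filter first, then rank by distance" behaviour can be illustrated on an in-memory list. This is a conceptual sketch only: it does not reflect how Milvus implements filtering internally, and `filtered_top_k` and the sample rows are hypothetical.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_top_k(query, rows, predicate, k):
    """Keep rows passing predicate, then return the k nearest by L2 distance."""
    candidates = [row for row in rows if predicate(row)]
    scored = [(row["id"], l2(query, row["embedding"])) for row in candidates]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k], len(candidates)


rows = [
    {"id": 1, "category": "电子产品", "price": 299.0, "embedding": [0.1, 0.1]},
    {"id": 2, "category": "电子产品", "price": 1299.0, "embedding": [0.0, 0.1]},
    {"id": 3, "category": "图书", "price": 59.0, "embedding": [0.0, 0.0]},
    {"id": 4, "category": "电子产品", "price": 450.0, "embedding": [0.5, 0.5]},
]

hits, scanned = filtered_top_k(
    query=[0.0, 0.0],
    rows=rows,
    predicate=lambda r: r["category"] == "电子产品" and 100 <= r["price"] <= 500,
    k=2,
)

print(f"candidates after filter: {scanned}")
print([doc_id for doc_id, _ in hits])
```

            Rows 2 and 3 are the nearest vectors, but they fail the scalar condition and never reach the distance step. That is the sense in which a hybrid query returns the best matches among the rows that satisfy the business constraint.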
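    b.Filtering strategy
        a.Description
            The filtering strategy affects both the performance and the results of a hybrid query. A highly selective filter (one that removes most of the data) can improve performance markedly, while a filter that removes little brings small gains and still adds overhead. Putting the most selective condition first is a reasonable habit, but whether order matters depends on how the engine optimizes the expression, so measure it. Complex expressions may not make full use of indexes; simple AND combinations are preferable. The filtered candidate set should remain large enough to fill `limit`, otherwise the query returns few or no results. Balance filter strictness against result count.
        b.Code example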
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("products")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
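            # Filters with different pass ratios (percentages are illustrative and depend on the data)
            filters = [
                ('id >= 0', "No effective filter (passes ~100%)"),
                ('category == "电子产品"', "High selectivity (passes ~25%)"),
                ('price > 500', "Medium selectivity (passes ~50%)"),
                ('category == "电子产品" and price > 500', "High selectivity (passes ~10%)"),
                ('category == "电子产品" and price > 800 and rating >= 4.5', "Very high selectivity (passes ~2%)")
            ]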
            
            print("不同选择性过滤条件的性能:\n")
            print(f"{'过滤条件':>50s} {'查询时间':>12s} {'结果数':>8s}")
            print("-" * 75)
            
            for expr, desc in filters:
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10,
                    expr=expr,
                    output_fields=["id"]
                )
                elapsed = time.time() - start
                
                print(f"{desc:>50s} {elapsed*1000:10.2f}ms {len(results[0]):8d}")
            
            # 过滤顺序优化
            print("\n过滤顺序优化:")
            
            # 策略1: 低选择性在前
            expr1 = 'category == "电子产品" and price > 800'
            
            start = time.time()
            results1 = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr=expr1
            )
            time1 = time.time() - start
            
            print(f"  低选择性在前: {time1*1000:.2f}ms")
            
            # 策略2: 高选择性在前
            expr2 = 'price > 800 and category == "电子产品"'
            
            start = time.time()
            results2 = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr=expr2
            )
            time2 = time.time() - start
            
            print(f"  高选择性在前: {time2*1000:.2f}ms")
            print("  Note: whether condition order matters depends on the Milvus version and the data; compare the two timings")
            
            # 过滤策略决策树
            def recommend_filter_strategy(data_size, filter_selectivity):
                """推荐过滤策略"""
                if filter_selectivity < 0.1:
                    return "极高选择性，优先使用标量查询"
                elif filter_selectivity < 0.3:
                    return "高选择性，混合查询效果好"
                elif filter_selectivity < 0.7:
                    return "中等选择性，混合查询有一定效果"
                else:
                    return "低选择性，考虑纯向量搜索"
            
            print("\n过滤策略建议:")
            selectivities = [0.05, 0.2, 0.5, 0.8]
            
            for sel in selectivities:
                strategy = recommend_filter_strategy(1000000, sel)
                print(f"  选择性 {sel*100:4.1f}%: {strategy}")
            ---
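            Before choosing a strategy it helps to know roughly what fraction of rows a filter lets through, and whether enough candidates remain to fill `limit`. The sketch below estimates the pass ratio from a sample of rows. `estimate_pass_ratio`, `enough_candidates`, and the `safety_factor` of 10 are hypothetical choices for illustration, not Milvus settings.

```python
import random


def estimate_pass_ratio(rows, predicate, sample_size=1000, seed=0):
    """Estimate the fraction of rows that satisfy predicate from a random sample."""
    if not rows:
        raise ValueError("rows must not be empty")
    rng = random.Random(seed)
    sample = rows if len(rows) <= sample_size else rng.sample(rows, sample_size)
    passed = sum(1 for row in sample if predicate(row))
    return passed / len(sample)


def enough_candidates(total_rows, pass_ratio, limit, safety_factor=10):
    """True when the expected filtered set is comfortably larger than limit."""
    return total_rows * pass_ratio >= limit * safety_factor


# Synthetic rows: prices cycle through 0..999
rows = [{"id": i, "price": i % 1000} for i in range(5000)]

# sample_size exceeds the row count here, so every row is inspected
ratio = estimate_pass_ratio(rows, lambda r: r["price"] > 800, sample_size=10000)

print(f"pass ratio: {ratio:.3f}")
print(enough_candidates(len(rows), ratio, limit=10))
print(enough_candidates(len(rows), ratio, limit=200))
```

            With about 20% of rows passing, a `limit` of 10 is easily filled, while a `limit` of 200 leaves too little headroom under the chosen safety factor. That is the cue to loosen the filter or lower the limit.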

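02.Multi-vector hybrid
    a.Multi-field search
        a.Description
            Multi-vector search queries several vector fields in one Collection. Each vector field can have its own index and search parameters, which suits multimodal scenarios such as combined text and image search. Fields can be given different weights, and the per-field results must be merged into one ranking; the merge strategy determines the final order. Recent Milvus versions (2.4 and later, according to the Milvus documentation) also provide a `hybrid_search` API that searches several vector fields in one request and reranks on the server. The example here runs one search per field and merges on the client.
        b.Code example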
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            
            # 创建多向量字段Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="text_embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
                FieldSchema(name="image_embedding", dtype=DataType.FLOAT_VECTOR, dim=512)
            ]
            schema = CollectionSchema(fields=fields, description="多模态搜索")
            collection = Collection("multimodal_search", schema=schema)
            
            # 插入数据
            data_size = 10000
            ids = list(range(data_size))
            titles = [f"文档{i}" for i in range(data_size)]
            text_embeddings = [[np.random.random() for _ in range(768)] for _ in range(data_size)]
            image_embeddings = [[np.random.random() for _ in range(512)] for _ in range(data_size)]
            
            data = [ids, titles, text_embeddings, image_embeddings]
            collection.insert(data)
            collection.flush()
            
            # 为每个向量字段创建索引
            text_index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "COSINE",
                "params": {"nlist": 128}
            }
            
            image_index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 128}
            }
            
            collection.create_index(field_name="text_embedding", index_params=text_index_params)
            collection.create_index(field_name="image_embedding", index_params=image_index_params)
            
            collection.load()
            
            # 文本向量搜索
            text_query = [[np.random.random() for _ in range(768)]]
            
            text_results = collection.search(
                data=text_query,
                anns_field="text_embedding",
                param={"metric_type": "COSINE", "params": {"nprobe": 16}},
                limit=10,
                output_fields=["id", "title"]
            )
            
            print("文本向量搜索结果:")
            for hit in text_results[0][:5]:
                print(f"  {hit.entity.get('title')}: {hit.distance:.4f}")
            
            # 图像向量搜索
            image_query = [[np.random.random() for _ in range(512)]]
            
            image_results = collection.search(
                data=image_query,
                anns_field="image_embedding",
                param={"metric_type": "L2", "params": {"nprobe": 16}},
                limit=10,
                output_fields=["id", "title"]
            )
            
            print("\n图像向量搜索结果:")
            for hit in image_results[0][:5]:
                print(f"  {hit.entity.get('title')}: {hit.distance:.4f}")
            
            # 多向量融合搜索
            def multimodal_search(text_vector, image_vector, text_weight=0.6, image_weight=0.4):
                """多模态融合搜索"""
                # 分别搜索
                text_results = collection.search(
                    data=[text_vector],
                    anns_field="text_embedding",
                    param={"metric_type": "COSINE", "params": {"nprobe": 16}},
                    limit=50,
                    output_fields=["id", "title"]
                )
                
                image_results = collection.search(
                    data=[image_vector],
                    anns_field="image_embedding",
                    param={"metric_type": "L2", "params": {"nprobe": 16}},
                    limit=50,
                    output_fields=["id", "title"]
                )
                
                # 归一化距离到[0, 1]
                text_scores = {}
                for hit in text_results[0]:
                    # With the COSINE metric, hit.distance is already a similarity (larger = more similar)
                    text_scores[hit.id] = hit.distance
                
                image_scores = {}
                # Fall back to 1.0 when there are no hits or every distance is 0, to avoid dividing by zero
                max_image_dist = max((hit.distance for hit in image_results[0]), default=1.0) or 1.0
                for hit in image_results[0]:
                    # L2距离归一化
                    image_scores[hit.id] = 1 - (hit.distance / max_image_dist)
                
                # 融合分数
                all_ids = set(text_scores.keys()) | set(image_scores.keys())
                fused_scores = {}
                
                for doc_id in all_ids:
                    text_score = text_scores.get(doc_id, 0)
                    image_score = image_scores.get(doc_id, 0)
                    fused_scores[doc_id] = text_weight * text_score + image_weight * image_score
                
                # 排序
                sorted_results = sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
                
                return sorted_results[:10]
            
            # 执行多模态搜索
            text_vec = [np.random.random() for _ in range(768)]
            image_vec = [np.random.random() for _ in range(512)]
            
            fused_results = multimodal_search(text_vec, image_vec, text_weight=0.6, image_weight=0.4)
            
            print("\n多模态融合搜索结果:")
            for doc_id, score in fused_results:
                print(f"  文档ID: {doc_id}, 融合分数: {score:.4f}")
            ---
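            Client-side fusion only works when both score sets point in the same direction and share a scale. The sketch below shows one possible normalization, assuming cosine similarity in [-1, 1] and a non-negative L2 distance. The mappings `(s + 1) / 2` and `1 / (1 + d)` are illustrative choices, not the ones Milvus uses, and `cosine_to_unit`, `l2_to_unit`, and `fuse` are hypothetical names.

```python
def cosine_to_unit(similarity):
    """Map cosine similarity from [-1, 1] to [0, 1]."""
    return (similarity + 1.0) / 2.0


def l2_to_unit(distance):
    """Map a non-negative L2 distance to (0, 1]; distance 0 maps to 1."""
    return 1.0 / (1.0 + distance)


def fuse(text_hits, image_hits, text_weight=0.6, image_weight=0.4):
    """text_hits: {id: cosine similarity}; image_hits: {id: L2 distance}."""
    fused = {}
    for doc_id, sim in text_hits.items():
        fused[doc_id] = fused.get(doc_id, 0.0) + text_weight * cosine_to_unit(sim)
    for doc_id, dist in image_hits.items():
        fused[doc_id] = fused.get(doc_id, 0.0) + image_weight * l2_to_unit(dist)
    # Larger fused score means more similar
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


text_hits = {"a": 0.9, "b": 0.2, "c": 0.6}
image_hits = {"a": 3.0, "b": 0.0, "d": 1.0}

ranked = fuse(text_hits, image_hits)
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```

            Unlike dividing by the largest distance in the result set, these mappings do not depend on which hits happen to be returned, so scores stay comparable across queries. Documents missing from one modality simply contribute nothing for it.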
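    b.Result fusion
        a.Description
            Multi-vector search has to merge the results from different vector fields. Common fusion strategies include weighted averaging, RRF (Reciprocal Rank Fusion), and taking the maximum. The weights set the relative importance of each modality. Scores from different metrics must be normalized to a common scale and direction before they are combined, because COSINE and IP grow with similarity while L2 shrinks. Rank-based methods such as RRF use only the position of each result and avoid this problem. Adjust the strategy to the business scenario and determine the weights by experiment.
        b.Code example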
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("multimodal_search")
            collection.load()
            
            # 融合策略1: 加权平均
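            def weighted_average_fusion(results_list, weights, higher_is_better=(True, False)):
                """Weighted fusion of normalized scores.
                higher_is_better gives each list's metric direction; the default matches
                this example's order: [COSINE similarity, L2 distance]."""
                all_scores = {}
                
                for results, weight, higher in zip(results_list, weights, higher_is_better):
                    values = [hit.distance for hit in results[0]]
                    if not values:
                        continue
                    lo, hi = min(values), max(values)
                    span = (hi - lo) or 1.0
                    for hit in results[0]:
                        # Min-max normalize to [0, 1]; flip distances so larger always means more similar
                        norm = (hit.distance - lo) / span
                        score = norm if higher else 1 - norm
                        all_scores[hit.id] = all_scores.get(hit.id, 0) + weight * score
                
                sorted_results = sorted(all_scores.items(), key=lambda x: x[1], reverse=True)
                return sorted_results[:10]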
            
            # 融合策略2: RRF (Reciprocal Rank Fusion)
            def rrf_fusion(results_list, k=60):
                """RRF fusion: uses only rank positions, so it is insensitive to the scale of raw scores"""
                rrf_scores = {}
                
                for results in results_list:
                    for rank, hit in enumerate(results[0]):
                        if hit.id not in rrf_scores:
                            rrf_scores[hit.id] = 0
                        rrf_scores[hit.id] += 1 / (k + rank + 1)
                
                sorted_results = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)
                return sorted_results[:10]
            
            # 融合策略3: 最大值融合
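            def max_fusion(results_list, higher_is_better=(True, False)):
                """Take each document's best normalized score across result lists.
                higher_is_better defaults to this example's order: [COSINE, L2]."""
                max_scores = {}
                
                for results, higher in zip(results_list, higher_is_better):
                    values = [hit.distance for hit in results[0]]
                    if not values:
                        continue
                    lo, hi = min(values), max(values)
                    span = (hi - lo) or 1.0
                    for hit in results[0]:
                        # Normalize to [0, 1] so that larger always means more similar
                        norm = (hit.distance - lo) / span
                        score = norm if higher else 1 - norm
                        max_scores[hit.id] = max(max_scores.get(hit.id, 0.0), score)
                
                sorted_results = sorted(max_scores.items(), key=lambda x: x[1], reverse=True)
                return sorted_results[:10]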
            
            # 测试不同融合策略
            text_query = [[np.random.random() for _ in range(768)]]
            image_query = [[np.random.random() for _ in range(512)]]
            
            text_results = collection.search(
                data=text_query,
                anns_field="text_embedding",
                param={"metric_type": "COSINE", "params": {"nprobe": 16}},
                limit=50
            )
            
            image_results = collection.search(
                data=image_query,
                anns_field="image_embedding",
                param={"metric_type": "L2", "params": {"nprobe": 16}},
                limit=50
            )
            
            results_list = [text_results, image_results]
            
            print("不同融合策略对比:\n")
            
            # 加权平均
            wa_results = weighted_average_fusion(results_list, weights=[0.6, 0.4])
            print("加权平均融合 (0.6:0.4):")
            for doc_id, score in wa_results[:5]:
                print(f"  文档ID: {doc_id}, 分数: {score:.4f}")
            
            # RRF
            rrf_results = rrf_fusion(results_list, k=60)
            print("\nRRF融合:")
            for doc_id, score in rrf_results[:5]:
                print(f"  文档ID: {doc_id}, RRF分数: {score:.4f}")
            
            # 最大值
            max_results = max_fusion(results_list)
            print("\n最大值融合:")
            for doc_id, score in max_results[:5]:
                print(f"  文档ID: {doc_id}, 最大分数: {score:.4f}")
            
            # 自适应权重
            class AdaptiveFusion:
                def __init__(self):
                    self.history = []
                
                def fuse(self, results_list, initial_weights=(0.5, 0.5)):
                    """自适应权重融合"""
                    # 计算每个模态的结果质量
                    qualities = []
                    for results in results_list:
                        if len(results[0]) > 0:
                            # 使用距离分布评估质量
                            distances = [hit.distance for hit in results[0]]
                            quality = 1 / (np.std(distances) + 0.01)  # 距离分布越集中质量越高
                        else:
                            quality = 0
                        qualities.append(quality)
                    
                    # 归一化权重
                    total_quality = sum(qualities)
                    if total_quality > 0:
                        adaptive_weights = [q / total_quality for q in qualities]
                    else:
                        adaptive_weights = initial_weights
                    
                    print(f"自适应权重: {adaptive_weights}")
                    
                    # 加权融合
                    return weighted_average_fusion(results_list, adaptive_weights)
            
            adaptive_fusion = AdaptiveFusion()
            adaptive_results = adaptive_fusion.fuse(results_list)
            
            print("\n自适应权重融合:")
            for doc_id, score in adaptive_results[:5]:
                print(f"  文档ID: {doc_id}, 分数: {score:.4f}")
            ---
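The text search above uses COSINE while the image search uses L2, so their raw distances point in opposite directions. One way to make them comparable before fusing is min-max normalization, sketched below on plain (doc_id, distance) pairs. The helper names `normalize_scores` and `fuse_normalized` and the sample numbers are illustrative assumptions; they are not part of pymilvus or of the fusion functions defined earlier.

```python
def normalize_scores(hits, higher_is_better):
    """Map (doc_id, distance) pairs to scores in [0, 1], where 1 is the best hit."""
    if not hits:
        return {}
    values = [distance for _, distance in hits]
    lo, hi = min(values), max(values)
    span = hi - lo
    scores = {}
    for doc_id, distance in hits:
        if span == 0:
            score = 1.0  # all hits are equally good
        else:
            score = (distance - lo) / span
            if not higher_is_better:
                score = 1.0 - score  # flip so that smaller distances score higher
        scores[doc_id] = score
    return scores


def fuse_normalized(score_maps, weights):
    """Weighted sum of normalized scores; a document missing from a list adds 0."""
    fused = {}
    for scores, weight in zip(score_maps, weights):
        for doc_id, score in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample hits standing in for real search results
text_hits = [(1, 0.92), (2, 0.80), (3, 0.50)]   # COSINE: higher is better
image_hits = [(2, 0.10), (3, 0.40), (4, 0.90)]  # L2: lower is better

text_scores = normalize_scores(text_hits, higher_is_better=True)
image_scores = normalize_scores(image_hits, higher_is_better=False)
ranked = fuse_normalized([text_scores, image_scores], weights=[0.6, 0.4])
```

With real results, the pairs could be built as `[(hit.id, hit.distance) for hit in results[0]]`. Min-max scaling is only one option; rank-based fusion such as RRF avoids the scale problem altogether.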

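6.4 Scalar Filtering

01.Filter expressions
    a.Expression syntax
        a.Description
            Milvus filter expressions support comparison operators (==, !=, >, >=, <, <=), logical operators (and, or, not), and membership operators (in, not in), plus like for string pattern matching. Expressions can reference integer, floating-point, string, and boolean fields, and parentheses override the default precedence. String comparison is case-sensitive. Arithmetic operators and a limited set of built-in functions are also available. Milvus parses each expression and uses scalar indexes where it can, but complex expressions can still slow a search down.
        b.Code example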
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("products")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 比较运算符
            expressions = [
                ('price == 99.99', "等于"),
                ('price != 99.99', "不等于"),
                ('price > 100', "大于"),
                ('price >= 100', "大于等于"),
                ('price < 500', "小于"),
                ('price <= 500', "小于等于")
            ]
            
            print("比较运算符示例:\n")
            
            for expr, desc in expressions:
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=5,
                    expr=expr,
                    output_fields=["id", "title", "price"]
                )
                
                print(f"{desc:10s} ({expr:20s}): {len(results[0])} 条结果")
            
            # 逻辑运算符
            logical_expressions = [
                ('price > 100 and price < 500', "AND运算"),
                ('category == "电子" or category == "图书"', "OR运算"),
                ('not (price > 1000)', "NOT运算"),
                ('(price > 100 and price < 500) or category == "特价"', "组合运算")
            ]
            
            print("\n逻辑运算符示例:\n")
            
            for expr, desc in logical_expressions:
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=5,
                    expr=expr
                )
                
                print(f"{desc:10s}: {len(results[0])} 条结果")
            
            # 成员运算符
            member_expressions = [
                ('category in ["电子", "图书", "服装"]', "IN运算"),
                ('category not in ["食品", "玩具"]', "NOT IN运算"),
                ('id in [1, 2, 3, 4, 5]', "ID列表")
            ]
            
            print("\n成员运算符示例:\n")
            
            for expr, desc in member_expressions:
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=5,
                    expr=expr,
                    output_fields=["id", "category"]
                )
                
                print(f"{desc:15s}: {len(results[0])} 条结果")
            
            # 字符串匹配
            string_expressions = [
                ('title like "手机%"', "前缀匹配"),
                ('title like "%Pro"', "后缀匹配"),
                ('title like "%iPhone%"', "包含匹配")
            ]
            
            print("\n字符串匹配示例:\n")
            
            for expr, desc in string_expressions:
                try:
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=5,
                        expr=expr,
                        output_fields=["id", "title"]
                    )
                    
                    print(f"{desc:10s}: {len(results[0])} 条结果")
                except Exception as e:
                    print(f"{desc:10s}: 不支持或错误 - {str(e)}")
            
            # 复杂表达式
            complex_expr = '''
                (category == "电子" and price >= 1000 and price <= 5000) or
                (category == "图书" and price >= 50 and rating >= 4.5) or
                (category == "服装" and discount > 0.5)
            '''
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr=complex_expr,
                output_fields=["id", "title", "category", "price"]
            )
            
            print(f"\n复杂表达式: {len(results[0])} 条结果")
            ---
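Filter expressions are plain strings, so assembling them by hand from application values invites quoting mistakes. The sketch below builds the operators shown above from Python values. The helpers `literal`, `in_expr`, `between_expr`, and `all_of` are hypothetical, and the use of JSON-style escaping for string literals is an assumption that should be checked against the Milvus version in use.

```python
import json


def literal(value):
    """Render a Python value as a literal for a filter expression."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Assumption: JSON-style double quotes and backslash escapes are accepted
        return json.dumps(value, ensure_ascii=False)
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


def in_expr(field, values):
    """Build 'field in [v1, v2, ...]'."""
    return f"{field} in [{', '.join(literal(v) for v in values)}]"


def between_expr(field, low, high):
    """Build an inclusive range condition, wrapped in parentheses."""
    return f"({field} >= {literal(low)} and {field} <= {literal(high)})"


def all_of(*parts):
    """Join conditions with 'and'."""
    return " and ".join(parts)


expr = all_of(
    in_expr("category", ["电子", "图书"]),
    between_expr("price", 100, 500),
)
```

The resulting string can be passed as the `expr` argument of `collection.search`, exactly like the hand-written expressions above.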
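    b.Expression optimization
        a.Description
            A well-written filter expression can noticeably improve query performance. Filter on indexed fields where possible, and put highly selective conditions first. Prefer a positive condition over NOT, for example price <= 1000 instead of not (price > 1000). Use IN instead of a chain of OR conditions on the same field. Avoid function calls inside expressions and flatten deeply nested ones. Benchmark alternative forms of an expression against each other, and monitor filter latency so that regressions are caught early.
        b.Code example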
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("products")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 优化前: 使用多个OR
            expr_before = '''
                category == "电子" or 
                category == "图书" or 
                category == "服装" or 
                category == "食品"
            '''
            
            start = time.time()
            results_before = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr=expr_before
            )
            time_before = time.time() - start
            
            print("优化前（多个OR）:")
            print(f"  查询时间: {time_before*1000:.2f}ms")
            print(f"  结果数: {len(results_before[0])}")
            
            # 优化后: 使用IN
            expr_after = 'category in ["电子", "图书", "服装", "食品"]'
            
            start = time.time()
            results_after = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr=expr_after
            )
            time_after = time.time() - start
            
            print(f"\n优化后（IN运算）:")
            print(f"  查询时间: {time_after*1000:.2f}ms")
            print(f"  结果数: {len(results_after[0])}")
            print(f"  加速比: {time_before/time_after:.2f}x")
            
            # 优化: 避免NOT
            expr_not = 'not (price > 1000)'
            expr_positive = 'price <= 1000'
            
            start = time.time()
            results_not = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr=expr_not
            )
            time_not = time.time() - start
            
            start = time.time()
            results_positive = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr=expr_positive
            )
            time_positive = time.time() - start
            
            print(f"\nNOT运算对比:")
            print(f"  NOT运算: {time_not*1000:.2f}ms")
            print(f"  正向条件: {time_positive*1000:.2f}ms")
            print(f"  加速比: {time_not/time_positive:.2f}x")
            
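            # Expression simplification hints
            import re
            
            class ExpressionOptimizer:
                @staticmethod
                def optimize(expr):
                    """Return heuristic suggestions for simplifying an expression"""
                    optimizations = []
                    
                    # Several OR operators: an IN list may be shorter (same-field case)
                    if len(re.findall(r'\bor\b', expr)) >= 3:
                        optimizations.append("Suggestion: use IN instead of multiple ORs")
                    
                    # NOT operator
                    if re.search(r'\bnot\b', expr, re.IGNORECASE):
                        optimizations.append("Suggestion: avoid NOT, use a positive condition")
                    
                    # Deep nesting
                    if expr.count('(') > 3:
                        optimizations.append("Suggestion: simplify nested expressions")
                    
                    # Function calls: an identifier directly followed by "(".
                    # Plain grouping parentheses and the keywords and/or/not/in are ignored.
                    if re.search(r'\b(?!(?:and|or|not|in)\b)[A-Za-z_]\w*\s*\(', expr, re.IGNORECASE):
                        optimizations.append("Warning: possible function call, which may hurt performance")
                    
                    return optimizations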
                
                @staticmethod
                def analyze(expr, collection):
                    """分析表达式性能"""
                    query_vector = [[np.random.random() for _ in range(128)]]
                    search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": 16}
                    }
                    
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10,
                        expr=expr
                    )
                    elapsed = time.time() - start
                    
                    return {
                        "query_time": elapsed * 1000,
                        "result_count": len(results[0])
                    }
            
            optimizer = ExpressionOptimizer()
            
            # 分析复杂表达式
            complex_expr = '''
                not (category == "电子" or category == "图书") and
                (price > 100 or discount > 0.5) and
                rating >= 4.0
            '''
            
            print(f"\n表达式优化建议:")
            suggestions = optimizer.optimize(complex_expr)
            for suggestion in suggestions:
                print(f"  {suggestion}")
            
            metrics = optimizer.analyze(complex_expr, collection)
            print(f"\n性能分析:")
            print(f"  查询时间: {metrics['query_time']:.2f}ms")
            print(f"  结果数: {metrics['result_count']}")
            ---
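The OR-to-IN rewrite measured above can be automated for the simple case where every branch is an equality test on the same field. The following is a minimal sketch under that assumption; `or_chain_to_in` is a hypothetical helper, it handles only double-quoted strings and plain numbers, and it returns the input unchanged whenever it is unsure.

```python
import re

# One branch of the chain: <field> == <double-quoted string or number>
_EQUALITY = re.compile(
    r'^\s*([A-Za-z_]\w*)\s*==\s*("(?:[^"\\]|\\.)*"|-?\d+(?:\.\d+)?)\s*$'
)


def or_chain_to_in(expr):
    """Rewrite 'f == a or f == b ...' as 'f in [a, b, ...]' when it is safe to do so."""
    parts = re.split(r'\s+or\s+', expr.strip())
    if len(parts) < 2:
        return expr
    field = None
    values = []
    for part in parts:
        match = _EQUALITY.match(part)
        if match is None:
            return expr  # not a plain equality test: leave the expression alone
        if field is None:
            field = match.group(1)
        elif match.group(1) != field:
            return expr  # branches test different fields
        values.append(match.group(2))
    return f"{field} in [{', '.join(values)}]"


chain = '''
    category == "电子" or
    category == "图书" or
    category == "服装"
'''
rewritten = or_chain_to_in(chain)
mixed = or_chain_to_in('category == "电子" or price == 100')
```

Returning the original string on any doubt keeps the helper conservative: a branch that is not a simple equality, or a string literal that itself contains " or ", simply disables the rewrite.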

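02.Filter performance
    a.Index usage
        a.Description
            Filter performance depends heavily on scalar indexes. Indexing frequently filtered fields can noticeably speed up filtering, and the index type matters, so choose one that fits the field's data type and cardinality. Compound conditions may not be able to use indexes fully. Milvus applies the filter before the vector search, which reduces the number of vectors that have to be compared. Review which indexes your queries actually rely on and adjust the index configuration accordingly. Analyze slow queries regularly and refine their filter conditions.
        b.Code example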
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("products")
            
            # 测试有索引 vs 无索引
            print("索引对过滤性能的影响:\n")
            
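            # Scenario 1: no scalar index on category
            collection.release()
            # has_index/drop_index take the index name as a keyword argument
            if collection.has_index(index_name="category_index"):
                collection.drop_index(index_name="category_index")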
            
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            start = time.time()
            results_no_index = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr='category == "电子"'
            )
            time_no_index = time.time() - start
            
            print(f"无索引:")
            print(f"  查询时间: {time_no_index*1000:.2f}ms")
            
            # 场景2: 有索引
            collection.release()
            collection.create_index(
                field_name="category",
                index_name="category_index"
            )
            collection.load()
            
            start = time.time()
            results_with_index = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr='category == "电子"'
            )
            time_with_index = time.time() - start
            
            print(f"\n有索引:")
            print(f"  查询时间: {time_with_index*1000:.2f}ms")
            print(f"  加速比: {time_no_index/time_with_index:.2f}x")
            
            # 组合条件的索引利用
            print("\n组合条件索引利用:")
            
            # 为price字段创建索引
            collection.release()
            collection.create_index(
                field_name="price",
                index_name="price_index"
            )
            collection.load()
            
            # Compare different combinations
            test_cases = [
                ('category == "电子"', "Single field: category (indexed)"),
                ('price > 100', "Single field: price (indexed)"),
                ('category == "电子" and price > 100', "Two fields, AND (both indexed)"),
                ('category == "电子" or price > 100', "Two fields, OR (both indexed)"),
                ('category == "电子" and rating > 4.0', "Mixed (only category indexed)")
            ]
            
            print(f"\n{'Scenario':>45s} {'Query time':>12s}")
            print("-" * 60)
            
            for expr, desc in test_cases:
                start = time.time()
                try:
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10,
                        expr=expr
                    )
                    elapsed = time.time() - start
                    print(f"{desc:>45s} {elapsed*1000:10.2f}ms")
                except Exception as e:
                    print(f"{desc:>45s} 错误: {str(e)}")
            
            # 索引选择建议
            print("\n索引选择建议:")
            print("  1. 为高频查询字段创建索引")
            print("  2. 高基数字段（唯一值多）索引效果好")
            print("  3. 低基数字段（如性别）索引效果有限")
            print("  4. 组合查询考虑创建多个单字段索引")
            print("  5. 监控索引使用率，删除无用索引")
            ---
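A single timed call, as used in the comparison above, is sensitive to warm-up effects and network jitter. The sketch below shows a steadier measurement: a few untimed warm-up calls followed by repeated timed calls summarized by their median. `benchmark` is a hypothetical helper, and `fake_search` merely stands in for a real `collection.search` call.

```python
import statistics
import time


def benchmark(fn, warmup=2, repeats=10):
    """Time fn() several times and return summary statistics in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(repeats):
        started = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - started) * 1000)
    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
        "runs": len(samples),
    }


calls = []


def fake_search():
    """Placeholder for a real search; it only records that it was called."""
    calls.append(1)


stats = benchmark(fake_search, warmup=2, repeats=5)
```

Against a live collection the call would look like `benchmark(lambda: collection.search(data=query_vector, anns_field="embedding", param=search_params, limit=10, expr=expr))`. Comparing medians from the indexed and unindexed runs is less likely to be skewed by one slow first request.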
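    b.Performance monitoring
        a.Description
            Monitoring filter performance helps reveal bottlenecks and optimization opportunities. Track query latency and filter selectivity (the fraction of entities that pass the filter), and note whether the filtered fields are indexed. Analyze slow queries to pinpoint problems, and review filter expressions regularly to simplify complex ones. Use profiling tools to locate bottlenecks. Establish a performance baseline, monitor against it continuously, and set alert thresholds so that anomalies surface quickly.
        b.Code example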
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            from collections import defaultdict
            
            collection = Collection("products")
            collection.load()
            
            # 性能监控类
            class FilterPerformanceMonitor:
                def __init__(self):
                    self.query_log = []
                    self.stats = defaultdict(list)
                
                def log_query(self, expr, query_time, result_count):
                    """记录查询"""
                    self.query_log.append({
                        "expr": expr,
                        "time": query_time,
                        "count": result_count,
                        "timestamp": time.time()
                    })
                    
                    self.stats[expr].append(query_time)
                
                def get_slow_queries(self, threshold_ms=100):
                    """获取慢查询"""
                    slow_queries = [
                        q for q in self.query_log 
                        if q["time"] > threshold_ms
                    ]
                    return slow_queries
                
                def get_stats(self):
                    """获取统计信息"""
                    stats_summary = {}
                    
                    for expr, times in self.stats.items():
                        stats_summary[expr] = {
                            "count": len(times),
                            "avg_time": np.mean(times),
                            "p95_time": np.percentile(times, 95),
                            "max_time": max(times)
                        }
                    
                    return stats_summary
                
                def recommend_optimizations(self):
                    """推荐优化建议"""
                    recommendations = []
                    
                    slow_queries = self.get_slow_queries(threshold_ms=50)
                    if slow_queries:
                        recommendations.append(
                            f"发现 {len(slow_queries)} 个慢查询（>50ms），建议优化"
                        )
                    
                    stats = self.get_stats()
                    for expr, stat in stats.items():
                        if stat["avg_time"] > 30:
                            recommendations.append(
                                f"表达式 '{expr[:50]}...' 平均耗时 {stat['avg_time']:.2f}ms，建议优化"
                            )
                    
                    return recommendations
            
            # 使用监控器
            monitor = FilterPerformanceMonitor()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 模拟多次查询
            test_expressions = [
                'category == "电子"',
                'price > 100 and price < 500',
                'category in ["电子", "图书", "服装"]',
                'rating >= 4.0 and stock > 0'
            ]
            
            print("执行测试查询...\n")
            
            for _ in range(10):
                for expr in test_expressions:
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10,
                        expr=expr
                    )
                    elapsed = (time.time() - start) * 1000
                    
                    monitor.log_query(expr, elapsed, len(results[0]))
            
            # 分析结果
            print("性能统计:\n")
            print(f"{'表达式':>50s} {'查询次数':>10s} {'平均时间':>12s} {'P95时间':>12s}")
            print("-" * 90)
            
            stats = monitor.get_stats()
            for expr, stat in stats.items():
                print(f"{expr:>50s} {stat['count']:>10d} {stat['avg_time']:>10.2f}ms {stat['p95_time']:>10.2f}ms")
            
            # 慢查询分析
            slow_queries = monitor.get_slow_queries(threshold_ms=20)
            if slow_queries:
                print(f"\n慢查询 (>20ms): {len(slow_queries)} 个")
                for q in slow_queries[:5]:
                    print(f"  {q['expr'][:50]}: {q['time']:.2f}ms")
            
            # 优化建议
            print("\n优化建议:")
            recommendations = monitor.recommend_optimizations()
            for rec in recommendations:
                print(f"  - {rec}")
            
            # 性能报告
            print("\n性能报告:")
            print(f"  总查询数: {len(monitor.query_log)}")
            print(f"  平均延迟: {np.mean([q['time'] for q in monitor.query_log]):.2f}ms")
            print(f"  P95延迟: {np.percentile([q['time'] for q in monitor.query_log], 95):.2f}ms")
            print(f"  P99延迟: {np.percentile([q['time'] for q in monitor.query_log], 99):.2f}ms")
            ---
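The monitor above collects statistics but does not yet act on a baseline or an alert threshold. A minimal sketch of that idea follows: keep a sliding window of recent latencies and flag the moment their P95 exceeds a multiple of the baseline. The class `LatencyAlert`, the factor of 1.5, and the window size are illustrative assumptions.

```python
from collections import deque


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct*n/100
    return ordered[rank - 1]


class LatencyAlert:
    def __init__(self, baseline_p95_ms, factor=1.5, window=100):
        self.threshold_ms = baseline_p95_ms * factor
        self.samples = deque(maxlen=window)  # the oldest samples fall out automatically

    def current_p95(self):
        return percentile(self.samples, 95) if self.samples else 0.0

    def record(self, latency_ms):
        """Store one latency and return True if the windowed P95 exceeds the threshold."""
        self.samples.append(latency_ms)
        return self.current_p95() > self.threshold_ms


alert = LatencyAlert(baseline_p95_ms=20.0, factor=1.5, window=20)
normal = [alert.record(10.0) for _ in range(19)]
first_slow = alert.record(50.0)   # a single outlier does not move the P95
second_slow = alert.record(50.0)  # two outliers in a window of 20 do
```

Alerting on a windowed percentile rather than on individual queries keeps one slow request from raising an alarm while still reacting to a sustained slowdown.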

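6.5 Batch Queries

01.Batch search
    a.Batch submission
        a.Description
            Batch search submits multiple query vectors in a single request, which raises throughput. Milvus processes the queries in a batch in parallel and amortizes per-request overhead, including network round trips. Batch size affects performance: 10 to 100 queries per batch is a reasonable starting point, while very large batches can increase memory pressure and latency. A batch search returns a list in which each element holds the results for the query vector at the same position. Batching suits offline workloads such as bulk recommendation or bulk similarity computation.
        b.Code example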
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 单次查询性能
            print("单次查询 vs 批量查询性能对比:\n")
            
            num_queries = 100
            query_vectors = [[np.random.random() for _ in range(128)] for _ in range(num_queries)]
            
            # 方式1: 逐个查询
            start = time.time()
            results_sequential = []
            for query_vector in query_vectors:
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
                results_sequential.append(results[0])
            time_sequential = time.time() - start
            
            print(f"逐个查询 ({num_queries}次):")
            print(f"  总时间: {time_sequential:.2f}s")
            print(f"  平均每次: {time_sequential/num_queries*1000:.2f}ms")
            print(f"  QPS: {num_queries/time_sequential:.2f}")
            
            # 方式2: 批量查询
            start = time.time()
            results_batch = collection.search(
                data=query_vectors,
                anns_field="embedding",
                param=search_params,
                limit=10
            )
            time_batch = time.time() - start
            
            print(f"\n批量查询 ({num_queries}次):")
            print(f"  总时间: {time_batch:.2f}s")
            print(f"  平均每次: {time_batch/num_queries*1000:.2f}ms")
            print(f"  QPS: {num_queries/time_batch:.2f}")
            print(f"  加速比: {time_sequential/time_batch:.2f}x")
            
            # 不同批量大小的性能
            batch_sizes = [1, 10, 50, 100, 200]
            
            print("\n不同批量大小的性能:\n")
            print(f"{'批量大小':>10s} {'总时间':>10s} {'平均每次':>12s} {'QPS':>10s}")
            print("-" * 48)
            
            for batch_size in batch_sizes:
                test_vectors = [[np.random.random() for _ in range(128)] for _ in range(batch_size)]
                
                start = time.time()
                results = collection.search(
                    data=test_vectors,
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
                elapsed = time.time() - start
                
                avg_time = elapsed / batch_size * 1000
                qps = batch_size / elapsed
                
                print(f"{batch_size:10d} {elapsed:9.3f}s {avg_time:10.2f}ms {qps:9.2f}")
            
            # 批量查询最佳实践
            print("\n批量查询最佳实践:")
            print("  1. 批量大小: 10-100（根据延迟要求）")
            print("  2. 离线处理: 使用更大批量（100-500）")
            print("  3. 实时场景: 使用小批量（10-50）")
            print("  4. 监控内存: 避免批量过大导致OOM")
            print("  5. 并发控制: 限制同时批量查询数")
            ---
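When there are more query vectors than fit comfortably in one request, they can be split into fixed-size batches while keeping the results aligned with the input order. The sketch below shows the splitting logic only. `chunked` and `search_in_batches` are hypothetical helpers, and `fake_search` stands in for a function that wraps `collection.search` and returns one result per query vector.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each with at most size elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def search_in_batches(vectors, search_fn, batch_size=100):
    """Run search_fn on each batch and return the results in input order."""
    results = []
    for batch in chunked(vectors, batch_size):
        results.extend(search_fn(batch))  # one result per query vector
    return results


batch_sizes_seen = []


def fake_search(batch):
    """Placeholder search: records the batch size and returns one value per vector."""
    batch_sizes_seen.append(len(batch))
    return [sum(vector) for vector in batch]


vectors = [[float(i), 1.0] for i in range(250)]
results = search_in_batches(vectors, fake_search, batch_size=100)
```

With a real collection, `search_fn` could be `lambda batch: list(collection.search(data=batch, anns_field="embedding", param=search_params, limit=10))`, so that `results[i]` holds the hits for `vectors[i]`.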
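    b.Concurrent queries
        a.Description
            Concurrent querying raises throughput by issuing searches from multiple threads or processes. Milvus accepts concurrent queries from multiple clients, which helps make full use of server resources. Tune the concurrency level to the number of CPU cores on the server: too much concurrency causes resource contention and degrades performance, so latency has to be traded off against throughput. This approach suits high-throughput workloads such as bulk recommendation. Reuse connections, for example through a connection pool, rather than opening a new one per query.
        b.Code example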
            ---
            from pymilvus import Collection, connections
            import numpy as np
            import time
            import concurrent.futures
            from threading import Lock
            
            # 连接Milvus
            connections.connect(host="localhost", port="19530")
            
            collection = Collection("documents")
            collection.load()
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 单线程查询
            def single_thread_queries(num_queries=100):
                """单线程查询"""
                query_vectors = [[np.random.random() for _ in range(128)] for _ in range(num_queries)]
                
                start = time.time()
                for query_vector in query_vectors:
                    collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                elapsed = time.time() - start
                
                return elapsed, num_queries
            
            # 多线程查询
            def multi_thread_queries(num_queries=100, num_workers=4):
                """多线程查询"""
                query_vectors = [[np.random.random() for _ in range(128)] for _ in range(num_queries)]
                
                def query_worker(query_vector):
                    """单个查询任务"""
                    return collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                
                start = time.time()
                with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
                    futures = [executor.submit(query_worker, qv) for qv in query_vectors]
                    results = [future.result() for future in concurrent.futures.as_completed(futures)]
                elapsed = time.time() - start
                
                return elapsed, num_queries
            
            # 性能对比
            print("并发查询性能测试:\n")
            
            num_queries = 100
            
            # 单线程
            time_single, count_single = single_thread_queries(num_queries)
            qps_single = count_single / time_single
            
            print(f"单线程:")
            print(f"  总时间: {time_single:.2f}s")
            print(f"  QPS: {qps_single:.2f}")
            
            # 不同并发数
            worker_counts = [2, 4, 8, 16]
            
            print(f"\n不同并发数性能:\n")
            print(f"{'并发数':>8s} {'总时间':>10s} {'QPS':>10s} {'加速比':>10s}")
            print("-" * 42)
            
            for num_workers in worker_counts:
                time_multi, count_multi = multi_thread_queries(num_queries, num_workers)
                qps_multi = count_multi / time_multi
                speedup = time_single / time_multi
                
                print(f"{num_workers:8d} {time_multi:9.2f}s {qps_multi:9.2f} {speedup:9.2f}x")
            
            # 并发控制器
            class ConcurrentQueryController:
                def __init__(self, collection, max_workers=8):
                    self.collection = collection
                    self.max_workers = max_workers
                    self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
                    self.lock = Lock()
                    self.query_count = 0
                
                def query(self, query_vector, search_params, limit=10):
                    """提交查询任务"""
                    def _query():
                        with self.lock:
                            self.query_count += 1
                        
                        return self.collection.search(
                            data=[query_vector],
                            anns_field="embedding",
                            param=search_params,
                            limit=limit
                        )
                    
                    return self.executor.submit(_query)
                
                def batch_query(self, query_vectors, search_params, limit=10):
                    """批量提交查询"""
                    futures = [self.query(qv, search_params, limit) for qv in query_vectors]
                    return futures
                
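                def wait_all(self, futures):
                    """Wait for all queries; results follow the order of the submitted futures"""
                    # Iterating the futures list (instead of as_completed) keeps each
                    # result aligned with the query vector that produced it
                    return [future.result() for future in futures]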
                
                def get_stats(self):
                    """获取统计信息"""
                    return {
                        "total_queries": self.query_count,
                        "max_workers": self.max_workers
                    }
                
                def shutdown(self):
                    """关闭执行器"""
                    self.executor.shutdown(wait=True)
            
            # 使用并发控制器
            controller = ConcurrentQueryController(collection, max_workers=8)
            
            query_vectors = [[np.random.random() for _ in range(128)] for _ in range(50)]
            
            print("\n使用并发控制器:")
            start = time.time()
            futures = controller.batch_query(query_vectors, search_params, limit=10)
            results = controller.wait_all(futures)
            elapsed = time.time() - start
            
            stats = controller.get_stats()
            print(f"  查询数: {stats['total_queries']}")
            print(f"  并发数: {stats['max_workers']}")
            print(f"  总时间: {elapsed:.2f}s")
            print(f"  QPS: {stats['total_queries']/elapsed:.2f}")
            
            controller.shutdown()
            
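            # Concurrency tuning tips
            print("\nConcurrency tuning tips:")
            print("  1. Start with concurrency ≈ CPU cores × 2, then adjust based on load tests")
            print("  2. Reuse connections instead of opening one per query")
            print("  3. Monitor resource usage to avoid overload")
            print("  4. Use low concurrency for real-time traffic, higher concurrency for batch jobs")
            print("  5. Combine batch search with concurrency to maximize throughput")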
            ---
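Collecting futures with `as_completed` returns results in completion order, which loses the link between a query and its result. When that link matters, `ThreadPoolExecutor.map` is a simple alternative that bounds concurrency and preserves input order. In this sketch `run_concurrently` is a hypothetical helper and `fake_search` simulates searches of varying duration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(search_fn, queries, max_workers=4):
    """Run search_fn over queries with bounded concurrency, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of the inputs, not in completion order
        return list(pool.map(search_fn, queries))


def fake_search(query):
    """Placeholder search whose duration varies, so completion order differs."""
    time.sleep(0.01 * (3 - query % 3))
    return query * query


ordered = run_concurrently(fake_search, list(range(9)), max_workers=3)
```

With a real collection, `search_fn` would wrap `collection.search` for a single query vector. The `with` block also shuts the pool down once every query has finished.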

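02.Batch optimization
    a.Memory management
        a.Description
            Batch queries need careful memory management to avoid out-of-memory (OOM) errors. Query vectors and results both consume memory, so an oversized batch can exhaust it. Cap the batch size according to the memory available. Process data as a stream, loading and handling one batch at a time, and use generators so that the full data set is never held in memory at once. Monitor memory usage and release objects as soon as they are no longer needed. Set limit sensibly so that a search does not return more results than necessary.
        b.Code example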
            ---
            from pymilvus import Collection
            import numpy as np
            import psutil
            import gc
            
            collection = Collection("documents")
            collection.load()
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 内存监控
            def get_memory_usage():
                """获取当前内存使用"""
                process = psutil.Process()
                memory_info = process.memory_info()
                return memory_info.rss / 1024 / 1024  # MB
            
            # 批量查询内存分析
            print("批量查询内存使用分析:\n")
            
            batch_sizes = [10, 50, 100, 500, 1000]
            
            print(f"{'批量大小':>10s} {'查询前':>12s} {'查询后':>12s} {'增长':>12s}")
            print("-" * 50)
            
            for batch_size in batch_sizes:
                # 清理内存
                gc.collect()
                
                mem_before = get_memory_usage()
                
                # 生成查询向量
                query_vectors = [[np.random.random() for _ in range(128)] for _ in range(batch_size)]
                
                # 执行查询
                results = collection.search(
                    data=query_vectors,
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
                
                mem_after = get_memory_usage()
                mem_increase = mem_after - mem_before
                
                print(f"{batch_size:10d} {mem_before:10.2f}MB {mem_after:10.2f}MB {mem_increase:10.2f}MB")
                
                # 清理结果
                del query_vectors
                del results
                gc.collect()
            
            # 流式批量查询
            def streaming_batch_query(total_queries, batch_size=100):
                """流式批量查询，避免内存溢出"""
                num_batches = (total_queries + batch_size - 1) // batch_size
                
                for batch_idx in range(num_batches):
                    start_idx = batch_idx * batch_size
                    end_idx = min(start_idx + batch_size, total_queries)
                    current_batch_size = end_idx - start_idx
                    
                    # 生成当前批次的查询向量
                    query_vectors = [[np.random.random() for _ in range(128)] for _ in range(current_batch_size)]
                    
                    # 执行查询
                    results = collection.search(
                        data=query_vectors,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                    
                    # Hand the batch to the caller; results are processed outside the generator
                    yield batch_idx, results
                    
                    # 清理内存
                    del query_vectors
                    del results
                    gc.collect()
            
            print("\n流式批量查询:")
            
            mem_start = get_memory_usage()
            print(f"开始内存: {mem_start:.2f}MB")
            
            total_queries = 1000
            batch_size = 100
            
            for batch_idx, results in streaming_batch_query(total_queries, batch_size):
                mem_current = get_memory_usage()
                print(f"  批次 {batch_idx+1}: {len(results)} 个结果, 内存: {mem_current:.2f}MB")
            
            mem_end = get_memory_usage()
            print(f"结束内存: {mem_end:.2f}MB")
            print(f"内存增长: {mem_end - mem_start:.2f}MB")
            
            # 自适应批量大小
            class AdaptiveBatchQuery:
                def __init__(self, collection, max_memory_mb=1000):
                    self.collection = collection
                    self.max_memory_mb = max_memory_mb
                    self.batch_size = 100
                
                def estimate_batch_size(self, vector_dim=128, limit=10):
                    """估算合适的批量大小"""
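                    # Rough per-query footprint, assuming float32 vectors (4 bytes per dimension)
                    query_memory = vector_dim * 4 / 1024 / 1024  # MB
                    # Upper bound per hit: assumes the vector field is returned with each hit,
                    # plus about 100 bytes for id, distance and other output fields
                    result_memory = limit * (vector_dim * 4 + 100) / 1024 / 1024  # MB
                    per_query_memory = query_memory + result_memory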
                    
                    # 计算批量大小
                    available_memory = self.max_memory_mb * 0.8  # 留20%余量
                    estimated_batch_size = int(available_memory / per_query_memory)
                    
                    return max(10, min(estimated_batch_size, 1000))
                
                def query(self, query_vectors, search_params, limit=10):
                    """自适应批量查询"""
                    # 动态调整批量大小
                    optimal_batch_size = self.estimate_batch_size(limit=limit)
                    
                    print(f"自适应批量大小: {optimal_batch_size}")
                    
                    all_results = []
                    num_queries = len(query_vectors)
                    
                    for i in range(0, num_queries, optimal_batch_size):
                        batch = query_vectors[i:i+optimal_batch_size]
                        
                        results = self.collection.search(
                            data=batch,
                            anns_field="embedding",
                            param=search_params,
                            limit=limit
                        )
                        
                        all_results.extend(results)
                        
                        # 检查内存
                        current_memory = get_memory_usage()
                        if current_memory > self.max_memory_mb:
                            print(f"警告: 内存使用 {current_memory:.2f}MB 超过限制")
                            gc.collect()
                    
                    return all_results
            
            adaptive_query = AdaptiveBatchQuery(collection, max_memory_mb=500)
            
            query_vectors = [[np.random.random() for _ in range(128)] for _ in range(500)]
            results = adaptive_query.query(query_vectors, search_params, limit=10)
            
            print(f"\n自适应查询完成: {len(results)} 个结果")
            ---
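            The sizing and streaming ideas above can be separated from Milvus and checked on their own. The sketch below is a minimal illustration, not part of pymilvus: `estimate_batch_size`, `iter_batches`, and the 64-byte `per_hit_bytes` default are hypothetical names and values chosen for the example. It assumes float32 query vectors and that only ids and distances come back for each hit.

```python
from typing import Iterator, List, Sequence

BYTES_PER_FLOAT32 = 4


def estimate_batch_size(budget_mb: float, dim: int, limit: int,
                        per_hit_bytes: int = 64,
                        floor: int = 10, ceiling: int = 1000) -> int:
    """Estimate how many queries fit into budget_mb (hypothetical sizing model)."""
    query_bytes = dim * BYTES_PER_FLOAT32   # one float32 query vector
    result_bytes = limit * per_hit_bytes    # assumed cost of id + distance per hit
    fit = int(budget_mb * 1024 * 1024 // (query_bytes + result_bytes))
    # Clamp so a tiny budget still makes progress and a huge one stays bounded
    return max(floor, min(fit, ceiling))


def iter_batches(vectors: Sequence[List[float]],
                 batch_size: int) -> Iterator[Sequence[List[float]]]:
    """Yield consecutive slices so that only one batch is in flight at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(vectors), batch_size):
        yield vectors[start:start + batch_size]


if __name__ == "__main__":
    queries = [[0.0] * 128 for _ in range(250)]
    size = estimate_batch_size(budget_mb=0.1, dim=128, limit=10)
    print(size, [len(batch) for batch in iter_batches(queries, size)])
```

            Each slice produced by `iter_batches` can be passed as `data=` to `collection.search`; handling the hits before requesting the next slice keeps peak memory close to one batch.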
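    b.Performance tuning
        a.Description
            Tuning batch query performance means balancing several factors at once: batch size, client concurrency, and search parameters all affect throughput and latency. Determine the best configuration by experiment rather than by guesswork. Track QPS, latency, and memory while testing, and use profiling tools to locate the bottleneck. Consider caching to avoid repeating identical queries, and reduce network cost, for example with compression. Establish a performance baseline first so that every later change can be measured against it.
        b.Code example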
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            # 性能调优实验
            class BatchQueryTuner:
                def __init__(self, collection):
                    self.collection = collection
                    self.results = []
                
                def tune_batch_size(self, query_vectors, search_params):
                    """调优批量大小"""
                    batch_sizes = [10, 20, 50, 100, 200]
                    
                    print("批量大小调优:\n")
                    print(f"{'批量大小':>10s} {'总时间':>10s} {'QPS':>10s} {'平均延迟':>12s}")
                    print("-" * 48)
                    
                    best_qps = 0
                    best_batch_size = 10
                    
                    for batch_size in batch_sizes:
                        # 使用前N个查询
                        test_vectors = query_vectors[:min(batch_size * 10, len(query_vectors))]
                        
                        start = time.time()
                        for i in range(0, len(test_vectors), batch_size):
                            batch = test_vectors[i:i+batch_size]
                            self.collection.search(
                                data=batch,
                                anns_field="embedding",
                                param=search_params,
                                limit=10
                            )
                        elapsed = time.time() - start
                        
                        qps = len(test_vectors) / elapsed
                        avg_latency = elapsed / len(test_vectors) * 1000
                        
                        print(f"{batch_size:10d} {elapsed:9.2f}s {qps:9.2f} {avg_latency:10.2f}ms")
                        
                        if qps > best_qps:
                            best_qps = qps
                            best_batch_size = batch_size
                    
                    print(f"\n最优批量大小: {best_batch_size} (QPS: {best_qps:.2f})")
                    return best_batch_size
                
                def tune_search_params(self, query_vectors, batch_size):
                    """调优搜索参数"""
                    nprobe_values = [8, 16, 32, 64]
                    
                    print("\n搜索参数调优:\n")
                    print(f"{'nprobe':>8s} {'总时间':>10s} {'QPS':>10s}")
                    print("-" * 32)
                    
                    best_qps = 0
                    best_nprobe = 16
                    
                    for nprobe in nprobe_values:
                        search_params = {
                            "metric_type": "L2",
                            "params": {"nprobe": nprobe}
                        }
                        
                        test_vectors = query_vectors[:min(batch_size * 10, len(query_vectors))]
                        
                        start = time.time()
                        for i in range(0, len(test_vectors), batch_size):
                            batch = test_vectors[i:i+batch_size]
                            self.collection.search(
                                data=batch,
                                anns_field="embedding",
                                param=search_params,
                                limit=10
                            )
                        elapsed = time.time() - start
                        
                        qps = len(test_vectors) / elapsed
                        
                        print(f"{nprobe:8d} {elapsed:9.2f}s {qps:9.2f}")
                        
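                        # Note: QPS alone favors the smallest nprobe; recall is not measured here
                        if qps > best_qps:
                            best_qps = qps
                            best_nprobe = nprobe
                    
                    print(f"\nFastest nprobe: {best_nprobe} (QPS: {best_qps:.2f}); check recall before adopting it")
                    return best_nprobe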
                
                def full_tune(self, num_queries=1000):
                    """完整调优流程"""
                    print("=" * 60)
                    print("批量查询性能调优")
                    print("=" * 60 + "\n")
                    
                    # 生成测试查询
                    query_vectors = [[np.random.random() for _ in range(128)] for _ in range(num_queries)]
                    
                    # 调优批量大小
                    optimal_batch_size = self.tune_batch_size(
                        query_vectors,
                        {"metric_type": "L2", "params": {"nprobe": 16}}
                    )
                    
                    # 调优搜索参数
                    optimal_nprobe = self.tune_search_params(query_vectors, optimal_batch_size)
                    
                    # 最终配置
                    print("\n" + "=" * 60)
                    print("最优配置")
                    print("=" * 60)
                    print(f"  批量大小: {optimal_batch_size}")
                    print(f"  nprobe: {optimal_nprobe}")
                    
                    # 验证性能
                    optimal_search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": optimal_nprobe}
                    }
                    
                    start = time.time()
                    for i in range(0, len(query_vectors), optimal_batch_size):
                        batch = query_vectors[i:i+optimal_batch_size]
                        self.collection.search(
                            data=batch,
                            anns_field="embedding",
                            param=optimal_search_params,
                            limit=10
                        )
                    elapsed = time.time() - start
                    
                    final_qps = len(query_vectors) / elapsed
                    final_latency = elapsed / len(query_vectors) * 1000
                    
                    print(f"\n最终性能:")
                    print(f"  QPS: {final_qps:.2f}")
                    print(f"  平均延迟: {final_latency:.2f}ms")
                    print(f"  总时间: {elapsed:.2f}s")
            
            # 执行调优
            tuner = BatchQueryTuner(collection)
            tuner.full_tune(num_queries=500)
            ---
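            The tuner above ranks settings by QPS only, and a lower nprobe is usually faster because it scans fewer clusters, at the cost of missing true neighbors. A recall check closes that gap. The sketch below is an assumption-laden illustration: `recall_at_k` and `pick_nprobe` are hypothetical helpers, and the exact ids are assumed to come from a brute-force search (for example a FLAT index) over the same queries.

```python
from typing import Dict, List, Sequence


def recall_at_k(approx_ids: Sequence[Sequence[int]],
                exact_ids: Sequence[Sequence[int]], k: int) -> float:
    """Mean fraction of the exact top-k that the approximate search also returned."""
    if len(approx_ids) != len(exact_ids):
        raise ValueError("both result sets must cover the same queries")
    if not exact_ids:
        return 0.0
    total = 0.0
    for approx, exact in zip(approx_ids, exact_ids):
        truth = set(exact[:k])
        total += len(truth & set(approx[:k])) / max(len(truth), 1)
    return total / len(exact_ids)


def pick_nprobe(measurements: List[Dict[str, float]],
                min_recall: float) -> Dict[str, float]:
    """Return the highest-QPS measurement whose recall meets the floor."""
    eligible = [m for m in measurements if m["recall"] >= min_recall]
    if not eligible:
        raise ValueError("no setting reaches the requested recall")
    return max(eligible, key=lambda m: m["qps"])
```

            Each measurement is expected to be a dict with `nprobe`, `qps`, and `recall` keys collected during the tuning loop; choosing among the settings that pass the recall floor makes the trade-off explicit.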

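7 Advanced Features

7.1 Partition Management

01.Partition concepts
    a.Role of partitions
        a.Description
            A partition is a logical grouping inside a Collection, used to organize and manage data. Partitions can improve query performance because a search can be restricted to the relevant partitions instead of covering the whole Collection. They suit data that divides naturally by time, category, or region. A Collection can hold multiple partitions and always starts with one named _default. Data in one partition is isolated from the others, and each partition can be loaded, released, and dropped independently. Used well, partitioning noticeably improves query efficiency and resource usage.
        b.Code example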
            ---
            from pymilvus import Collection, Partition
            import numpy as np
            
            collection = Collection("documents")
            
            # 创建分区
            partition_2024 = collection.create_partition("year_2024")
            partition_2023 = collection.create_partition("year_2023")
            partition_2022 = collection.create_partition("year_2022")
            
            print("已创建分区:")
            for partition in collection.partitions:
                print(f"  - {partition.name}")
            
            # 向不同分区插入数据
            data_2024 = [
                [i for i in range(1000, 2000)],  # ids
                [f"文档2024_{i}" for i in range(1000)],  # titles
                [[np.random.random() for _ in range(128)] for _ in range(1000)]  # embeddings
            ]
            
            partition_2024.insert(data_2024)
            
            data_2023 = [
                [i for i in range(2000, 3000)],
                [f"文档2023_{i}" for i in range(1000)],
                [[np.random.random() for _ in range(128)] for _ in range(1000)]
            ]
            
            partition_2023.insert(data_2023)
            
            collection.flush()
            
            print(f"\n分区数据量:")
            print(f"  year_2024: {partition_2024.num_entities} 条")
            print(f"  year_2023: {partition_2023.num_entities} 条")
            print(f"  总计: {collection.num_entities} 条")
            
            # 分区搜索
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 搜索特定分区
            results_2024 = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                partition_names=["year_2024"],
                output_fields=["id", "title"]
            )
            
            print(f"\n搜索year_2024分区:")
            for hit in results_2024[0][:5]:
                print(f"  {hit.entity.get('title')}: {hit.distance:.4f}")
            
            # 搜索多个分区
            results_multi = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                partition_names=["year_2024", "year_2023"],
                output_fields=["id", "title"]
            )
            
            print(f"\n搜索多个分区:")
            for hit in results_multi[0][:5]:
                print(f"  {hit.entity.get('title')}: {hit.distance:.4f}")
            
            # 搜索所有分区（不指定partition_names）
            results_all = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                output_fields=["id", "title"]
            )
            
            print(f"\n搜索所有分区:")
            for hit in results_all[0][:5]:
                print(f"  {hit.entity.get('title')}: {hit.distance:.4f}")
            ---
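            When partitions are named by year as in the example above, the `partition_names` argument can be derived from the date range of a query instead of being written by hand. The sketch below is a hypothetical helper, not a pymilvus function; it assumes the `year_YYYY` naming used here and filters against the partitions that actually exist, since naming a missing partition in a search is expected to fail.

```python
from datetime import date
from typing import Iterable, List


def year_partitions(start: date, end: date, available: Iterable[str],
                    prefix: str = "year_") -> List[str]:
    """Names of existing yearly partitions that overlap [start, end], oldest first."""
    if start > end:
        raise ValueError("start must not be after end")
    existing = set(available)
    wanted = [f"{prefix}{year}" for year in range(start.year, end.year + 1)]
    # Keep only partitions that exist so the search does not reference a missing one
    return [name for name in wanted if name in existing]
```

            With pymilvus, `available` could be `[p.name for p in collection.partitions]`, and the returned list passed directly as `partition_names=`.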
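    b.Partitioning strategies
        a.Description
            The partitioning strategy affects both performance and maintainability. Common choices are partitioning by time (day, month, year), by category (product type, document type), and by hash (even distribution). Time-based partitions suit time-series data and make aging out and archiving straightforward. Category-based partitions suit multi-tenant or multi-type data. Hash-based partitions spread load evenly. Keep the number of partitions moderate; roughly 10 to 100 is the recommendation here. Choose the strategy that matches how the business actually queries and retires its data.
        b.Code example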
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType, Partition
            import numpy as np
            import hashlib
            from datetime import datetime, timedelta
            
            # 策略1: 按时间分区
            class TimeBasedPartitioning:
                def __init__(self, collection):
                    self.collection = collection
                
                def create_monthly_partitions(self, start_date, num_months):
                    """创建按月分区"""
                    partitions = []
                    current_date = start_date
                    
                    for i in range(num_months):
                        partition_name = current_date.strftime("month_%Y_%m")
                        
                        if not self.collection.has_partition(partition_name):
                            partition = self.collection.create_partition(partition_name)
                            partitions.append(partition)
                            print(f"创建分区: {partition_name}")
                        
                        # 下一个月
                        if current_date.month == 12:
                            current_date = datetime(current_date.year + 1, 1, 1)
                        else:
                            current_date = datetime(current_date.year, current_date.month + 1, 1)
                    
                    return partitions
                
                def get_partition_by_date(self, date):
                    """根据日期获取分区名"""
                    return date.strftime("month_%Y_%m")
                
                def insert_with_date(self, data, date):
                    """插入数据到对应日期的分区"""
                    partition_name = self.get_partition_by_date(date)
                    
                    if not self.collection.has_partition(partition_name):
                        self.collection.create_partition(partition_name)
                    
                    partition = Partition(self.collection, partition_name)
                    partition.insert(data)
                    
                    print(f"数据插入到分区: {partition_name}")
            
            collection = Collection("time_series_docs")
            time_partitioner = TimeBasedPartitioning(collection)
            
            # Create six monthly partitions, starting from January 2024
            start_date = datetime(2024, 1, 1)
            time_partitioner.create_monthly_partitions(start_date, 6)
            
            # 策略2: 按类别分区
            class CategoryBasedPartitioning:
                def __init__(self, collection):
                    self.collection = collection
                    self.categories = {}
                
                def create_category_partitions(self, categories):
                    """为每个类别创建分区"""
                    for category in categories:
                        partition_name = f"cat_{category.lower().replace(' ', '_')}"
                        
                        if not self.collection.has_partition(partition_name):
                            self.collection.create_partition(partition_name)
                            print(f"Created partition: {partition_name}")
                        
                        # Register the mapping even when the partition already existed,
                        # otherwise insert_by_category would reject a known category
                        self.categories[category] = partition_name
                
                def insert_by_category(self, data, category):
                    """插入数据到对应类别的分区"""
                    if category not in self.categories:
                        raise ValueError(f"未知类别: {category}")
                    
                    partition_name = self.categories[category]
                    partition = Partition(self.collection, partition_name)
                    partition.insert(data)
                    
                    print(f"数据插入到分区: {partition_name}")
            
            category_partitioner = CategoryBasedPartitioning(collection)
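            # ASCII labels: partition names are expected to contain only letters, digits and underscores
            categories = ["electronics", "books", "clothing", "food"]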
            category_partitioner.create_category_partitions(categories)
            
            # 策略3: 按哈希分区
            class HashBasedPartitioning:
                def __init__(self, collection, num_partitions=10):
                    self.collection = collection
                    self.num_partitions = num_partitions
                    self.create_hash_partitions()
                
                def create_hash_partitions(self):
                    """创建哈希分区"""
                    for i in range(self.num_partitions):
                        partition_name = f"hash_{i:03d}"
                        
                        if not self.collection.has_partition(partition_name):
                            self.collection.create_partition(partition_name)
                            print(f"创建分区: {partition_name}")
                
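                def get_partition_by_id(self, doc_id):
                    """Map an ID to a partition with a stable hash"""
                    # The built-in hash() is salted per process for strings, so it would
                    # route the same ID to different partitions across runs
                    digest = hashlib.md5(str(doc_id).encode("utf-8")).hexdigest()
                    partition_idx = int(digest, 16) % self.num_partitions
                    return f"hash_{partition_idx:03d}"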
                
                def insert_by_hash(self, data):
                    """根据哈希分配数据到分区"""
                    # 假设data[0]是ID列表
                    ids = data[0]
                    
                    # 按分区分组数据
                    partition_data = {}
                    for i, doc_id in enumerate(ids):
                        partition_name = self.get_partition_by_id(doc_id)
                        
                        if partition_name not in partition_data:
                            partition_data[partition_name] = [[] for _ in range(len(data))]
                        
                        for j, field_data in enumerate(data):
                            partition_data[partition_name][j].append(field_data[i])
                    
                    # 插入到各分区
                    for partition_name, pdata in partition_data.items():
                        partition = Partition(self.collection, partition_name)
                        partition.insert(pdata)
                        print(f"插入 {len(pdata[0])} 条数据到 {partition_name}")
            
            hash_partitioner = HashBasedPartitioning(collection, num_partitions=10)
            
            # 分区策略选择
            print("\n分区策略选择建议:")
            print("  时间分区: 适合日志、时序数据，便于归档")
            print("  类别分区: 适合多租户、多类型数据")
            print("  哈希分区: 适合均匀分布，负载均衡")
            print("  混合分区: 先按类别再按时间，多级分区")
            ---
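            The hybrid strategy listed last (category first, then time) comes down to a naming rule. The sketch below shows one possible rule; `slugify` and `hybrid_partition_name` are hypothetical helpers, and the `cat_<category>_<YYYY>_<MM>` pattern is an assumption for illustration. It restricts names to ASCII letters, digits, and underscores, and rejects labels that cannot be reduced to such a name.

```python
import re
from datetime import date


def slugify(label: str) -> str:
    """Lowercase the label and replace runs of other characters with '_'."""
    slug = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")
    if not slug:
        raise ValueError(f"cannot derive an ASCII name from {label!r}")
    return slug


def hybrid_partition_name(category: str, day: date) -> str:
    """Partition name combining a category slug with the year and month of `day`."""
    return f"cat_{slugify(category)}_{day:%Y_%m}"


if __name__ == "__main__":
    print(hybrid_partition_name("Home Audio", date(2024, 3, 15)))
```

            A hybrid scheme multiplies the partition count (categories times months), so it is worth checking the result against the 10 to 100 guideline before adopting it.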

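02.Partition operations
    a.Load and release
        a.Description
            Partitions can be loaded and released independently, which saves memory. Load only the partitions that need to be queried and leave the rest released. Loading a partition brings its index and part of its data into memory; releasing it frees that memory while the data stays in persistent storage. Partitions can be loaded and released dynamically as query patterns change: keep hot partitions loaded and load cold partitions on demand. Managing load state deliberately keeps memory usage under control.
        b.Code example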
            ---
            from pymilvus import Collection, Partition
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 创建多个分区
            partitions = []
            for year in [2022, 2023, 2024]:
                partition_name = f"year_{year}"
                if not collection.has_partition(partition_name):
                    partition = collection.create_partition(partition_name)
                    partitions.append(partition)
                    
                    # 插入数据
                    data = [
                        [i for i in range(year*1000, year*1000+1000)],
                        [f"文档{year}_{i}" for i in range(1000)],
                        [[np.random.random() for _ in range(128)] for _ in range(1000)]
                    ]
                    partition.insert(data)
            
            collection.flush()
            
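            # Load a specific partition
            # The pymilvus Partition class exposes no is_loaded attribute; this helper assumes
            # a client version that provides utility.load_state (pymilvus 2.2 or later)
            from pymilvus import utility
            from pymilvus.client.types import LoadState
            
            def is_partition_loaded(partition_name):
                state = utility.load_state(collection.name, partition_names=[partition_name])
                return state == LoadState.Loaded
            
            print("Loading a specific partition:\n")
            
            partition_2024 = Partition(collection, "year_2024")
            
            print(f"Loaded before load(): {is_partition_loaded('year_2024')}")
            
            partition_2024.load()
            print(f"Loaded after load(): {is_partition_loaded('year_2024')}")
            
            # Search the loaded partition
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=5,
                partition_names=["year_2024"]
            )
            
            print(f"\nSearch on year_2024: {len(results[0])} hits")
            
            # Release the partition
            partition_2024.release()
            print(f"\nLoaded after release(): {is_partition_loaded('year_2024')}")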
            
            # 动态加载管理
            class PartitionLoadManager:
                def __init__(self, collection):
                    self.collection = collection
                    self.loaded_partitions = set()
                
                def load_partition(self, partition_name):
                    """加载分区"""
                    if partition_name in self.loaded_partitions:
                        print(f"分区 {partition_name} 已加载")
                        return
                    
                    partition = Partition(self.collection, partition_name)
                    
                    start = time.time()
                    partition.load()
                    elapsed = time.time() - start
                    
                    self.loaded_partitions.add(partition_name)
                    print(f"加载分区 {partition_name}: {elapsed:.2f}s")
                
                def release_partition(self, partition_name):
                    """释放分区"""
                    if partition_name not in self.loaded_partitions:
                        print(f"分区 {partition_name} 未加载")
                        return
                    
                    partition = Partition(self.collection, partition_name)
                    partition.release()
                    
                    self.loaded_partitions.remove(partition_name)
                    print(f"释放分区 {partition_name}")
                
                def load_partitions(self, partition_names):
                    """批量加载分区"""
                    for name in partition_names:
                        self.load_partition(name)
                
                def release_all(self):
                    """释放所有分区"""
                    for name in list(self.loaded_partitions):
                        self.release_partition(name)
                
                def get_loaded_partitions(self):
                    """获取已加载分区列表"""
                    return list(self.loaded_partitions)
            
            # 使用加载管理器
            load_manager = PartitionLoadManager(collection)
            
            print("\n动态加载管理:")
            
            # 加载热数据分区
            load_manager.load_partitions(["year_2024", "year_2023"])
            
            print(f"已加载分区: {load_manager.get_loaded_partitions()}")
            
            # 查询热数据
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=5,
                partition_names=["year_2024", "year_2023"]
            )
            
            print(f"查询热数据: {len(results[0])} 条结果")
            
            # 切换到冷数据
            load_manager.release_partition("year_2023")
            load_manager.load_partition("year_2022")
            
            print(f"切换后已加载分区: {load_manager.get_loaded_partitions()}")
            
            # 释放所有
            load_manager.release_all()
            print(f"释放后已加载分区: {load_manager.get_loaded_partitions()}")
            
            # 内存优化建议
            print("\n内存优化建议:")
            print("  1. 只加载近期数据分区（如最近3个月）")
            print("  2. 历史数据按需加载，查询后释放")
            print("  3. 监控内存使用，避免加载过多分区")
            print("  4. 使用LRU策略自动管理分区加载")
            print("  5. 考虑分区大小，避免单个分区过大")
            ---
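            The fourth suggestion above, managing partition loading with an LRU policy, can be sketched without a running server by injecting the load and release actions as callbacks. `LruPartitionLoader` is a hypothetical class written for this illustration; with pymilvus the callbacks could be `lambda name: Partition(collection, name).load()` and the matching `release()` call.

```python
from collections import OrderedDict
from typing import Callable, List


class LruPartitionLoader:
    """Keep at most `capacity` partitions loaded, evicting the least recently used."""

    def __init__(self, capacity: int,
                 load: Callable[[str], None],
                 release: Callable[[str], None]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._load = load
        self._release = release
        self._loaded: "OrderedDict[str, None]" = OrderedDict()

    def touch(self, name: str) -> None:
        """Ensure `name` is loaded and mark it as most recently used."""
        if name in self._loaded:
            self._loaded.move_to_end(name)
            return
        # Make room first so the memory peak stays within `capacity` partitions
        while len(self._loaded) >= self.capacity:
            victim, _ = self._loaded.popitem(last=False)  # oldest entry
            self._release(victim)
        self._load(name)
        self._loaded[name] = None

    def loaded(self) -> List[str]:
        """Loaded partition names, least recently used first."""
        return list(self._loaded)
```

            Calling `touch(name)` before each partition-scoped search keeps hot partitions resident and releases cold ones only when space is needed.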
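    b.Dropping partitions
        a.Description
            Dropping a partition permanently deletes the partition and all data in it. A loaded partition must be released before it can be dropped. The operation cannot be undone, so treat it with care and back up anything important first. Dropping is a convenient way to purge expired data, such as old time-based partitions, and it frees the storage they occupied. Dropping one partition does not affect the data or queries of the others.
        b.Code example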
            ---
            from pymilvus import Collection, Partition
            import numpy as np
            
            collection = Collection("documents")
            
            # 创建测试分区
            test_partition = collection.create_partition("test_partition")
            
            # 插入测试数据
            data = [
                [i for i in range(10000, 11000)],
                [f"测试文档_{i}" for i in range(1000)],
                [[np.random.random() for _ in range(128)] for _ in range(1000)]
            ]
            test_partition.insert(data)
            collection.flush()
            
            print(f"创建测试分区: test_partition")
            print(f"数据量: {test_partition.num_entities} 条")
            
            # 列出所有分区
            print(f"\n当前分区:")
            for partition in collection.partitions:
                print(f"  - {partition.name}: {partition.num_entities} 条")
            
            # 删除分区
            print(f"\n删除test_partition分区...")
            
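            # Release first; a loaded partition cannot be dropped.
            # The pymilvus Partition class has no is_loaded attribute, so release unconditionally
            # (releasing a partition that is not loaded is assumed to be harmless)
            test_partition.release()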
            
            # 删除分区
            collection.drop_partition("test_partition")
            
            print(f"删除完成")
            
            # 验证删除
            print(f"\n删除后分区:")
            for partition in collection.partitions:
                print(f"  - {partition.name}: {partition.num_entities} 条")
            
            # 批量删除旧分区
            class PartitionCleaner:
                def __init__(self, collection):
                    self.collection = collection
                
                def delete_old_time_partitions(self, keep_months=3):
                    """删除旧的时间分区，保留最近N个月"""
                    from datetime import datetime, timedelta
                    
                    cutoff_date = datetime.now() - timedelta(days=keep_months*30)
                    
                    deleted_partitions = []
                    
                    for partition in self.collection.partitions:
                        # 跳过默认分区
                        if partition.name == "_default":
                            continue
                        
                        # 解析分区名（假设格式为month_YYYY_MM）
                        if partition.name.startswith("month_"):
                            try:
                                parts = partition.name.split("_")
                                year = int(parts[1])
                                month = int(parts[2])
                                partition_date = datetime(year, month, 1)
                                
                                if partition_date < cutoff_date:
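                                    # Release, then drop. Partition has no is_loaded attribute in pymilvus;
                                    # releasing an unloaded partition is assumed to be harmless
                                    partition.release()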
                                    
                                    self.collection.drop_partition(partition.name)
                                    deleted_partitions.append(partition.name)
                                    print(f"删除旧分区: {partition.name}")
                            except Exception as e:
                                print(f"解析分区名失败: {partition.name}, {e}")
                    
                    return deleted_partitions
                
                def delete_empty_partitions(self):
                    """删除空分区"""
                    deleted_partitions = []
                    
                    for partition in self.collection.partitions:
                        if partition.name == "_default":
                            continue
                        
                        if partition.num_entities == 0:
                            # Partition has no is_loaded attribute in pymilvus;
                            # releasing an unloaded partition is assumed to be harmless
                            partition.release()
                            
                            self.collection.drop_partition(partition.name)
                            deleted_partitions.append(partition.name)
                            print(f"删除空分区: {partition.name}")
                    
                    return deleted_partitions
                
                def safe_delete_partition(self, partition_name, backup_path=None):
                    """安全删除分区（可选备份）"""
                    partition = Partition(self.collection, partition_name)
                    
                    # 备份数据
                    if backup_path:
                        print(f"备份分区 {partition_name} 到 {backup_path}")
                        # 这里应该实现实际的备份逻辑
                        # 例如导出数据到文件
                    
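                    # Release, then drop. Partition has no is_loaded attribute in pymilvus;
                    # releasing an unloaded partition is assumed to be harmless
                    partition.release()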
                    
                    self.collection.drop_partition(partition_name)
                    print(f"删除分区: {partition_name}")
            
            cleaner = PartitionCleaner(collection)
            
            # 删除旧分区
            print("\n清理旧分区（保留最近3个月）:")
            deleted = cleaner.delete_old_time_partitions(keep_months=3)
            print(f"删除了 {len(deleted)} 个旧分区")
            
            # 删除空分区
            print("\n清理空分区:")
            deleted = cleaner.delete_empty_partitions()
            print(f"删除了 {len(deleted)} 个空分区")
            
            # 删除注意事项
            print("\n删除分区注意事项:")
            print("  1. 删除操作不可逆，务必谨慎")
            print("  2. 删除前建议备份重要数据")
            print("  3. 先释放分区再删除")
            print("  4. 不能删除_default分区")
            print("  5. 定期清理过期分区释放存储")
            ---
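            The cleaner above approximates a month as 30 days, which can drift around month boundaries. The selection step can instead use exact calendar arithmetic and be tested without touching Milvus. The sketch below uses hypothetical helpers and assumes the `month_YYYY_MM` naming from the earlier examples, with the current month counted as one of the months to keep.

```python
from datetime import date
from typing import Iterable, List


def first_month_to_keep(today: date, keep_months: int) -> date:
    """First day of the oldest month to retain; the current month counts as one."""
    if keep_months < 1:
        raise ValueError("keep_months must be at least 1")
    # Work in a zero-based month index so year boundaries need no special case
    index = today.year * 12 + (today.month - 1) - (keep_months - 1)
    return date(index // 12, index % 12 + 1, 1)


def expired_month_partitions(names: Iterable[str], today: date,
                             keep_months: int) -> List[str]:
    """Names of month_YYYY_MM partitions that start before the retention cutoff."""
    cutoff = first_month_to_keep(today, keep_months)
    expired = []
    for name in names:
        parts = name.split("_")
        if len(parts) != 3 or parts[0] != "month":
            continue  # not a month_YYYY_MM partition, e.g. _default
        try:
            month_start = date(int(parts[1]), int(parts[2]), 1)
        except ValueError:
            continue  # malformed year or month; leave the partition alone
        if month_start < cutoff:
            expired.append(name)
    return expired
```

            Computing the list first and dropping afterwards also makes a dry run possible: print the names, review them, and only then release and drop each partition.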

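7.2 Replica Configuration

01.Replica mechanism
    a.Role of replicas
        a.Description
            Replicas provide redundancy and high availability for loaded (in-memory) data and raise query throughput. Each replica holds a complete in-memory copy of the Collection's data and indexes, spread over QueryNodes, so several replicas can serve search requests in parallel and increase QPS under concurrent load. Replicas stay consistent with each other and pick up updates automatically. The replica number is set when the Collection is loaded; changing it generally means releasing and reloading the Collection (whether it can be changed in place depends on the Milvus version). Replicas suit read-heavy workloads such as search and recommendation systems. Each extra replica consumes additional resources, mainly QueryNode memory.
        b.Code example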
            ---
            from pymilvus import Collection, utility
            import numpy as np
            import time
            
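            collection = Collection("documents")
            collection.load()
            
            # Inspect the current replica layout
            print("Current replica configuration:")
            replicas = collection.get_replicas()
            print(f"  Replica count: {len(replicas.groups)}")
            
            for i, replica in enumerate(replicas.groups):
                print(f"\n  Replica {i+1}:")
                print(f"    Replica ID: {replica.id}")
                print(f"    Shards: {len(replica.shards)}")
                print(f"    Resource group: {replica.resource_group}")
            
            # Load with three replicas. Many Milvus versions reject a load() call that
            # changes the replica number of an already-loaded Collection, so release first.
            print("\nCreating replicas...")
            collection.release()
            collection.load(replica_number=3)
            
            replicas = collection.get_replicas()
            print(f"Replica count after reload: {len(replicas.groups)}")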
            
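            # Measure how replicas affect query performance.
            # Note: queries issued one at a time mostly measure latency; the throughput
            # benefit of replicas shows up under concurrent load.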
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # Single-replica performance
            collection.release()
            collection.load(replica_number=1)
            
            start = time.time()
            for _ in range(100):
                collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
            time_single = time.time() - start
            
            qps_single = 100 / time_single
            print(f"\nSingle-replica performance:")
            print(f"  Total time: {time_single:.2f}s")
            print(f"  QPS: {qps_single:.2f}")
            
            # Multi-replica performance
            collection.release()
            collection.load(replica_number=3)
            
            start = time.time()
            for _ in range(100):
                collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
            time_multi = time.time() - start
            
            qps_multi = 100 / time_multi
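            print(f"\nThree-replica performance:")
            print(f"  Total time: {time_multi:.2f}s")
            print(f"  QPS: {qps_multi:.2f}")
            print(f"  Speedup: {qps_multi/qps_single:.2f}x")
            
            # Replica sizing guidelines
            print("\nReplica sizing guidelines:")
            print("  1. Read-heavy workloads: 2-3 replicas")
            print("  2. High availability: at least 2 replicas")
            print("  3. High throughput: 3-5 replicas")
            print("  4. Limited resources: 1 replica")
            print("  5. Replica number <= number of QueryNodes")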
            ---
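The sizing guidelines printed above can be turned into a small feasibility check. This is a minimal sketch and not a Milvus API: `plan_replicas` and all of its parameters are hypothetical names, and it assumes that each replica needs roughly the full in-memory footprint of the loaded Collection and that only a fraction (`headroom`) of QueryNode memory is usable.

```python
import math

def plan_replicas(desired, query_nodes, collection_gb, node_memory_gb, headroom=0.8):
    """Return the largest feasible replica number that is <= desired.

    Assumptions (illustrative only): every replica holds a full in-memory copy
    of the Collection, and usable memory per QueryNode is node_memory_gb * headroom.
    """
    if desired < 1 or query_nodes < 1:
        raise ValueError("desired and query_nodes must be >= 1")
    usable_gb = query_nodes * node_memory_gb * headroom
    by_memory = math.floor(usable_gb / collection_gb)
    # A replica needs at least one QueryNode, so never exceed the node count.
    return max(0, min(desired, query_nodes, by_memory))

# Four 16 GB QueryNodes and a 10 GB Collection: three replicas fit.
print(plan_replicas(desired=3, query_nodes=4, collection_gb=10, node_memory_gb=16))
# Only two QueryNodes: the node count caps the replica number.
print(plan_replicas(desired=5, query_nodes=2, collection_gb=10, node_memory_gb=16))
```

A result of 0 means that, under these assumptions, even a single replica does not fit in the usable memory.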
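    b.Replica management
        a.Description
            Replica management covers creating, resizing, and monitoring replicas. The replica number is fixed when a Collection is loaded; the example below changes it by releasing and reloading the Collection, so the Collection is briefly unavailable during the switch (whether an in-place change is possible depends on the Milvus version). The replica number affects both memory usage and query performance. Monitor replica status to confirm that every replica is serving. When a replica fails, queries are routed to the remaining replicas. Different Collections can use different replica numbers, and a well-chosen number balances performance against cost.
        b.Code example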
            ---
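            from pymilvus import Collection, utility
            from pymilvus.client.types import LoadState
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # Replica management helper
            class ReplicaManager:
                def __init__(self, collection):
                    self.collection = collection
                
                def is_loaded(self):
                    """Collection has no is_loaded attribute; ask utility.load_state() instead."""
                    return utility.load_state(self.collection.name) == LoadState.Loaded
                
                def get_replica_info(self):
                    """Return the replica layout of the Collection."""
                    if not self.is_loaded():
                        return {"loaded": False}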
                    
                    replicas = self.collection.get_replicas()
                    
                    info = {
                        "loaded": True,
                        "replica_count": len(replicas.groups),
                        "replicas": []
                    }
                    
                    for replica in replicas.groups:
                        replica_info = {
                            "id": replica.id,
                            "shard_count": len(replica.shards),
                            "resource_group": replica.resource_group
                        }
                        info["replicas"].append(replica_info)
                    
                    return info
                
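                def set_replica_number(self, replica_number):
                    """Set the replica number by releasing and reloading the Collection."""
                    print(f"Setting replica number to {replica_number}...")
                    
                    # Release and reload. The Collection cannot serve queries in between.
                    self.collection.release()
                    # load() blocks until loading completes by default, so no polling loop is needed.
                    self.collection.load(replica_number=replica_number)
                    
                    info = self.get_replica_info()
                    print(f"Current replica number: {info['replica_count']}")
                    
                    return info
                
                def scale_replicas(self, target_replica_number):
                    """Scale the replica number up or down."""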
                    current_info = self.get_replica_info()
                    
                    if not current_info["loaded"]:
                        print("Collection is not loaded; loading with the requested replica number")
                        return self.set_replica_number(target_replica_number)
                    
                    current_count = current_info["replica_count"]
                    
                    if current_count == target_replica_number:
                        print(f"Replica number is already {target_replica_number}")
                        return current_info
                    
                    if current_count < target_replica_number:
                        print(f"Scaling up: {current_count} -> {target_replica_number}")
                    else:
                        print(f"Scaling down: {current_count} -> {target_replica_number}")
                    
                    return self.set_replica_number(target_replica_number)
                
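                def monitor_replicas(self):
                    """Print the current replica status."""
                    info = self.get_replica_info()
                    
                    if not info["loaded"]:
                        print("Collection is not loaded")
                        return
                    
                    print(f"\nReplica status:")
                    print(f"  Total replicas: {info['replica_count']}")
                    
                    for i, replica in enumerate(info["replicas"]):
                        print(f"\n  Replica {i+1}:")
                        print(f"    ID: {replica['id']}")
                        print(f"    Shards: {replica['shard_count']}")
                        print(f"    Resource group: {replica['resource_group']}")
                
                def benchmark_replicas(self, num_queries=100):
                    """Compare query performance across replica numbers.
                    
                    Queries run sequentially, so this mostly compares latency;
                    replicas raise throughput mainly under concurrent load.
                    """
                    replica_numbers = [1, 2, 3]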
                    results = []
                    
                    query_vector = [[np.random.random() for _ in range(128)]]
                    search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": 16}
                    }
                    
                    print(f"\nReplica benchmark ({num_queries} queries):\n")
                    print(f"{'Replicas':>8s} {'Total':>10s} {'QPS':>10s} {'Avg latency':>12s}")
                    print("-" * 45)
                    
                    for replica_num in replica_numbers:
                        self.set_replica_number(replica_num)
                        
                        start = time.time()
                        for _ in range(num_queries):
                            self.collection.search(
                                data=query_vector,
                                anns_field="embedding",
                                param=search_params,
                                limit=10
                            )
                        elapsed = time.time() - start
                        
                        qps = num_queries / elapsed
                        avg_latency = elapsed / num_queries * 1000
                        
                        results.append({
                            "replica_number": replica_num,
                            "total_time": elapsed,
                            "qps": qps,
                            "avg_latency": avg_latency
                        })
                        
                        print(f"{replica_num:8d} {elapsed:9.2f}s {qps:9.2f} {avg_latency:10.2f}ms")
                    
                    return results
            
            # Use the replica manager
            manager = ReplicaManager(collection)
            
            # Get the current replica information
            info = manager.get_replica_info()
            print(f"Current replica info: {info}")
            
            # Set the replica number
            manager.set_replica_number(2)
            
            # Monitor replicas
            manager.monitor_replicas()
            
            # Scale up
            manager.scale_replicas(3)
            
            # Benchmark
            results = manager.benchmark_replicas(num_queries=50)
            
            # Pick the configuration with the highest QPS
            best_result = max(results, key=lambda x: x["qps"])
            print(f"\nBest configuration:")
            print(f"  Replicas: {best_result['replica_number']}")
            print(f"  QPS: {best_result['qps']:.2f}")
            ---
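The benchmark above sends one query at a time, which mostly measures latency. Since replicas help by serving requests in parallel, a concurrent harness gives a fairer picture. The following is a minimal sketch under stated assumptions: `run_concurrent_benchmark` is a hypothetical helper, and `fake_search` is a stand-in for a real call such as `collection.search(...)` so that the sketch runs without a Milvus server.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_concurrent_benchmark(search_fn, num_queries=100, concurrency=8):
    """Call search_fn() num_queries times from a thread pool and report QPS."""
    def timed_call(_):
        start = time.perf_counter()
        search_fn()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(num_queries)))
    wall = time.perf_counter() - wall_start

    return {
        "qps": num_queries / wall,
        "avg_latency_ms": sum(latencies) / len(latencies) * 1000,
        "max_latency_ms": max(latencies) * 1000,
    }

# Stand-in for collection.search(...): sleeps 5 ms to imitate a round trip.
def fake_search():
    time.sleep(0.005)

stats = run_concurrent_benchmark(fake_search, num_queries=40, concurrency=8)
print(f"QPS: {stats['qps']:.1f}, avg latency: {stats['avg_latency_ms']:.2f} ms")
```

To compare replica numbers, run the same harness with a real search callable after each reload and keep `concurrency` fixed across runs.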

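02.High-availability configuration
    a.Failover
        a.Description
            Replicas give automatic failover and improve availability. When a node serving one replica fails, queries are routed to the other replicas. The switch is transparent to clients and needs no manual intervention, although requests in flight at the moment of failure can still fail, which is why the example below retries. With multiple replicas, maintenance can be carried out with little or no downtime. Configure at least 2 replicas for high availability, place them on different nodes to avoid a single point of failure, and monitor replica health to catch problems early.
        b.Code example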
            ---
            from pymilvus import Collection, utility
            from pymilvus.client.types import LoadState
            import numpy as np
            import time
            from threading import Thread
            
            collection = Collection("documents")
            
            # High-availability helper
            class HighAvailabilityConfig:
                def __init__(self, collection, min_replicas=2):
                    self.collection = collection
                    self.min_replicas = min_replicas
                    self.query_count = 0
                    self.error_count = 0
                
                def ensure_high_availability(self):
                    """Make sure the Collection is loaded with at least min_replicas replicas."""
                    # Collection has no is_loaded attribute; ask utility.load_state() instead.
                    if utility.load_state(self.collection.name) != LoadState.Loaded:
                        print(f"Loading Collection with {self.min_replicas} replicas")
                        self.collection.load(replica_number=self.min_replicas)
                        return
                    
                    replicas = self.collection.get_replicas()
                    current_replicas = len(replicas.groups)
                    
                    if current_replicas < self.min_replicas:
                        print(f"Too few replicas ({current_replicas} < {self.min_replicas}), reloading")
                        self.collection.release()
                        self.collection.load(replica_number=self.min_replicas)
                    else:
                        print(f"Replica configuration OK: {current_replicas} replicas")
                
                def query_with_retry(self, query_vector, search_params, limit=10, max_retries=3):
                    """Search with retries."""
                    for attempt in range(max_retries):
                        try:
                            results = self.collection.search(
                                data=[query_vector],
                                anns_field="embedding",
                                param=search_params,
                                limit=limit
                            )
                            
                            self.query_count += 1
                            return results[0]
                        
                        except Exception as e:
                            self.error_count += 1
                            print(f"Search failed (attempt {attempt+1}/{max_retries}): {e}")
                            
                            if attempt < max_retries - 1:
                                time.sleep(0.1 * 2 ** attempt)  # exponential backoff: 0.1s, 0.2s, 0.4s, ...
                            else:
                                raise
                
                def health_check(self):
                    """Report replica count and query error statistics."""
                    try:
                        replicas = self.collection.get_replicas()
                        replica_count = len(replicas.groups)
                        
                        # query_count only counts successes, so the error rate is per attempt.
                        attempts = self.query_count + self.error_count
                        
                        health_status = {
                            "healthy": replica_count >= self.min_replicas,
                            "replica_count": replica_count,
                            "min_replicas": self.min_replicas,
                            "query_count": self.query_count,
                            "error_count": self.error_count,
                            "error_rate": self.error_count / attempts if attempts > 0 else 0
                        }
                        
                        return health_status
                    
                    except Exception as e:
                        return {
                            "healthy": False,
                            "error": str(e)
                        }
                
                def start_health_monitor(self, interval=10):
                    """Start a background health monitor."""
                    def monitor():
                        while True:
                            status = self.health_check()
                            
                            print(f"\nHealth check:")
                            print(f"  Status: {'healthy' if status.get('healthy') else 'unhealthy'}")
                            print(f"  Replicas: {status.get('replica_count', 'N/A')}")
                            print(f"  Queries: {status.get('query_count', 0)}")
                            print(f"  Errors: {status.get('error_count', 0)}")
                            print(f"  Error rate: {status.get('error_rate', 0)*100:.2f}%")
                            
                            if not status.get('healthy'):
                                print("  Warning: too few replicas, attempting recovery...")
                                self.ensure_high_availability()
                            
                            time.sleep(interval)
                    
                    monitor_thread = Thread(target=monitor, daemon=True)
                    monitor_thread.start()
                    
                    return monitor_thread
            
            # Use the high-availability helper
            ha_config = HighAvailabilityConfig(collection, min_replicas=2)
            
            # Ensure high availability
            ha_config.ensure_high_availability()
            
            # Search with retries
            query_vector = [np.random.random() for _ in range(128)]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            print("\nRunning search (with retries):")
            results = ha_config.query_with_retry(query_vector, search_params, limit=10)
            print(f"Search succeeded: {len(results)} results")
            
            # Health check
            status = ha_config.health_check()
            print(f"\nHealth status: {status}")
            
            # Failover test
            print("\nFailover test:")
            print("  Simulating a replica failure...")
            
            # A real replica failure would be simulated here.
            # In production, Milvus handles failover automatically.
            
            print("  Continuing to search...")
            for i in range(10):
                try:
                    results = ha_config.query_with_retry(query_vector, search_params)
                    print(f"  Query {i+1}: succeeded")
                except Exception as e:
                    print(f"  Query {i+1}: failed - {e}")
            
            final_status = ha_config.health_check()
            print(f"\nFinal status:")
            print(f"  Successful queries: {final_status['query_count']}")
            print(f"  Errors: {final_status['error_count']}")
            print(f"  Success rate: {(1-final_status['error_rate'])*100:.2f}%")
            ---
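The retry logic above can also be factored into a reusable helper. This is a minimal sketch, assuming a generic callable rather than any Milvus API: `retry_with_backoff` and its parameters are hypothetical names, and the random jitter is an addition not present in the original example. The `sleep` parameter is injectable so the helper can be exercised without real waiting.

```python
import random
import time

def retry_with_backoff(fn, max_retries=3, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call fn(); on failure wait up to base_delay * 2**attempt (capped) and retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: a random wait in [0, delay] avoids synchronized retries.
            sleep(random.uniform(0, delay))

# Usage with a function that fails twice before succeeding.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

waits = []  # record the requested waits instead of sleeping
result = retry_with_backoff(flaky, max_retries=5, sleep=waits.append)
print(result, calls["n"], len(waits))
```

In real use, `fn` could be a lambda wrapping `collection.search(...)`, with the default `sleep=time.sleep`.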
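    b.Load balancing
        a.Description
            With multiple replicas, search requests are spread across replicas automatically. Milvus chooses a replica for each request; depending on version and configuration this is a round-robin or a load-aware policy. Balancing raises overall throughput and improves response time. Monitor the load on each replica to confirm that it is evenly distributed. The replica number cannot exceed the number of available QueryNodes, so size the two together to make full use of cluster resources.
        b.Code example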
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            from collections import defaultdict
            import concurrent.futures
            
            collection = Collection("documents")
            collection.load(replica_number=3)
            
            # Load-balancing monitor
            class LoadBalancingMonitor:
                def __init__(self, collection):
                    self.collection = collection
                    self.query_stats = defaultdict(int)
                    self.latency_stats = defaultdict(list)
                
                def query(self, query_vector, search_params, limit=10):
                    """Run a search and record statistics."""
                    start = time.time()
                    
                    results = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=limit
                    )
                    
                    latency = time.time() - start
                    
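                    # The client cannot see which replica served the request, so the replica ID
                    # below is simulated; the resulting distribution is illustrative only.
                    replica_id = hash(time.time()) % 3  # simulated replica ID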
                    self.query_stats[replica_id] += 1
                    self.latency_stats[replica_id].append(latency)
                    
                    return results[0]
                
                def get_load_distribution(self):
                    """Return the per-replica load distribution."""
                    total_queries = sum(self.query_stats.values())
                    
                    distribution = {}
                    for replica_id, count in self.query_stats.items():
                        avg_latency = np.mean(self.latency_stats[replica_id]) if self.latency_stats[replica_id] else 0
                        
                        distribution[replica_id] = {
                            "query_count": count,
                            "percentage": count / total_queries * 100 if total_queries > 0 else 0,
                            "avg_latency": avg_latency * 1000  # ms
                        }
                    
                    return distribution
                
                def print_load_stats(self):
                    """Print load statistics."""
                    distribution = self.get_load_distribution()
                    
                    print("\nLoad distribution:")
                    print(f"{'Replica ID':>10s} {'Queries':>10s} {'Share':>10s} {'Avg latency':>12s}")
                    print("-" * 48)
                    
                    for replica_id, stats in sorted(distribution.items()):
                        print(f"{replica_id:10d} {stats['query_count']:10d} {stats['percentage']:9.1f}% {stats['avg_latency']:10.2f}ms")
                
                def check_balance(self, threshold=0.2):
                    """Check whether the load is balanced."""
                    distribution = self.get_load_distribution()
                    
                    if len(distribution) < 2:
                        return True, "Fewer than two replicas observed; cannot judge balance"
                    
                    percentages = [stats["percentage"] for stats in distribution.values()]
                    avg_percentage = np.mean(percentages)
                    max_deviation = max(abs(p - avg_percentage) for p in percentages)
                    
                    is_balanced = max_deviation <= threshold * 100
                    
                    return is_balanced, f"Max deviation: {max_deviation:.1f}%"
            
            # Use the load-balancing monitor
            monitor = LoadBalancingMonitor(collection)
            
            query_vector = [np.random.random() for _ in range(128)]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # Run a batch of queries
            print("Running load test...")
            for i in range(300):
                monitor.query(query_vector, search_params)
                if (i + 1) % 100 == 0:
                    print(f"  Completed {i+1} queries")
            
            # Print load statistics
            monitor.print_load_stats()
            
            # Check load balance
            is_balanced, message = monitor.check_balance(threshold=0.2)
            print(f"\nLoad balance check: {'passed' if is_balanced else 'failed'}")
            print(f"  {message}")
            
            # Concurrent load test
            print("\nConcurrent load test:")
            
            def concurrent_query(monitor, query_vector, search_params):
                """Task run by each worker thread."""
                return monitor.query(query_vector, search_params)
            
            concurrent_monitor = LoadBalancingMonitor(collection)
            
            num_concurrent = 50
            num_queries_per_thread = 10
            
            with concurrent.futures.ThreadPoolExecutor(max_workers=num_concurrent) as executor:
                futures = []
                for _ in range(num_concurrent * num_queries_per_thread):
                    future = executor.submit(concurrent_query, concurrent_monitor, query_vector, search_params)
                    futures.append(future)
                
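                # Wait for all queries to finish
                concurrent.futures.wait(futures)
            
            print(f"Completed {num_concurrent * num_queries_per_thread} concurrent queries")
            
            concurrent_monitor.print_load_stats()
            
            is_balanced, message = concurrent_monitor.check_balance(threshold=0.2)
            print(f"\nConcurrent load balance check: {'passed' if is_balanced else 'failed'}")
            print(f"  {message}")
            
            # Load-balancing guidelines
            print("\nLoad-balancing guidelines:")
            print("  1. Keep replica number <= QueryNode count and size both together")
            print("  2. Monitor the load on each replica to keep it balanced")
            print("  3. Place replicas on different nodes to avoid hot spots")
            print("  4. Use resource groups to isolate different workloads")
            print("  5. Review the load distribution regularly and adjust as needed")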
            ---
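Because the monitor above has to simulate replica IDs, it can help to see what an ideal distribution looks like in isolation. The following is a minimal sketch of strict round-robin assignment together with the same deviation metric used by `check_balance`. It illustrates one possible policy and is not a description of Milvus's actual scheduler; `round_robin_assign` and `max_share_deviation` are hypothetical names.

```python
from collections import Counter
from itertools import cycle

def round_robin_assign(replica_ids, num_queries):
    """Assign queries to replicas in strict rotation and count queries per replica."""
    counts = Counter({rid: 0 for rid in replica_ids})
    rotation = cycle(replica_ids)
    for _ in range(num_queries):
        counts[next(rotation)] += 1
    return counts

def max_share_deviation(counts):
    """Largest gap, in percentage points, between a replica's share and the mean share."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    shares = [c / total * 100 for c in counts.values()]
    mean = sum(shares) / len(shares)
    return max(abs(s - mean) for s in shares)

counts = round_robin_assign(["r1", "r2", "r3"], 300)
print(dict(counts), max_share_deviation(counts))
```

With a query count that is not a multiple of the replica count, the deviation is small but non-zero, which is why a tolerance such as `threshold=0.2` is used rather than an exact comparison.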

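7.3 Dynamic Schema

01.Dynamic fields
    a.Enabling the dynamic schema
        a.Description
            A dynamic schema lets you insert fields that are not defined in the schema, which adds flexibility. Once it is enabled, each inserted entity can carry arbitrary extra key-value pairs. Milvus stores them in a reserved JSON field named $meta. Dynamic fields can be returned and used in filter expressions; index support for them is limited and depends on the Milvus version, so filtering on them is usually slower than on declared scalar fields. This suits data whose fields are not fixed, such as user-defined attributes and metadata. Dynamic fields have a small performance cost. Enable the option when creating the Collection.
        b.Code example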
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            
            # Create a Collection with the dynamic schema enabled
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            schema = CollectionSchema(
                fields=fields,
                description="Dynamic schema example",
                enable_dynamic_field=True  # enable dynamic fields
            )
            
            collection = Collection("dynamic_collection", schema=schema)
            
            print(f"Dynamic schema enabled: {schema.enable_dynamic_field}")
            
            # Insert entities that carry dynamic fields. Dynamic fields require row-based
            # input (one dict per entity); a column-based list cannot carry them.
            dynamic_fields = [
                {"title": "Doc 1", "category": "tech", "tags": ["AI", "ML"]},
                {"title": "Doc 2", "author": "Zhang San", "rating": 4.5},
                {"title": "Doc 3", "category": "science", "year": 2024},
                {"title": "Doc 4", "price": 99.99, "stock": 100},
                {"title": "Doc 5", "description": "This is a test document"}
            ]
            
            data = [
                {"id": i, "embedding": [np.random.random() for _ in range(128)], **extra}
                for i, extra in enumerate(dynamic_fields, start=1)
            ]
            
            collection.insert(data)
            collection.flush()
            
            print(f"\nInserted entities: {collection.num_entities}")
            
            # Create an index and load the Collection
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 128}
            }
            collection.create_index(field_name="embedding", index_params=index_params)
            collection.load()
            
            # Search and return dynamic fields
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=5,
                output_fields=["id", "title", "category", "author"]  # includes dynamic fields
            )
            
            print("\nSearch results (with dynamic fields):")
            for hit in results[0]:
                print(f"  ID: {hit.id}")
                print(f"  Title: {hit.entity.get('title')}")
                print(f"  Category: {hit.entity.get('category')}")
                print(f"  Author: {hit.entity.get('author')}")
                print()
            
            # Filter on a dynamic field
            results_filtered = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=5,
                expr='category == "tech"',  # filter on a dynamic field
                output_fields=["id", "title", "category"]
            )
            
            print("Filtered on a dynamic field (category == 'tech'):")
            for hit in results_filtered[0]:
                print(f"  {hit.entity.get('title')}: {hit.entity.get('category')}")
            ---
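To make the role of `$meta` concrete, the following sketch separates declared fields from undeclared ones and packs the latter into a single JSON document. It is a conceptual illustration only, not Milvus's internal code: `split_row` and `SCHEMA_FIELDS` are hypothetical names, and the exact storage layout inside Milvus is not shown here.

```python
import json

SCHEMA_FIELDS = {"id", "embedding"}  # fields declared in the example schema

def split_row(row, schema_fields=SCHEMA_FIELDS):
    """Separate declared fields from extra keys, which are grouped as one JSON document."""
    declared = {k: v for k, v in row.items() if k in schema_fields}
    dynamic = {k: v for k, v in row.items() if k not in schema_fields}
    missing = schema_fields - declared.keys()
    if missing:
        raise ValueError(f"row is missing declared fields: {sorted(missing)}")
    declared["$meta"] = json.dumps(dynamic, ensure_ascii=False)
    return declared

row = {"id": 1, "embedding": [0.1, 0.2], "title": "Doc 1", "tags": ["AI", "ML"]}
stored = split_row(row)
print(stored["$meta"])
```

The sketch also shows why every entity must still provide the declared fields: only the extra keys are optional.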
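    b.Managing dynamic fields
        a.Description
            Managing dynamic fields requires attention to data consistency and query performance. Different entities may carry different dynamic fields. Because dynamic fields generally lack indexes, filtering on them is slow; define frequently used fields in the schema instead and keep dynamic fields for rarely queried metadata. Use output_fields to choose which dynamic fields are returned, and handle the case where a field is missing from an entity.
        b.Code example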
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("dynamic_collection")
            collection.load()
            
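            # Dynamic field management helper
            class DynamicFieldManager:
                def __init__(self, collection):
                    self.collection = collection
                    self.field_usage = {}
                
                def insert_with_dynamic_fields(self, ids, embeddings, dynamic_data):
                    """Insert entities that carry dynamic fields."""
                    # Track how often each dynamic field is used
                    for record in dynamic_data:
                        for field_name in record.keys():
                            self.field_usage[field_name] = self.field_usage.get(field_name, 0) + 1
                    
                    # Dynamic fields require row-based input: one dict per entity
                    data = [
                        {"id": entity_id, "embedding": embedding, **extra}
                        for entity_id, embedding, extra in zip(ids, embeddings, dynamic_data)
                    ]
                    self.collection.insert(data)
                    self.collection.flush()
                
                def query_dynamic_fields(self, query_vector, search_params, fields=None):
                    """Search and return the requested dynamic fields."""
                    # Default to the commonly used fields when none are given
                    if fields is None:
                        fields = self.get_common_fields(threshold=0.5)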
                    
                    results = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=10,
                        output_fields=["id"] + fields
                    )
                    
                    return results[0]
                
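                def get_common_fields(self, threshold=0.5):
                    """Return dynamic fields whose frequency is at least threshold.
                    
                    Note: field_usage only covers inserts made through this manager,
                    while num_entities counts every entity in the Collection.
                    """
                    total_records = self.collection.num_entities
                    common_fields = []
                    
                    if total_records == 0:
                        return common_fields
                    
                    for field_name, count in self.field_usage.items():
                        if count / total_records >= threshold:
                            common_fields.append(field_name)
                    
                    return common_fields
                
                def get_field_statistics(self):
                    """Return usage statistics for each dynamic field."""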
                    total_records = self.collection.num_entities
                    
                    stats = {}
                    for field_name, count in self.field_usage.items():
                        stats[field_name] = {
                            "count": count,
                            "coverage": count / total_records * 100 if total_records > 0 else 0
                        }
                    
                    return stats
                
                def recommend_schema_fields(self, threshold=0.8):
                    """Recommend dynamic fields worth promoting into the schema."""
                    stats = self.get_field_statistics()
                    recommendations = []
                    
                    for field_name, stat in stats.items():
                        if stat["coverage"] >= threshold * 100:
                            recommendations.append({
                                "field": field_name,
                                "coverage": stat["coverage"],
                                "reason": f"Coverage {stat['coverage']:.1f}%; consider adding it to the schema and indexing it"
                            })
                    
                    return recommendations
            
            # Use the dynamic field manager
            manager = DynamicFieldManager(collection)
            
            # Insert more data
            new_ids = [10, 11, 12, 13, 14]
            new_embeddings = [[np.random.random() for _ in range(128)] for _ in range(5)]
            new_dynamic_data = [
                {"title": "Doc 10", "category": "tech", "views": 1000},
                {"title": "Doc 11", "category": "science", "views": 500},
                {"title": "Doc 12", "category": "tech", "views": 800},
                {"title": "Doc 13", "category": "art", "views": 300},
                {"title": "Doc 14", "category": "tech", "views": 1200}
            ]
            
            manager.insert_with_dynamic_fields(new_ids, new_embeddings, new_dynamic_data)
            
            # Field statistics
            stats = manager.get_field_statistics()
            print("\nDynamic field statistics:")
            for field_name, stat in sorted(stats.items(), key=lambda x: x[1]["coverage"], reverse=True):
                print(f"  {field_name}: used {stat['count']} times, coverage {stat['coverage']:.1f}%")
            
            # Commonly used fields
            common_fields = manager.get_common_fields(threshold=0.5)
            print(f"\nCommon fields (coverage >= 50%): {common_fields}")
            
            # Fields recommended for the schema
            recommendations = manager.recommend_schema_fields(threshold=0.8)
            if recommendations:
                print("\nSchema recommendations:")
                for rec in recommendations:
                    print(f"  {rec['field']}: {rec['reason']}")
            
            # Search and return dynamic fields
            query_vector = [np.random.random() for _ in range(128)]
            search_params = {"metric_type": "L2", "params": {"nprobe": 16}}
            
            results = manager.query_dynamic_fields(query_vector, search_params, fields=["title", "category", "views"])
            
            print("\nSearch results:")
            for hit in results[:5]:
                print(f"  {hit.entity.get('title')}: {hit.entity.get('category')}, views {hit.entity.get('views', 'N/A')}")
            ---
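Since different entities carry different dynamic fields, code that consumes results has to cope with missing keys. The following is a minimal, server-free sketch using plain dicts in place of search hits; `field_coverage` and `normalize_records` are hypothetical helper names introduced for illustration.

```python
from collections import Counter

def field_coverage(records):
    """Return the fraction of records in which each key appears."""
    if not records:
        return {}
    counts = Counter(key for rec in records for key in rec)
    return {key: n / len(records) for key, n in counts.items()}

def normalize_records(records, fields, default=None):
    """Return one dict per record with every requested field present."""
    return [{f: rec.get(f, default) for f in fields} for rec in records]

records = [
    {"title": "Doc 1", "category": "tech", "views": 1000},
    {"title": "Doc 2", "author": "Zhang San"},
    {"title": "Doc 3", "category": "science"},
    {"title": "Doc 4", "price": 99.99},
]
coverage = field_coverage(records)
rows = normalize_records(records, ["title", "category", "views"], default="N/A")
print(coverage["title"], coverage["category"])
print(rows[1])
```

Fields with high coverage are the natural candidates to promote into the schema, while filling defaults keeps downstream code free of key checks.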

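7.4 Time Travel

01.Time travel concepts
    a.Timestamp mechanism
        a.Description
            Time travel queries the state of the data at a past point in time, based on timestamps. Milvus assigns a timestamp to every operation and so records the history of data changes. A search can specify a point in time and see the data as of that moment. This is useful for auditing, retrospective analysis, and version comparison. Time travel does not change current data; it only changes the view a query sees. How long history is kept is set by configuration, and history older than the retention period is cleaned up. Note that recent Milvus 2.x releases deprecate time travel and the travel_timestamp search parameter, so check that your version supports it before relying on the example below.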
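The timestamps involved are hybrid timestamps rather than plain Unix times. The sketch below shows the idea under one stated assumption: a 64-bit value whose high bits hold physical time in milliseconds and whose low 18 bits hold a logical counter. Treat that bit layout as an assumption, and prefer `utility.mkts_from_unixtime` and `utility.hybridts_to_unixtime` in real code; the function names here are hypothetical.

```python
LOGICAL_BITS = 18  # assumed width of the logical counter
LOGICAL_MASK = (1 << LOGICAL_BITS) - 1

def make_hybrid_ts(unix_seconds, logical=0):
    """Pack a Unix time (seconds) and a logical counter into one integer."""
    physical_ms = int(unix_seconds * 1000)
    return (physical_ms << LOGICAL_BITS) | (logical & LOGICAL_MASK)

def split_hybrid_ts(hybrid_ts):
    """Return (physical milliseconds, logical counter)."""
    return hybrid_ts >> LOGICAL_BITS, hybrid_ts & LOGICAL_MASK

def hybrid_ts_to_unix(hybrid_ts):
    """Recover the Unix time in seconds, dropping the logical part."""
    return (hybrid_ts >> LOGICAL_BITS) / 1000.0

ts = make_hybrid_ts(1700000000.5, logical=3)
print(split_hybrid_ts(ts), hybrid_ts_to_unix(ts))
```

Because the physical part occupies the high bits, hybrid timestamps compare in the same order as the wall-clock times they encode, with the logical counter breaking ties within one millisecond.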
        b.Code example
            ---
            from pymilvus import Collection, utility
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            # Get the current timestamp
            current_ts = utility.mkts_from_unixtime(time.time())
            print(f"Current timestamp: {current_ts}")
            
            # Insert the initial data
            initial_data = [
                [1, 2, 3],
                [f"Doc {i}_v1" for i in [1, 2, 3]],
                [[np.random.random() for _ in range(128)] for _ in range(3)]
            ]
            
            collection.insert(initial_data)
            collection.flush()
            
            ts_after_insert = utility.mkts_from_unixtime(time.time())
            print(f"Timestamp after insert: {ts_after_insert}")
            
            # Wait a moment
            time.sleep(2)
            
            # Update the data (delete, then re-insert)
            collection.delete(expr="id in [1, 2]")
            
            update_data = [
                [1, 2],
                [f"Doc {i}_v2" for i in [1, 2]],
                [[np.random.random() for _ in range(128)] for _ in range(2)]
            ]
            
            collection.insert(update_data)
            collection.flush()
            
            ts_after_update = utility.mkts_from_unixtime(time.time())
            print(f"更新后时间戳: {ts_after_update}")
            
            # Search the current state
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }

            results_current = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                output_fields=["id", "title"]
            )

            print("\nCurrent state:")
            for hit in results_current[0]:
                print(f"  ID: {hit.id}, title: {hit.entity.get('title')}")

            # Time travel: search the state after the insert and before the update
            results_past = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                travel_timestamp=ts_after_insert,  # the historical point in time
                output_fields=["id", "title"]
            )

            print(f"\nHistorical state (timestamp: {ts_after_insert}):")
            for hit in results_past[0]:
                print(f"  ID: {hit.id}, title: {hit.entity.get('title')}")
            
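            # Timestamp conversion
            print("\nTimestamp conversion:")
            unix_time = time.time()
            milvus_ts = utility.mkts_from_unixtime(unix_time)
            print(f"  Unix time: {unix_time}")
            print(f"  Milvus timestamp: {milvus_ts}")

            # Convert the timestamp back to Unix time.
            # A Milvus timestamp is a hybrid timestamp, not nanoseconds: the high bits
            # hold physical time in milliseconds and the low 18 bits a logical counter,
            # so use the pymilvus helper instead of dividing by 1e9.
            unix_time_back = utility.hybridts_to_unixtime(milvus_ts)
            print(f"  Back to Unix time: {unix_time_back}")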
            ---
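A Milvus timestamp is a hybrid timestamp rather than a plain Unix time. As a minimal sketch, assuming the layout used by the pymilvus helpers (physical time in milliseconds in the high bits, an 18-bit logical counter in the low bits), both directions of the conversion can be written in plain Python. The names `compose_hybrid_ts` and `split_hybrid_ts` are illustrative and not part of pymilvus.

```python
LOGICAL_BITS = 18                       # assumed width of the logical counter
LOGICAL_MASK = (1 << LOGICAL_BITS) - 1


def compose_hybrid_ts(unix_seconds: float, logical: int = 0) -> int:
    """Pack Unix seconds and a logical counter into one integer."""
    physical_ms = int(unix_seconds * 1000)
    return (physical_ms << LOGICAL_BITS) | (logical & LOGICAL_MASK)


def split_hybrid_ts(hybrid_ts: int) -> tuple:
    """Unpack a hybrid timestamp into (unix_seconds, logical_counter)."""
    physical_ms = hybrid_ts >> LOGICAL_BITS
    return physical_ms / 1000.0, hybrid_ts & LOGICAL_MASK


ts = compose_hybrid_ts(1700000000.5, logical=3)
seconds, counter = split_hybrid_ts(ts)
print(f"hybrid timestamp: {ts}")
print(f"unix seconds: {seconds}, logical counter: {counter}")
```

Because the physical part is shifted left by 18 bits, dividing the raw integer by a power of ten does not recover the wall-clock time; shifting right and dividing by 1000 does.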
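    b.Historical queries
        a.Description
            A historical query reads the data as it stood at a chosen moment: pass travel_timestamp to search and the results reflect only operations up to that timestamp. Running the same query at two timestamps shows what changed in between, which is useful for data audits, error recovery, and A/B tests. Historical query latency is generally comparable to that of a current query. Mind the retention policy: timestamps older than the retention window cannot be queried.
        b.Code example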
            ---
            from pymilvus import Collection, utility
            import numpy as np
            import time
            from datetime import datetime
            
            collection = Collection("documents")
            collection.load()
            
            # Helper class for historical queries
            class TimeTravelManager:
                def __init__(self, collection):
                    self.collection = collection
                    self.snapshots = {}

                def create_snapshot(self, name):
                    """Record a named snapshot (a timestamp only; no data is copied)."""
                    now = time.time()  # read the clock once so all fields agree
                    timestamp = utility.mkts_from_unixtime(now)
                    self.snapshots[name] = {
                        "timestamp": timestamp,
                        "unix_time": now,
                        "datetime": datetime.fromtimestamp(now).isoformat()
                    }
                    print(f"Created snapshot: {name} (timestamp: {timestamp})")
                    return timestamp

                def list_snapshots(self):
                    """List all snapshots."""
                    print("\nSnapshots:")
                    for name, info in self.snapshots.items():
                        print(f"  {name}:")
                        print(f"    Timestamp: {info['timestamp']}")
                        print(f"    Time: {info['datetime']}")

                def query_at_snapshot(self, snapshot_name, query_vector, search_params, limit=10):
                    """Search as of the given snapshot's timestamp."""
                    if snapshot_name not in self.snapshots:
                        raise ValueError(f"Snapshot not found: {snapshot_name}")
                    
                    timestamp = self.snapshots[snapshot_name]["timestamp"]
                    
                    results = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=limit,
                        travel_timestamp=timestamp,
                        output_fields=["id", "title"]
                    )
                    
                    return results[0]
                
                def compare_snapshots(self, snapshot1, snapshot2, query_vector, search_params):
                    """Compare the search results of two snapshots."""
                    results1 = self.query_at_snapshot(snapshot1, query_vector, search_params)
                    results2 = self.query_at_snapshot(snapshot2, query_vector, search_params)
                    
                    ids1 = set(hit.id for hit in results1)
                    ids2 = set(hit.id for hit in results2)
                    
                    added = ids2 - ids1
                    removed = ids1 - ids2
                    common = ids1 & ids2
                    
                    comparison = {
                        "snapshot1": snapshot1,
                        "snapshot2": snapshot2,
                        "added": list(added),
                        "removed": list(removed),
                        "common": list(common)
                    }
                    
                    return comparison
                
                def rollback_view(self, snapshot_name):
                    """Return the timestamp to use as a rolled-back view (read-only; no data is modified)."""
                    if snapshot_name not in self.snapshots:
                        raise ValueError(f"Snapshot not found: {snapshot_name}")

                    timestamp = self.snapshots[snapshot_name]["timestamp"]

                    print(f"\nRolling the view back to snapshot: {snapshot_name}")
                    print(f"  Timestamp: {timestamp}")
                    print(f"  Time: {self.snapshots[snapshot_name]['datetime']}")

                    return timestamp
            
            # Use the time travel manager
            tt_manager = TimeTravelManager(collection)

            # Snapshot before any change
            tt_manager.create_snapshot("initial")

            # Insert data
            data1 = [
                [100, 101, 102],
                ["Document A", "Document B", "Document C"],
                [[np.random.random() for _ in range(128)] for _ in range(3)]
            ]
            collection.insert(data1)
            collection.flush()

            time.sleep(1)
            tt_manager.create_snapshot("after_insert_1")

            # Insert more data
            data2 = [
                [103, 104],
                ["Document D", "Document E"],
                [[np.random.random() for _ in range(128)] for _ in range(2)]
            ]
            collection.insert(data2)
            collection.flush()

            time.sleep(1)
            tt_manager.create_snapshot("after_insert_2")

            # Delete data
            collection.delete(expr="id in [100, 101]")
            collection.flush()

            time.sleep(1)
            tt_manager.create_snapshot("after_delete")

            # List the snapshots
            tt_manager.list_snapshots()
            
            # Search at each point in time
            query_vector = [np.random.random() for _ in range(128)]
            search_params = {"metric_type": "L2", "params": {"nprobe": 16}}

            print("\nSearch results at each point in time:")

            for snapshot_name in ["initial", "after_insert_1", "after_insert_2", "after_delete"]:
                try:
                    results = tt_manager.query_at_snapshot(snapshot_name, query_vector, search_params, limit=10)
                    print(f"\n{snapshot_name}: {len(results)} results")
                    for hit in results[:3]:
                        print(f"  ID: {hit.id}, title: {hit.entity.get('title')}")
                except Exception as e:
                    print(f"\n{snapshot_name}: search failed - {e}")

            # Compare two snapshots
            comparison = tt_manager.compare_snapshots("after_insert_1", "after_delete", query_vector, search_params)

            print("\nSnapshot comparison:")
            print(f"  Added IDs: {comparison['added']}")
            print(f"  Removed IDs: {comparison['removed']}")
            print(f"  Common IDs: {comparison['common']}")
            ---
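The visibility rule behind a historical query can be illustrated without a server. The following is a toy in-memory model, not Milvus's storage implementation: each row carries the timestamp of its insert and, optionally, of its delete, and a query at `travel_ts` sees a row only if it was inserted at or before that point and not yet deleted. `Row` and `visible_ids` are hypothetical names, and the small integer timestamps stand in for real hybrid timestamps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    id: int
    insert_ts: int
    delete_ts: Optional[int] = None  # None means the row was never deleted


def visible_ids(rows: List[Row], travel_ts: int) -> List[int]:
    """Ids of rows inserted at or before travel_ts and not deleted by then."""
    return sorted(
        r.id for r in rows
        if r.insert_ts <= travel_ts and (r.delete_ts is None or r.delete_ts > travel_ts)
    )


# Same sequence as the snapshots above: two inserts, then a delete of 100 and 101
rows = [
    Row(100, insert_ts=10, delete_ts=40),
    Row(101, insert_ts=10, delete_ts=40),
    Row(102, insert_ts=10),
    Row(103, insert_ts=20),
    Row(104, insert_ts=20),
]

snapshots = [("initial", 5), ("after_insert_1", 15), ("after_insert_2", 25), ("after_delete", 45)]
for label, ts in snapshots:
    print(f"{label}: {visible_ids(rows, ts)}")
```

Diffing the id lists of two snapshots gives the same added, removed, and common sets that a snapshot comparison reports.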

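02.Use cases
    a.Data auditing
        a.Description
            Time travel supports data auditing by letting you trace how data changed over time. You can query the state at any retained timestamp to verify integrity, and diff two timestamps to pinpoint anomalies or mistaken operations, which informs recovery and rollback decisions. Typical uses are compliance audits and security reviews. Configure a retention window long enough to cover the audit period.
        b.Code example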
            ---
            from pymilvus import Collection, utility
            import numpy as np
            import time
            from datetime import datetime
            
            collection = Collection("documents")
            collection.load()
            
            # Data auditing helper
            class DataAuditor:
                def __init__(self, collection):
                    self.collection = collection
                    self.audit_log = []

                def log_operation(self, operation, details):
                    """Append an entry to the audit log and return its timestamp."""
                    now = time.time()  # read the clock once so all fields agree
                    timestamp = utility.mkts_from_unixtime(now)

                    log_entry = {
                        "timestamp": timestamp,
                        "unix_time": now,
                        "datetime": datetime.fromtimestamp(now).isoformat(),
                        "operation": operation,
                        "details": details
                    }

                    self.audit_log.append(log_entry)
                    print(f"[AUDIT] {operation}: {details}")

                    return timestamp
                
                def insert_with_audit(self, data):
                    """Insert with audit logging."""
                    timestamp_before = self.log_operation("INSERT_START", f"{len(data[0])} rows")

                    self.collection.insert(data)
                    self.collection.flush()

                    timestamp_after = self.log_operation("INSERT_COMPLETE", f"{len(data[0])} rows")

                    return timestamp_before, timestamp_after

                def delete_with_audit(self, expr):
                    """Delete with audit logging."""
                    timestamp_before = self.log_operation("DELETE_START", expr)

                    # Record how many rows match before deleting them
                    matched = self.collection.query(expr=expr, output_fields=["id"])
                    self.log_operation("DELETE_TARGETS", f"{len(matched)} rows matched")

                    self.collection.delete(expr)
                    self.collection.flush()

                    timestamp_after = self.log_operation("DELETE_COMPLETE", expr)

                    return timestamp_before, timestamp_after
                
                def verify_data_integrity(self, expected_count, timestamp=None):
                    """Check that roughly the expected number of rows is reachable.

                    The count comes from an ANN search, so it is approximate and
                    capped by limit; it is a sanity check, not an exact row count.
                    """
                    query_vector = [[np.random.random() for _ in range(128)]]
                    search_params = {"metric_type": "L2", "params": {"nprobe": 16}}

                    search_kwargs = {
                        "data": query_vector,
                        "anns_field": "embedding",
                        "param": search_params,
                        "limit": 10000
                    }

                    if timestamp is not None:
                        search_kwargs["travel_timestamp"] = timestamp

                    results = self.collection.search(**search_kwargs)
                    actual_count = len(results[0])

                    is_valid = actual_count >= expected_count * 0.9  # tolerate a 10% shortfall

                    self.log_operation(
                        "INTEGRITY_CHECK",
                        f"expected: {expected_count}, actual: {actual_count}, result: {'pass' if is_valid else 'fail'}"
                    )

                    return is_valid, actual_count
                
                def generate_audit_report(self):
                    """Print and return an audit report."""
                    print("\n" + "="*60)
                    print("Data audit report")
                    print("="*60)

                    print(f"\nTotal operations: {len(self.audit_log)}")

                    # Count entries per operation type
                    operation_counts = {}
                    for entry in self.audit_log:
                        op = entry["operation"]
                        operation_counts[op] = operation_counts.get(op, 0) + 1

                    print("\nOperation counts:")
                    for op, count in sorted(operation_counts.items()):
                        print(f"  {op}: {count}")

                    print("\nOperation timeline:")
                    for entry in self.audit_log:
                        print(f"  [{entry['datetime']}] {entry['operation']}: {entry['details']}")
                    
                    return {
                        "total_operations": len(self.audit_log),
                        "operation_counts": operation_counts,
                        "audit_log": self.audit_log
                    }
                
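                def rollback_analysis(self, target_timestamp):
                    """List the logged operations that a rollback to target_timestamp would undo."""
                    print(f"\nRollback analysis (target timestamp: {target_timestamp}):")

                    # Operations logged after the target timestamp
                    operations_to_rollback = [
                        entry for entry in self.audit_log
                        if entry["timestamp"] > target_timestamp
                    ]

                    print(f"  Operations to roll back: {len(operations_to_rollback)}")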
                    
                    for entry in operations_to_rollback:
                        print(f"    [{entry['datetime']}] {entry['operation']}: {entry['details']}")
                    
                    return operations_to_rollback
            
            # Use the data auditor
            auditor = DataAuditor(collection)

            # Run a series of operations
            print("Running audited operations:\n")

            # Insert data
            data1 = [
                [200, 201, 202],
                ["Audit document A", "Audit document B", "Audit document C"],
                [[np.random.random() for _ in range(128)] for _ in range(3)]
            ]
            ts_insert1_before, ts_insert1_after = auditor.insert_with_audit(data1)

            time.sleep(1)

            # Verify integrity as of the first insert
            auditor.verify_data_integrity(expected_count=3, timestamp=ts_insert1_after)

            time.sleep(1)

            # Insert more data
            data2 = [
                [203, 204],
                ["Audit document D", "Audit document E"],
                [[np.random.random() for _ in range(128)] for _ in range(2)]
            ]
            ts_insert2_before, ts_insert2_after = auditor.insert_with_audit(data2)

            time.sleep(1)

            # Delete data
            ts_delete_before, ts_delete_after = auditor.delete_with_audit("id in [200, 201]")

            time.sleep(1)

            # Verify integrity of the current state
            auditor.verify_data_integrity(expected_count=3)

            # Generate the audit report
            report = auditor.generate_audit_report()

            # Rollback analysis
            auditor.rollback_analysis(target_timestamp=ts_insert1_after)

            print("\nAudit use cases:")
            print("  1. Compliance audit: trace every data change")
            print("  2. Security review: spot abnormal operations")
            print("  3. Error recovery: locate the moment a problem began")
            print("  4. Data validation: verify data integrity")
            print("  5. Rollback decisions: assess the impact of a rollback")
            ---
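Whether an audit timestamp can still be queried depends on its age relative to the retention window. A minimal sketch of that check follows, assuming the hybrid timestamp layout with an 18-bit logical counter and a `retention_seconds` value that mirrors whatever `common.retentionDuration` is set to in your deployment. The helper names are hypothetical and not part of pymilvus.

```python
import time

LOGICAL_BITS = 18  # assumed width of the logical counter in a hybrid timestamp


def hybrid_ts_to_unix(hybrid_ts: int) -> float:
    """Physical part of a hybrid timestamp, in Unix seconds."""
    return (hybrid_ts >> LOGICAL_BITS) / 1000.0


def is_within_retention(hybrid_ts: int, retention_seconds: float, now: float = None) -> bool:
    """True if the timestamp is not in the future and not older than the window."""
    now = time.time() if now is None else now
    age = now - hybrid_ts_to_unix(hybrid_ts)
    return 0 <= age <= retention_seconds


# A timestamp taken at Unix time 1_700_000_000, checked against a one-hour window
audit_ts = (1_700_000_000 * 1000) << LOGICAL_BITS
print(is_within_retention(audit_ts, 3600, now=1_700_000_000 + 1800))  # 30 minutes later
print(is_within_retention(audit_ts, 3600, now=1_700_000_000 + 7200))  # 2 hours later
```

Running such a check before a historical query lets an audit tool report an expired timestamp explicitly instead of surfacing a server-side error.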
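    b.Version comparison
        a.Description
            Time travel also supports version comparison: run the same query at two timestamps and diff the outcome. You can compare data content, search results, or aggregate statistics. This suits A/B tests, algorithm comparisons, data-quality assessment, and regression tests, and it helps explain how a dataset evolved. The resulting diffs can also be fed to a visualization tool.
        b.Code example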
            ---
            from pymilvus import Collection, utility
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            # Version comparison helper
            class VersionComparator:
                def __init__(self, collection):
                    self.collection = collection
                    self.versions = {}

                def create_version(self, version_name):
                    """Record a named version as the current timestamp."""
                    timestamp = utility.mkts_from_unixtime(time.time())
                    self.versions[version_name] = timestamp
                    print(f"Created version: {version_name} (timestamp: {timestamp})")
                    return timestamp

                def compare_query_results(self, version1, version2, query_vector, search_params, limit=10):
                    """Compare the search results of two versions."""
                    if version1 not in self.versions or version2 not in self.versions:
                        raise ValueError("Version not found")
                    
                    # Search as of version 1
                    results1 = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=limit,
                        travel_timestamp=self.versions[version1],
                        output_fields=["id", "title"]
                    )
                    
                    # Search as of version 2
                    results2 = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=limit,
                        travel_timestamp=self.versions[version2],
                        output_fields=["id", "title"]
                    )
                    
                    # Compare the two result lists
                    ids1 = [hit.id for hit in results1[0]]
                    ids2 = [hit.id for hit in results2[0]]
                    
                    comparison = {
                        "version1": version1,
                        "version2": version2,
                        "results1": ids1,
                        "results2": ids2,
                        "intersection": list(set(ids1) & set(ids2)),
                        "only_in_v1": list(set(ids1) - set(ids2)),
                        "only_in_v2": list(set(ids2) - set(ids1)),
                        "similarity": len(set(ids1) & set(ids2)) / max(len(ids1), len(ids2)) if max(len(ids1), len(ids2)) > 0 else 0
                    }
                    
                    return comparison
                
                def print_comparison(self, comparison):
                    """Print a comparison summary."""
                    print(f"\nVersion comparison: {comparison['version1']} vs {comparison['version2']}")
                    print(f"  Similarity: {comparison['similarity']*100:.1f}%")
                    print(f"  In both: {len(comparison['intersection'])}")
                    print(f"  Only in {comparison['version1']}: {len(comparison['only_in_v1'])}")
                    print(f"  Only in {comparison['version2']}: {len(comparison['only_in_v2'])}")

                    if comparison['only_in_v1']:
                        print(f"\n  IDs only in {comparison['version1']}: {comparison['only_in_v1'][:5]}")

                    if comparison['only_in_v2']:
                        print(f"  IDs only in {comparison['version2']}: {comparison['only_in_v2'][:5]}")
            
            # Use the version comparator
            comparator = VersionComparator(collection)

            # Create a version
            comparator.create_version("v1.0")

            # Modify data here...
            time.sleep(1)

            comparator.create_version("v1.1")

            # Compare the versions
            query_vector = [np.random.random() for _ in range(128)]
            search_params = {"metric_type": "L2", "params": {"nprobe": 16}}

            comparison = comparator.compare_query_results("v1.0", "v1.1", query_vector, search_params)
            comparator.print_comparison(comparison)

            print("\nVersion comparison use cases:")
            print("  1. A/B testing: compare the effect of different algorithms")
            print("  2. Data quality: assess the impact of data changes")
            print("  3. Regression testing: validate a system upgrade")
            print("  4. Performance analysis: compare different configurations")
            ---
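Set overlap alone ignores ordering: two versions can return the same ids in a different order. As a sketch under that observation, the two functions below compute the Jaccard similarity of two result lists and the mean position shift of the ids they share. Both are generic list metrics chosen for illustration, not Milvus features, and the function names are hypothetical.

```python
def jaccard(ids_a, ids_b):
    """Size of the intersection divided by size of the union (1.0 if both are empty)."""
    a, b = set(ids_a), set(ids_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_rank_shift(ids_a, ids_b):
    """Average absolute change in position for ids present in both lists."""
    pos_b = {doc_id: i for i, doc_id in enumerate(ids_b)}
    shifts = [abs(i - pos_b[doc_id]) for i, doc_id in enumerate(ids_a) if doc_id in pos_b]
    return sum(shifts) / len(shifts) if shifts else 0.0


v1_ids = [1, 2, 3, 4, 5]   # ranked results at the older timestamp
v2_ids = [2, 1, 3, 6, 7]   # ranked results at the newer timestamp

print(f"Jaccard similarity: {jaccard(v1_ids, v2_ids):.3f}")
print(f"Mean rank shift: {mean_rank_shift(v1_ids, v2_ids):.3f}")
```

Here three of seven distinct ids are shared, and ids 1 and 2 swapped places while id 3 kept its position, so the shared ids moved two thirds of a position on average.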

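7.5 Hybrid Search

01.Hybrid search principles
    a.Multi-path recall
        a.Description
            Hybrid search combines several retrieval methods to improve recall. Candidates are recalled along multiple paths, such as vector search, full-text or keyword search, and scalar filtering; each path can carry its own weight, and a fusion algorithm merges the per-path lists into one ranking. This suits complex queries such as semantic plus keyword search and can improve result accuracy and user satisfaction. Because each path scores on its own scale, the fusion strategy needs deliberate design.
        b.Code example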
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            collection.load()
            
            # Multi-path recall helper
            class MultiRecallSearch:
                def __init__(self, collection):
                    self.collection = collection

                def vector_recall(self, query_vector, search_params, limit=50):
                    """Vector recall."""
                    results = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=limit,
                        output_fields=["id", "title"]
                    )

                    # Convert the hits to a dict keyed by id
                    recall_results = {}
                    for hit in results[0]:
                        recall_results[hit.id] = {
                            "score": 1 / (1 + hit.distance),  # map L2 distance to a score in (0, 1]
                            "title": hit.entity.get("title"),
                            "source": "vector"
                        }
                    
                    return recall_results
                
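                def keyword_recall(self, keywords, limit=50):
                    """Keyword recall, simulated with a scalar filter."""
                    # Build a filter that matches any keyword in the title.
                    # Note: older Milvus releases accept only prefix patterns ("kw%") for
                    # like; infix patterns such as "%kw%" need a newer release.
                    keyword_expr = " or ".join([f'title like "%{kw}%"' for kw in keywords])

                    # Search with a random vector so that the filter decides what is returned
                    query_vector = [np.random.random() for _ in range(128)]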
                    
                    try:
                        results = self.collection.search(
                            data=[query_vector],
                            anns_field="embedding",
                            param={"metric_type": "L2", "params": {"nprobe": 16}},
                            limit=limit,
                            expr=keyword_expr,
                            output_fields=["id", "title"]
                        )
                        
                        recall_results = {}
                        for hit in results[0]:
                            # Score = fraction of the keywords found in the title
                            title = hit.entity.get("title", "")
                            match_count = sum(1 for kw in keywords if kw in title)
                            score = match_count / len(keywords) if keywords else 0

                            recall_results[hit.id] = {
                                "score": score,
                                "title": title,
                                "source": "keyword"
                            }

                        return recall_results

                    except Exception as e:
                        print(f"Keyword recall failed: {e}")
                        return {}
                
                def category_recall(self, category, limit=50):
                    """Category recall."""
                    query_vector = [np.random.random() for _ in range(128)]
                    
                    results = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param={"metric_type": "L2", "params": {"nprobe": 16}},
                        limit=limit,
                        expr=f'category == "{category}"',
                        output_fields=["id", "title", "category"]
                    )
                    
                    recall_results = {}
                    for hit in results[0]:
                        recall_results[hit.id] = {
                            "score": 1.0,  # a category match gets a fixed score
                            "title": hit.entity.get("title"),
                            "category": hit.entity.get("category"),
                            "source": "category"
                        }
                    
                    return recall_results
                
                def hybrid_recall(self, query_vector, keywords=None, category=None, weights=None):
                    """Hybrid recall: merge the paths with a weighted sum."""
                    if weights is None:
                        weights = {"vector": 0.6, "keyword": 0.3, "category": 0.1}

                    all_results = {}

                    # Vector recall
                    search_params = {"metric_type": "L2", "params": {"nprobe": 16}}
                    vector_results = self.vector_recall(query_vector, search_params, limit=50)
                    
                    for doc_id, info in vector_results.items():
                        all_results[doc_id] = {
                            "scores": {"vector": info["score"]},
                            "title": info["title"],
                            "sources": ["vector"]
                        }
                    
                    # Keyword recall
                    if keywords:
                        keyword_results = self.keyword_recall(keywords, limit=50)
                        
                        for doc_id, info in keyword_results.items():
                            if doc_id in all_results:
                                all_results[doc_id]["scores"]["keyword"] = info["score"]
                                all_results[doc_id]["sources"].append("keyword")
                            else:
                                all_results[doc_id] = {
                                    "scores": {"keyword": info["score"]},
                                    "title": info["title"],
                                    "sources": ["keyword"]
                                }
                    
                    # Category recall
                    if category:
                        category_results = self.category_recall(category, limit=50)
                        
                        for doc_id, info in category_results.items():
                            if doc_id in all_results:
                                all_results[doc_id]["scores"]["category"] = info["score"]
                                all_results[doc_id]["sources"].append("category")
                            else:
                                all_results[doc_id] = {
                                    "scores": {"category": info["score"]},
                                    "title": info["title"],
                                    "sources": ["category"]
                                }
                    
                    # Weighted total; a path that did not return the document contributes 0
                    for doc_id in all_results:
                        total_score = 0
                        for source, weight in weights.items():
                            if source in all_results[doc_id]["scores"]:
                                total_score += weight * all_results[doc_id]["scores"][source]
                        
                        all_results[doc_id]["total_score"] = total_score
                    
                    # Sort by total score, highest first
                    sorted_results = sorted(
                        all_results.items(),
                        key=lambda x: x[1]["total_score"],
                        reverse=True
                    )
                    
                    return sorted_results[:20]
            
            # Use multi-path recall
            multi_recall = MultiRecallSearch(collection)

            query_vector = [np.random.random() for _ in range(128)]
            keywords = ["technology", "AI"]
            category = "electronics"

            print("Hybrid recall search:\n")

            results = multi_recall.hybrid_recall(
                query_vector=query_vector,
                keywords=keywords,
                category=category,
                weights={"vector": 0.5, "keyword": 0.3, "category": 0.2}
            )

            print(f"{'Rank':>4s} {'ID':>8s} {'Score':>8s} {'Sources':>20s} {'Title':>30s}")
            print("-" * 75)

            for rank, (doc_id, info) in enumerate(results[:10], 1):
                sources = ", ".join(info["sources"])
                title = info["title"][:28] if info.get("title") else "N/A"
                print(f"{rank:4d} {doc_id:8d} {info['total_score']:8.3f} {sources:>20s} {title:>30s}")
            ---
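A weighted sum is only meaningful when the per-path scores share a scale, and a distance-derived score, a keyword match ratio, and a fixed category score do not. One common remedy, sketched here as an assumption rather than as anything Milvus does internally, is to min-max normalize each path to [0, 1] before applying the weights. The helper names `min_max_normalize` and `fuse` are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} dict to [0, 1]; a constant list maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(paths, weights):
    """Weighted sum over normalized per-path scores, sorted best first."""
    fused = {}
    for name, scores in paths.items():
        for doc_id, s in min_max_normalize(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weights.get(name, 0.0) * s
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


paths = {
    "vector": {1: 0.9, 2: 0.5, 3: 0.1},   # e.g. scores derived from distances
    "keyword": {2: 1.0, 4: 0.5},           # fraction of keywords matched
}
ranking = fuse(paths, {"vector": 0.6, "keyword": 0.4})

for doc_id, score in ranking:
    print(f"doc {doc_id}: {score:.3f}")
```

One trade-off of min-max scaling is that the weakest hit of every path maps to 0, so it contributes nothing beyond being present; rank-based fusion avoids that at the cost of discarding score magnitudes.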
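    b.Fusion strategies
        a.Description
            A fusion strategy decides how the per-path recall lists are merged. Common choices are weighted sum, Reciprocal Rank Fusion (RRF), CombSUM, and CombMAX. Score-based methods depend on weights that set each path's importance; tune them to the business scenario, or learn them with machine learning. Rank-based methods such as RRF use only each result's position, scoring a document as the sum of 1/(k + rank) over the lists that contain it, which avoids comparing scores on different scales. Pick the method empirically by testing on your own queries.
        b.Code example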
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # Fusion strategies
            class FusionStrategy:
                @staticmethod
                def weighted_sum(results_dict, weights):
                    """Weighted-sum fusion."""
                    fused_scores = {}
                    
                    for source, results in results_dict.items():
                        weight = weights.get(source, 0)
                        
                        for doc_id, score in results.items():
                            if doc_id not in fused_scores:
                                fused_scores[doc_id] = 0
                            fused_scores[doc_id] += weight * score
                    
                    return fused_scores
                
                @staticmethod
                def rrf(results_dict, k=60):
                    """Reciprocal Rank Fusion"""
                    fused_scores = {}
                    
                    for source, results in results_dict.items():
                        # Sort by score to obtain each document's rank in this path
                        ranked = sorted(results.items(), key=lambda x: x[1], reverse=True)
                        
                        for rank, (doc_id, score) in enumerate(ranked):
                            if doc_id not in fused_scores:
                                fused_scores[doc_id] = 0
                            fused_scores[doc_id] += 1 / (k + rank + 1)
                    
                    return fused_scores
                
                @staticmethod
                def comb_sum(results_dict):
                    """CombSUM: plain sum of the scores."""
                    fused_scores = {}
                    
                    for source, results in results_dict.items():
                        for doc_id, score in results.items():
                            if doc_id not in fused_scores:
                                fused_scores[doc_id] = 0
                            fused_scores[doc_id] += score
                    
                    return fused_scores
                
                @staticmethod
                def comb_max(results_dict):
                    """CombMAX: take the maximum score."""
                    fused_scores = {}
                    
                    for source, results in results_dict.items():
                        for doc_id, score in results.items():
                            if doc_id not in fused_scores:
                                fused_scores[doc_id] = score
                            else:
                                fused_scores[doc_id] = max(fused_scores[doc_id], score)
                    
                    return fused_scores
                
                @staticmethod
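                def adaptive_fusion(results_dict, quality_scores):
                    """Adaptive fusion: weight each path by its recall quality."""
                    # Normalize the quality scores into weights that sum to 1
                    total_quality = sum(quality_scores.values())
                    if total_quality <= 0:
                        # No usable quality signal: fall back to equal weights
                        weights = {source: 1 / len(results_dict) for source in results_dict}
                    else:
                        weights = {
                            source: quality / total_quality
                            for source, quality in quality_scores.items()
                        }

                    return FusionStrategy.weighted_sum(results_dict, weights)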
            
            # 测试不同融合策略
            print("融合策略对比:\n")
            
            # 模拟多路召回结果
            vector_results = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6, 5: 0.5}
            keyword_results = {2: 0.95, 3: 0.85, 6: 0.75, 7: 0.65}
            category_results = {1: 1.0, 4: 1.0, 8: 1.0}
            
            results_dict = {
                "vector": vector_results,
                "keyword": keyword_results,
                "category": category_results
            }
            
            # 加权求和
            weights = {"vector": 0.5, "keyword": 0.3, "category": 0.2}
            weighted_scores = FusionStrategy.weighted_sum(results_dict, weights)
            
            print("加权求和融合:")
            for doc_id, score in sorted(weighted_scores.items(), key=lambda x: x[1], reverse=True)[:5]:
                print(f"  文档{doc_id}: {score:.3f}")
            
            # RRF
            rrf_scores = FusionStrategy.rrf(results_dict, k=60)
            
            print("\nRRF融合:")
            for doc_id, score in sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)[:5]:
                print(f"  文档{doc_id}: {score:.3f}")
            
            # CombSUM
            combsum_scores = FusionStrategy.comb_sum(results_dict)
            
            print("\nCombSUM融合:")
            for doc_id, score in sorted(combsum_scores.items(), key=lambda x: x[1], reverse=True)[:5]:
                print(f"  文档{doc_id}: {score:.3f}")
            
            # CombMAX
            combmax_scores = FusionStrategy.comb_max(results_dict)
            
            print("\nCombMAX融合:")
            for doc_id, score in sorted(combmax_scores.items(), key=lambda x: x[1], reverse=True)[:5]:
                print(f"  文档{doc_id}: {score:.3f}")
            
            # 自适应融合
            quality_scores = {"vector": 0.8, "keyword": 0.6, "category": 0.9}
            adaptive_scores = FusionStrategy.adaptive_fusion(results_dict, quality_scores)
            
            print("\n自适应融合:")
            for doc_id, score in sorted(adaptive_scores.items(), key=lambda x: x[1], reverse=True)[:5]:
                print(f"  文档{doc_id}: {score:.3f}")
            
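            print("\nChoosing a fusion strategy:")
            print("  Weighted sum: when the relative source weights are known")
            print("  RRF: when sources use different score scales (only ranks are used)")
            print("  CombSUM: simple and fast; assumes scores on the same scale")
            print("  CombMAX: emphasizes the single best match")
            print("  Adaptive: adjusts weights according to recall quality")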
            ---
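            The RRF method above ranks tied scores by their insertion order, so the three documents in `category_results` (all scored 1.0) receive different reciprocal-rank contributions. A minimal sketch of one possible fix, assuming competition ranking (tied documents share the best rank of their group). The function name `rrf_tie_aware` is hypothetical and not part of the original class:

```python
from collections import defaultdict


def rrf_tie_aware(results_dict, k=60):
    """Reciprocal Rank Fusion where documents with equal scores share a rank."""
    fused = defaultdict(float)
    for results in results_dict.values():
        ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
        rank = 0
        previous_score = None
        for position, (doc_id, score) in enumerate(ranked, start=1):
            # Only advance the rank when the score actually changes
            if score != previous_score:
                rank = position
                previous_score = score
            fused[doc_id] += 1.0 / (k + rank)
    return dict(fused)


results_dict = {
    "vector": {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6, 5: 0.5},
    "keyword": {2: 0.95, 3: 0.85, 6: 0.75, 7: 0.65},
    "category": {1: 1.0, 4: 1.0, 8: 1.0},
}

fused = rrf_tie_aware(results_dict)
for doc_id, score in sorted(fused.items(), key=lambda item: item[1], reverse=True)[:5]:
    print(f"doc {doc_id}: {score:.4f}")
```

            With this variant, documents 1, 4, and 8 each receive 1/(k+1) from the category source, regardless of dictionary order.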

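02.Practical applications
    a.Semantic + keyword search
        a.Description
            Hybrid semantic + keyword search combines the semantic understanding of vector search with the exact matching of keyword search. Vector search captures semantic similarity, while keyword matching guarantees that specific terms are present. This suits search engines and document retrieval, where it can improve both accuracy and user satisfaction. Keyword matches can act either as a hard constraint (a filter that every result must satisfy) or as a soft bonus added to the vector score; in the soft case the relative weights of the two signals need tuning.
        b.Code example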
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            collection.load()
            
            # 语义+关键词搜索类
            class SemanticKeywordSearch:
                def __init__(self, collection):
                    self.collection = collection
                
                def search(self, query_text, query_vector, keywords=None, mode="soft"):
                    """
                    Hybrid search.
                    mode: "soft" (keyword matches add to the score) or
                          "hard" (results must contain at least one keyword)
                    """
                    search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": 16}
                    }
                    
                    if mode == "hard" and keywords:
                        # 硬约束：必须包含关键词
                        keyword_expr = " or ".join([f'title like "%{kw}%"' for kw in keywords])
                        
                        results = self.collection.search(
                            data=[query_vector],
                            anns_field="embedding",
                            param=search_params,
                            limit=20,
                            expr=keyword_expr,
                            output_fields=["id", "title"]
                        )
                        
                        return [(hit.id, hit.entity.get("title"), hit.distance) for hit in results[0]]
                    
                    else:
                        # 软约束：关键词加分
                        # 先进行向量搜索
                        results = self.collection.search(
                            data=[query_vector],
                            anns_field="embedding",
                            param=search_params,
                            limit=50,
                            output_fields=["id", "title"]
                        )
                        
                        # 计算综合分数
                        scored_results = []
                        for hit in results[0]:
                            # `or ""` also covers a None value and avoids relying on a default argument
                            title = hit.entity.get("title") or ""
                            
                            # 向量分数（距离转相似度）
                            vector_score = 1 / (1 + hit.distance)
                            
                            # 关键词匹配分数
                            keyword_score = 0
                            if keywords:
                                match_count = sum(1 for kw in keywords if kw in title)
                                keyword_score = match_count / len(keywords)
                            
                            # 综合分数（可调整权重）
                            total_score = 0.7 * vector_score + 0.3 * keyword_score
                            
                            scored_results.append((hit.id, title, total_score, vector_score, keyword_score))
                        
                        # 按综合分数排序
                        scored_results.sort(key=lambda x: x[2], reverse=True)
                        
                        return scored_results[:20]
            
            # 使用语义+关键词搜索
            sk_search = SemanticKeywordSearch(collection)
            
            query_text = "人工智能机器学习"
            query_vector = [np.random.random() for _ in range(128)]
            keywords = ["AI", "机器学习"]
            
            print("软约束模式（关键词加分）:\n")
            results_soft = sk_search.search(query_text, query_vector, keywords, mode="soft")
            
            print(f"{'排名':>4s} {'ID':>8s} {'总分':>8s} {'向量分':>10s} {'关键词分':>10s} {'标题':>30s}")
            print("-" * 75)
            
            for rank, (doc_id, title, total, vector, keyword) in enumerate(results_soft[:10], 1):
                title_short = title[:28] if title else "N/A"
                print(f"{rank:4d} {doc_id:8d} {total:8.3f} {vector:10.3f} {keyword:10.3f} {title_short:>30s}")
            
            print("\n硬约束模式（必须包含关键词）:\n")
            results_hard = sk_search.search(query_text, query_vector, keywords, mode="hard")
            
            print(f"{'排名':>4s} {'ID':>8s} {'距离':>10s} {'标题':>40s}")
            print("-" * 65)
            
            for rank, (doc_id, title, distance) in enumerate(results_hard[:10], 1):
                title_short = title[:38] if title else "N/A"
                print(f"{rank:4d} {doc_id:8d} {distance:10.4f} {title_short:>40s}")
            ---
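            The soft-constraint score above matches keywords with a case-sensitive `kw in title` test, so the keyword "AI" does not match a title containing "ai". A minimal sketch of the same 0.7/0.3 weighting with case-insensitive matching, written as a pure function so it can be checked without a Milvus connection. The name `soft_score` and its parameters are hypothetical:

```python
def soft_score(distance, title, keywords, vector_weight=0.7, keyword_weight=0.3):
    """Combine an L2 distance and a keyword hit ratio into a single score."""
    vector_score = 1.0 / (1.0 + distance)  # smaller L2 distance -> higher score
    keyword_score = 0.0
    if keywords:
        title_folded = (title or "").casefold()
        hits = sum(1 for kw in keywords if kw.casefold() in title_folded)
        keyword_score = hits / len(keywords)
    return vector_weight * vector_score + keyword_weight * keyword_score


keywords = ["AI", "machine learning"]
print(soft_score(0.25, "Intro to ai and MACHINE LEARNING", keywords))
print(soft_score(0.25, "Database indexing", keywords))
```

            Substring matching remains a simplification: a short keyword can match inside an unrelated longer word, so token-based matching may be preferable for real titles.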
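    b.Multimodal search
        a.Description
            Multimodal search combines several modalities such as text, images, and audio. Each modality is encoded by its own vector encoder and stored in its own vector field, and the fusion strategy must account for the relative weight of each modality. Cross-modal retrieval, such as searching images with text, additionally requires an encoder that maps both modalities into a shared embedding space. Multimodal search suits multimedia scenarios such as e-commerce and video platforms, where it offers a richer search experience.
        b.Code example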
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            
            # 创建多模态Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="text_embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
                FieldSchema(name="image_embedding", dtype=DataType.FLOAT_VECTOR, dim=512)
            ]
            
            schema = CollectionSchema(fields=fields, description="多模态搜索")
            multimodal_collection = Collection("multimodal_search", schema=schema)
            
            # 插入多模态数据
            data = [
                list(range(100)),  # ids
                [f"商品{i}" for i in range(100)],  # titles
                [[np.random.random() for _ in range(768)] for _ in range(100)],  # text embeddings
                [[np.random.random() for _ in range(512)] for _ in range(100)]   # image embeddings
            ]
            
            multimodal_collection.insert(data)
            multimodal_collection.flush()
            
            # 创建索引
            text_index = {
                "index_type": "IVF_FLAT",
                "metric_type": "COSINE",
                "params": {"nlist": 128}
            }
            
            image_index = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 128}
            }
            
            multimodal_collection.create_index("text_embedding", text_index)
            multimodal_collection.create_index("image_embedding", image_index)
            
            multimodal_collection.load()
            
            # 多模态搜索类
            class MultimodalSearch:
                def __init__(self, collection):
                    self.collection = collection
                
                def text_search(self, text_vector, limit=50):
                    """文本模态搜索"""
                    results = self.collection.search(
                        data=[text_vector],
                        anns_field="text_embedding",
                        param={"metric_type": "COSINE", "params": {"nprobe": 16}},
                        limit=limit,
                        output_fields=["id", "title"]
                    )
                    
                    return {hit.id: hit.distance for hit in results[0]}
                
                def image_search(self, image_vector, limit=50):
                    """图像模态搜索"""
                    results = self.collection.search(
                        data=[image_vector],
                        anns_field="image_embedding",
                        param={"metric_type": "L2", "params": {"nprobe": 16}},
                        limit=limit,
                        output_fields=["id", "title"]
                    )
                    
                    return {hit.id: hit.distance for hit in results[0]}
                
                def multimodal_search(self, text_vector=None, image_vector=None, weights=None):
                    """多模态融合搜索"""
                    if weights is None:
                        weights = {"text": 0.5, "image": 0.5}
                    
                    results_dict = {}
                    
                    # Text search (COSINE): Milvus returns the cosine similarity,
                    # so a larger value already means a closer match
                    if text_vector is not None:
                        text_results = self.text_search(text_vector)
                        
                        for doc_id, similarity in text_results.items():
                            score = (1 + similarity) / 2  # map [-1, 1] to [0, 1]
                            results_dict[doc_id] = {"text": score}
                    
                    # Image search (L2): a smaller distance means a closer match
                    if image_vector is not None:
                        image_results = self.image_search(image_vector)
                        
                        # Normalize L2 distances by the largest one; fall back to 1.0
                        # for an empty result set or all-zero distances
                        max_dist = max(image_results.values(), default=0.0) or 1.0
                        
                        for doc_id, distance in image_results.items():
                            score = 1 - (distance / max_dist)
                            
                            if doc_id in results_dict:
                                results_dict[doc_id]["image"] = score
                            else:
                                results_dict[doc_id] = {"image": score}
                    
                    # 计算加权总分
                    final_scores = {}
                    for doc_id, scores in results_dict.items():
                        total = 0
                        for modality, weight in weights.items():
                            if modality in scores:
                                total += weight * scores[modality]
                        
                        final_scores[doc_id] = total
                    
                    # 排序
                    sorted_results = sorted(final_scores.items(), key=lambda x: x[1], reverse=True)
                    
                    return sorted_results[:20]
            
            # 使用多模态搜索
            mm_search = MultimodalSearch(multimodal_collection)
            
            text_query = [np.random.random() for _ in range(768)]
            image_query = [np.random.random() for _ in range(512)]
            
            print("多模态搜索结果:\n")
            
            # 纯文本搜索
            print("纯文本搜索:")
            text_only = mm_search.multimodal_search(text_vector=text_query, weights={"text": 1.0})
            for rank, (doc_id, score) in enumerate(text_only[:5], 1):
                print(f"  {rank}. 文档{doc_id}: {score:.3f}")
            
            # 纯图像搜索
            print("\n纯图像搜索:")
            image_only = mm_search.multimodal_search(image_vector=image_query, weights={"image": 1.0})
            for rank, (doc_id, score) in enumerate(image_only[:5], 1):
                print(f"  {rank}. 文档{doc_id}: {score:.3f}")
            
            # 多模态融合
            print("\n多模态融合搜索 (文本:图像 = 0.6:0.4):")
            multimodal_results = mm_search.multimodal_search(
                text_vector=text_query,
                image_vector=image_query,
                weights={"text": 0.6, "image": 0.4}
            )
            for rank, (doc_id, score) in enumerate(multimodal_results[:5], 1):
                print(f"  {rank}. 文档{doc_id}: {score:.3f}")
            
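            print("\nMultimodal search use cases:")
            print("  1. E-commerce: product search combining images and text")
            print("  2. Video: searching video content with text")
            print("  3. Social media: cross-modal content recommendation")
            print("  4. Education: multimedia resource retrieval")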
            ---
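            When modalities use different metrics, the raw values returned by a search point in opposite directions: with COSINE a larger value is a closer match, while with L2 a smaller value is closer. A minimal sketch of converting both to a "larger is better" score in [0, 1] before the weighted sum. The helper names `to_score` and `fuse` are hypothetical, and the sample hit values are made up for illustration:

```python
def to_score(raw, metric_type):
    """Map a raw search value to a score in [0, 1] where larger is better."""
    metric = metric_type.upper()
    if metric == "COSINE":
        # COSINE results are similarities in [-1, 1]; larger is closer
        return (1.0 + raw) / 2.0
    if metric == "L2":
        # L2 results are distances in [0, inf); smaller is closer
        return 1.0 / (1.0 + raw)
    raise ValueError(f"unsupported metric_type: {metric_type}")


def fuse(hits_by_modality, metrics, weights):
    """Weighted sum of per-modality scores; a missing modality contributes 0."""
    fused = {}
    for modality, hits in hits_by_modality.items():
        for doc_id, raw in hits.items():
            score = to_score(raw, metrics[modality])
            fused[doc_id] = fused.get(doc_id, 0.0) + weights[modality] * score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Illustrative values only: {doc_id: raw value returned by the search}
hits = {"text": {1: 0.9, 2: 0.2}, "image": {1: 3.0, 3: 0.0}}
metrics = {"text": "COSINE", "image": "L2"}
weights = {"text": 0.6, "image": 0.4}

for doc_id, score in fuse(hits, metrics, weights):
    print(f"doc {doc_id}: {score:.3f}")
```

            Keeping the metric-specific conversion in one function means a change of metric on either field only touches `to_score`.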

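8 Performance optimization

8.1 Index selection strategy

01.Index type comparison
    a.FLAT index
        a.Description
            The FLAT index performs brute-force search with no compression or approximation, so it delivers 100% recall and exact results. It needs no training step, so building it is fast, and its memory footprint equals the size of the raw vectors. Because every query scans every vector, query latency grows linearly with the data volume. FLAT therefore suits small datasets (fewer than about 1 million vectors) and scenarios that demand the highest accuracy; it performs poorly at large scale.
        b.Code example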
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # FLAT索引配置
            flat_index = {
                "index_type": "FLAT",
                "metric_type": "L2",
                "params": {}  # FLAT索引无需参数
            }
            
            print("创建FLAT索引...")
            start = time.time()
            collection.create_index(field_name="embedding", index_params=flat_index)
            build_time = time.time() - start
            
            print(f"索引构建时间: {build_time:.2f}s")
            
            collection.load()
            
            # 测试查询性能
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {}
            }
            
            start = time.time()
            for _ in range(100):
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
            query_time = time.time() - start
            
            print(f"\n100次查询总时间: {query_time:.2f}s")
            print(f"平均查询延迟: {query_time/100*1000:.2f}ms")
            print(f"QPS: {100/query_time:.2f}")
            
            print("\nFLAT index summary:")
            print("  Pros: 100% recall, exact results")
            print("  Cons: slow queries, unsuitable for large datasets")
            print("  Use when: <1 million vectors and high accuracy is required")
            ---
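            Conceptually, a FLAT search is an exhaustive scan: the query is compared with every stored vector and the closest `top_k` are kept. A minimal pure-Python sketch of that idea (not how Milvus implements it internally; `flat_search` is a hypothetical name). It uses the squared L2 distance, which gives the same ordering as the true L2 distance:

```python
import heapq


def flat_search(vectors, query, top_k):
    """Exhaustive nearest-neighbor search: compare the query with every vector."""
    def squared_l2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # One distance computation per stored vector, hence linear query cost
    scored = ((squared_l2(vec, query), idx) for idx, vec in enumerate(vectors))
    return heapq.nsmallest(top_k, scored)


vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0], [0.5, 0.5]]
for dist, idx in flat_search(vectors, [0.0, 0.0], top_k=2):
    print(f"vector {idx}: squared distance {dist}")
```

            The single pass over all vectors is why recall is exact and why latency scales with the collection size.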
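    b.IVF family of indexes
        a.Description
            The IVF family of indexes uses an inverted-file structure that partitions the vector space into clusters. Variants include IVF_FLAT, IVF_SQ8, and IVF_PQ. The nlist parameter sets the number of clusters, and nprobe sets how many clusters are searched per query, which lets you trade query speed against recall. IVF suits medium to large datasets (1 million to 10 million vectors). It requires a training (clustering) phase, so builds take longer than with FLAT, and the quantized variants (IVF_SQ8, IVF_PQ) reduce the memory footprint.
        b.Code example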
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # IVF_FLAT索引配置
            ivf_flat_index = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}  # 聚类数量
            }
            
            print("创建IVF_FLAT索引...")
            start = time.time()
            collection.create_index(field_name="embedding", index_params=ivf_flat_index)
            build_time = time.time() - start
            
            print(f"索引构建时间: {build_time:.2f}s")
            
            collection.load()
            
            # 测试不同nprobe值的性能
            query_vector = [[np.random.random() for _ in range(128)]]
            
            nprobe_values = [1, 8, 16, 32, 64]
            
            print(f"\n{'nprobe':>8s} {'查询时间':>10s} {'QPS':>10s}")
            print("-" * 32)
            
            for nprobe in nprobe_values:
                search_params = {
                    "metric_type": "L2",
                    "params": {"nprobe": nprobe}
                }
                
                start = time.time()
                for _ in range(100):
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                query_time = time.time() - start
                
                qps = 100 / query_time
                print(f"{nprobe:8d} {query_time:9.2f}s {qps:9.2f}")
            
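            print("\nIVF index summary:")
            print("  Pros: fast queries, controllable memory footprint")
            print("  Cons: requires training, recall <100%")
            print("  Use when: 1 million to 10 million vectors")
            print("  Tuning: nlist = 4*sqrt(n) with n the number of vectors, nprobe = 1-10% of nlist")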
            ---
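            The effect of nprobe on recall can be seen in a toy inverted-file index. This is a sketch under simplifying assumptions: the centroids are supplied by hand, whereas a real IVF index learns them by clustering during training, and `ToyIVF` is a hypothetical class, not a Milvus API. Each vector is stored in the bucket of its nearest centroid, and a query scans only the `nprobe` nearest buckets:

```python
def squared_l2(a, b):
    """Squared Euclidean distance; the ordering matches the true L2 distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyIVF:
    """Illustrative inverted-file index with fixed, caller-supplied centroids."""

    def __init__(self, centroids):
        self.centroids = centroids               # len(centroids) plays the role of nlist
        self.buckets = [[] for _ in centroids]   # one inverted list per centroid
        self.size = 0

    def _nearest_centroids(self, vec):
        return sorted(range(len(self.centroids)),
                      key=lambda c: squared_l2(vec, self.centroids[c]))

    def add(self, vectors):
        for vec in vectors:
            cluster = self._nearest_centroids(vec)[0]
            self.buckets[cluster].append((self.size, vec))
            self.size += 1

    def search(self, query, top_k, nprobe):
        # Scan only the nprobe clusters whose centroids are closest to the query
        probed = self._nearest_centroids(query)[:nprobe]
        candidates = [item for c in probed for item in self.buckets[c]]
        candidates.sort(key=lambda item: squared_l2(query, item[1]))
        return [idx for idx, _ in candidates[:top_k]]


index = ToyIVF(centroids=[[0.0, 0.0], [10.0, 0.0]])
index.add([[4.9, 0.0], [5.2, 0.0], [0.0, 0.0], [10.0, 0.0]])
query = [4.95, 0.0]

print(index.search(query, top_k=2, nprobe=1))  # [0, 2]: vector 1 sits in the unprobed cluster
print(index.search(query, top_k=2, nprobe=2))  # [0, 1]: all clusters probed, exact result
```

            Vector 1 is the second-nearest neighbor of the query but lies just across the cluster boundary, so it is missed with nprobe=1. This boundary effect is why IVF recall stays below 100% unless every cluster is probed.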

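02.Index selection decisions
    a.Assessing data scale
        a.Description
            Choose the index type according to data scale. As a rule of thumb, use FLAT for small datasets (<100,000 vectors), the IVF family for medium datasets (100,000 to 10 million), and HNSW or DiskANN for large datasets (>10 million). Account for data growth and leave performance headroom. Assess the available memory and pick a suitable compression scheme. Weigh the QPS requirement to balance speed against accuracy. Re-evaluate periodically and adjust the index as the workload changes.
        b.Code example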
            ---
            from pymilvus import Collection
            import numpy as np
            
            # 索引选择决策类
            class IndexSelector:
                def __init__(self, collection):
                    self.collection = collection
                
                def recommend_index(self, vector_count, qps_requirement, recall_requirement, memory_limit_gb):
                    """
                    Recommend index types.
                    
                    Args:
                        vector_count: number of vectors
                        qps_requirement: required queries per second
                        recall_requirement: required recall (0-1)
                        memory_limit_gb: memory limit in GB
                    """
                    recommendations = []
                    
                    # 计算向量维度和内存需求
                    # 假设128维float32向量，每个向量512字节
                    vector_size_bytes = 128 * 4
                    total_memory_gb = vector_count * vector_size_bytes / (1024**3)
                    
                    print(f"\n数据规模评估:")
                    print(f"  向量数量: {vector_count:,}")
                    print(f"  原始数据大小: {total_memory_gb:.2f} GB")
                    print(f"  QPS需求: {qps_requirement}")
                    print(f"  召回率要求: {recall_requirement*100:.0f}%")
                    print(f"  内存限制: {memory_limit_gb} GB")
                    
                    # 小规模数据
                    if vector_count < 100000:
                        if recall_requirement >= 0.99:
                            recommendations.append({
                                "index_type": "FLAT",
                                "reason": "小规模数据，高召回率要求",
                                "params": {},
                                "expected_recall": 1.0,
                                "expected_qps": "100-500",
                                "memory_gb": total_memory_gb
                            })
                        else:
                            recommendations.append({
                                "index_type": "IVF_FLAT",
                                "reason": "小规模数据，可接受近似搜索",
                                "params": {"nlist": 128},
                                "expected_recall": 0.95,
                                "expected_qps": "500-2000",
                                "memory_gb": total_memory_gb * 1.1
                            })
                    
                    # 中规模数据
                    elif vector_count < 10000000:
                        nlist = int(4 * np.sqrt(vector_count))
                        
                        if memory_limit_gb >= total_memory_gb:
                            recommendations.append({
                                "index_type": "IVF_FLAT",
                                "reason": "中规模数据，内存充足",
                                "params": {"nlist": nlist},
                                "expected_recall": 0.95,
                                "expected_qps": "1000-5000",
                                "memory_gb": total_memory_gb * 1.1
                            })
                        
                        else:  # raw vectors do not fit in memory: fall back to 8-bit scalar quantization
                            recommendations.append({
                                "index_type": "IVF_SQ8",
                                "reason": "中规模数据，内存受限",
                                "params": {"nlist": nlist},
                                "expected_recall": 0.90,
                                "expected_qps": "2000-8000",
                                "memory_gb": total_memory_gb * 0.3
                            })
                        
                        if qps_requirement > 5000:
                            recommendations.append({
                                "index_type": "HNSW",
                                "reason": "高QPS需求",
                                "params": {"M": 16, "efConstruction": 200},
                                "expected_recall": 0.95,
                                "expected_qps": "5000-20000",
                                "memory_gb": total_memory_gb * 1.3
                            })
                    
                    # 大规模数据
                    else:
                        recommendations.append({
                            "index_type": "HNSW",
                            "reason": "大规模数据，高性能需求",
                            "params": {"M": 16, "efConstruction": 200},
                            "expected_recall": 0.95,
                            "expected_qps": "5000-20000",
                            "memory_gb": total_memory_gb * 1.3
                        })
                        
                        if memory_limit_gb < total_memory_gb:
                            recommendations.append({
                                "index_type": "IVF_PQ",
                                "reason": "大规模数据，内存受限",
                                "params": {"nlist": 4096, "m": 16},
                                "expected_recall": 0.85,
                                "expected_qps": "3000-10000",
                                "memory_gb": total_memory_gb * 0.1
                            })
                    
                    return recommendations
                
                def print_recommendations(self, recommendations):
                    """打印推荐结果"""
                    print(f"\n索引推荐 (共{len(recommendations)}个选项):\n")
                    
                    for i, rec in enumerate(recommendations, 1):
                        print(f"{i}. {rec['index_type']}")
                        print(f"   原因: {rec['reason']}")
                        print(f"   参数: {rec['params']}")
                        print(f"   预期召回率: {rec['expected_recall']*100:.0f}%")
                        print(f"   预期QPS: {rec['expected_qps']}")
                        print(f"   内存需求: {rec['memory_gb']:.2f} GB")
                        print()
            
            # 使用索引选择器
            collection = Collection("documents")
            selector = IndexSelector(collection)
            
            # 场景1: 小规模高精度
            print("="*60)
            print("场景1: 小规模高精度")
            print("="*60)
            recs = selector.recommend_index(
                vector_count=50000,
                qps_requirement=200,
                recall_requirement=0.99,
                memory_limit_gb=10
            )
            selector.print_recommendations(recs)
            
            # 场景2: 中规模平衡
            print("="*60)
            print("场景2: 中规模平衡")
            print("="*60)
            recs = selector.recommend_index(
                vector_count=5000000,
                qps_requirement=3000,
                recall_requirement=0.95,
                memory_limit_gb=50
            )
            selector.print_recommendations(recs)
            
            # 场景3: 大规模内存受限
            print("="*60)
            print("场景3: 大规模内存受限")
            print("="*60)
            recs = selector.recommend_index(
                vector_count=50000000,
                qps_requirement=5000,
                recall_requirement=0.90,
                memory_limit_gb=20
            )
            selector.print_recommendations(recs)
            ---
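            The memory factors used in `recommend_index` (1.1, 0.3, 0.1, 1.3) are rounded estimates. A rough sketch of where such factors come from, under stated assumptions: float32 vectors, one byte per dimension for IVF_SQ8, `pq_m` one-byte codes per vector for IVF_PQ, and for HNSW up to 2*M four-byte neighbor ids per vector in the bottom graph layer. Centroids, codebooks, upper HNSW layers, and Milvus's own overhead are ignored, so real usage will be higher. The function name is hypothetical:

```python
def estimate_index_memory_gb(vector_count, dim, index_type, pq_m=16, hnsw_m=16):
    """Back-of-the-envelope index size in GiB under the assumptions above."""
    raw_bytes = vector_count * dim * 4  # float32 = 4 bytes per dimension
    if index_type in ("FLAT", "IVF_FLAT"):
        total = raw_bytes
    elif index_type == "IVF_SQ8":
        total = vector_count * dim      # 1 byte per dimension
    elif index_type == "IVF_PQ":
        total = vector_count * pq_m     # pq_m one-byte codes per vector
    elif index_type == "HNSW":
        total = raw_bytes + vector_count * hnsw_m * 2 * 4  # bottom-layer links
    else:
        raise ValueError(f"unsupported index_type: {index_type}")
    return total / 1024 ** 3


for index_type in ("FLAT", "IVF_SQ8", "IVF_PQ", "HNSW"):
    size = estimate_index_memory_gb(50_000_000, 128, index_type)
    print(f"{index_type:8s} {size:7.2f} GiB")
```

            For 128-dimensional vectors these assumptions give ratios of 0.25 (IVF_SQ8), about 0.03 (IVF_PQ with m=16), and 1.25 (HNSW with M=16) relative to the raw data.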
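    b.Benchmark comparison
        a.Description
            Benchmark the candidate indexes to compare how they actually perform. Metrics include build time, query latency, QPS, recall, and memory usage. Test with real data and real query patterns, and compare the effect of different parameter settings. Use the results to guide index selection and parameter tuning. Run performance regression tests regularly, and establish a baseline so that changes in performance can be monitored.
        b.Code example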
            ---
            from pymilvus import Collection, utility
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 性能测试类
            class IndexBenchmark:
                def __init__(self, collection):
                    self.collection = collection
                    self.results = []
                
                def benchmark_index(self, index_config, search_params, num_queries=100):
                    """测试单个索引配置"""
                    index_type = index_config["index_type"]
                    
                    print(f"\n测试索引: {index_type}")
                    print(f"  参数: {index_config.get('params', {})}")
                    
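                    # Drop the existing index; report (rather than silently swallow) any failure,
                    # for example when no index exists yet
                    try:
                        self.collection.release()
                        self.collection.drop_index()
                    except Exception as exc:
                        print(f"  Skipping index drop: {exc}")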
                    
                    # 构建索引
                    print("  构建索引...")
                    start = time.time()
                    self.collection.create_index(field_name="embedding", index_params=index_config)
                    build_time = time.time() - start
                    
                    # 加载Collection
                    self.collection.load()
                    
                    # 查询测试
                    query_vectors = [[np.random.random() for _ in range(128)] for _ in range(num_queries)]
                    
                    latencies = []
                    for query_vector in query_vectors:
                        start = time.time()
                        results = self.collection.search(
                            data=[query_vector],
                            anns_field="embedding",
                            param=search_params,
                            limit=10
                        )
                        latency = time.time() - start
                        latencies.append(latency)
                    
                    # 统计结果
                    avg_latency = np.mean(latencies) * 1000  # ms
                    p95_latency = np.percentile(latencies, 95) * 1000
                    p99_latency = np.percentile(latencies, 99) * 1000
                    qps = 1 / np.mean(latencies)  # throughput of one sequential client, not concurrent QPS
                    
                    # 内存占用（简化）
                    memory_usage = "N/A"
                    
                    result = {
                        "index_type": index_type,
                        "params": index_config.get("params", {}),
                        "build_time": build_time,
                        "avg_latency": avg_latency,
                        "p95_latency": p95_latency,
                        "p99_latency": p99_latency,
                        "qps": qps,
                        "memory": memory_usage
                    }
                    
                    self.results.append(result)
                    
                    print(f"  构建时间: {build_time:.2f}s")
                    print(f"  平均延迟: {avg_latency:.2f}ms")
                    print(f"  P95延迟: {p95_latency:.2f}ms")
                    print(f"  P99延迟: {p99_latency:.2f}ms")
                    print(f"  QPS: {qps:.2f}")
                    
                    return result
                
                def compare_indexes(self, index_configs, search_params_list, num_queries=100):
                    """对比多个索引配置"""
                    print("="*80)
                    print("索引性能对比测试")
                    print("="*80)
                    
                    for index_config, search_params in zip(index_configs, search_params_list):
                        self.benchmark_index(index_config, search_params, num_queries)
                    
                    # 打印对比表格
                    print(f"\n{'索引类型':>15s} {'构建时间':>10s} {'平均延迟':>10s} {'P95延迟':>10s} {'QPS':>10s}")
                    print("-" * 60)
                    
                    for result in self.results:
                        print(f"{result['index_type']:>15s} {result['build_time']:9.2f}s {result['avg_latency']:9.2f}ms {result['p95_latency']:9.2f}ms {result['qps']:9.2f}")
                    
                    # 推荐最佳配置
                    best_qps = max(self.results, key=lambda x: x["qps"])
                    best_latency = min(self.results, key=lambda x: x["avg_latency"])
                    
                    print(f"\n推荐:")
                    print(f"  最高QPS: {best_qps['index_type']} ({best_qps['qps']:.2f})")
                    print(f"  最低延迟: {best_latency['index_type']} ({best_latency['avg_latency']:.2f}ms)")
            
            # 使用性能测试
            benchmark = IndexBenchmark(collection)
            
            # 定义测试配置
            index_configs = [
                {
                    "index_type": "FLAT",
                    "metric_type": "L2",
                    "params": {}
                },
                {
                    "index_type": "IVF_FLAT",
                    "metric_type": "L2",
                    "params": {"nlist": 128}
                },
                {
                    "index_type": "IVF_FLAT",
                    "metric_type": "L2",
                    "params": {"nlist": 512}
                },
                {
                    "index_type": "HNSW",
                    "metric_type": "L2",
                    "params": {"M": 16, "efConstruction": 200}
                }
            ]
            
            search_params_list = [
                {"metric_type": "L2", "params": {}},
                {"metric_type": "L2", "params": {"nprobe": 16}},
                {"metric_type": "L2", "params": {"nprobe": 64}},
                {"metric_type": "L2", "params": {"ef": 64}}
            ]
            
            # 执行对比测试
            benchmark.compare_indexes(index_configs, search_params_list, num_queries=50)
            ---
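            The benchmark above records build time and latency but not recall, which the description lists as a metric. A minimal sketch of recall@k, assuming the exact top-k ids come from a FLAT search over the same data and the approximate ids from the index under test. The helper names are hypothetical:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k ids that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact = set(exact_ids[:k])
    if not exact:
        return 0.0
    return len(exact & set(approx_ids[:k])) / len(exact)


def mean_recall_at_k(approx_lists, exact_lists, k):
    """Average recall@k over a batch of queries."""
    recalls = [recall_at_k(a, e, k) for a, e in zip(approx_lists, exact_lists)]
    return sum(recalls) / len(recalls) if recalls else 0.0


# Illustrative ids only: two queries, exact results vs. approximate results
exact_lists = [[1, 2, 3, 4], [5, 6, 7, 8]]
approx_lists = [[1, 2, 3, 9], [5, 6, 7, 8]]
print(mean_recall_at_k(approx_lists, exact_lists, k=4))
```

            Reporting recall next to latency makes the comparison fair: an index that is faster only because it returns worse results would otherwise look like the best choice.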

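8.2 Query parameter tuning

01.Search parameter optimization
    a.Tuning nprobe
        a.Description
            nprobe sets how many clusters an IVF index searches per query, so it directly affects both recall and query speed: a larger nprobe raises recall but slows queries. A recommended starting range is 1-10% of nlist. Balance accuracy against performance for the workload at hand; A/B testing can help find the best value. Different queries may use different nprobe values, for example a larger nprobe for high-priority queries.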
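            The 1-10% rule and the recall/speed trade-off can be expressed as two small helpers. This is a sketch only: the function names are hypothetical, and the measurement tuples are made-up placeholders, not benchmark results. It assumes recall does not decrease as nprobe grows, so the smallest nprobe that meets the target is also the fastest acceptable setting:

```python
def nprobe_range(nlist, low=0.01, high=0.10):
    """Starting range for nprobe: 1-10% of nlist, never below 1."""
    return max(1, round(nlist * low)), max(1, round(nlist * high))


def choose_nprobe(measurements, target_recall):
    """Return the smallest nprobe whose measured recall meets the target.

    measurements: iterable of (nprobe, recall, latency_ms) tuples.
    Returns None when no measured setting reaches the target.
    """
    for nprobe, recall, _latency_ms in sorted(measurements):
        if recall >= target_recall:
            return nprobe
    return None


# Placeholder numbers for illustration only
measurements = [(8, 0.82, 1.1), (16, 0.91, 1.9), (32, 0.96, 3.4), (64, 0.99, 6.5)]

print(nprobe_range(1024))
print(choose_nprobe(measurements, target_recall=0.95))
```

            A per-query override then only needs a different `target_recall`, for example a stricter target for high-priority queries.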
        b.Code example
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 创建IVF索引
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
            
            collection.create_index(field_name="embedding", index_params=index_params)
            collection.load()
            
            # nprobe调优类
            class NprobeOptimizer:
                def __init__(self, collection):
                    self.collection = collection
                
                def test_nprobe_values(self, query_vector, nprobe_values, num_queries=100):
                    """测试不同nprobe值的性能"""
                    results = []
                    
                    print(f"\nnprobe参数调优测试 ({num_queries}次查询):\n")
                    print(f"{'nprobe':>8s} {'平均延迟':>12s} {'P95延迟':>12s} {'QPS':>10s} {'召回率估计':>12s}")
                    print("-" * 60)
                    
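                    # Baseline: top-10 ids at the largest nprobe under test. The recall
                    # computed below is relative to this run, not to an exact search.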
                    baseline_search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": max(nprobe_values)}
                    }
                    
                    baseline_results = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=baseline_search_params,
                        limit=10
                    )
                    baseline_ids = set(hit.id for hit in baseline_results[0])
                    
                    for nprobe in nprobe_values:
                        search_params = {
                            "metric_type": "L2",
                            "params": {"nprobe": nprobe}
                        }
                        
                        latencies = []
                        recall_sum = 0
                        
                        for _ in range(num_queries):
                            start = time.time()
                            search_results = self.collection.search(
                                data=[query_vector],
                                anns_field="embedding",
                                param=search_params,
                                limit=10
                            )
                            latency = time.time() - start
                            latencies.append(latency)
                            
                            # 计算召回率
                            result_ids = set(hit.id for hit in search_results[0])
                            recall = len(result_ids & baseline_ids) / len(baseline_ids)
                            recall_sum += recall
                        
                        avg_latency = np.mean(latencies) * 1000
                        p95_latency = np.percentile(latencies, 95) * 1000
                        qps = 1 / np.mean(latencies)
                        avg_recall = recall_sum / num_queries
                        
                        results.append({
                            "nprobe": nprobe,
                            "avg_latency": avg_latency,
                            "p95_latency": p95_latency,
                            "qps": qps,
                            "recall": avg_recall
                        })
                        
                        print(f"{nprobe:8d} {avg_latency:11.2f}ms {p95_latency:11.2f}ms {qps:9.2f} {avg_recall*100:11.1f}%")
                    
                    return results
                
                def recommend_nprobe(self, results, min_recall=0.95):
                    """推荐最优nprobe值"""
                    # 找到满足召回率要求的最小nprobe
                    valid_results = [r for r in results if r["recall"] >= min_recall]
                    
                    if not valid_results:
                        print(f"\n警告: 没有配置满足{min_recall*100:.0f}%召回率要求")
                        return None
                    
                    best = min(valid_results, key=lambda x: x["avg_latency"])
                    
                    print(f"\n推荐配置 (召回率≥{min_recall*100:.0f}%):")
                    print(f"  nprobe: {best['nprobe']}")
                    print(f"  平均延迟: {best['avg_latency']:.2f}ms")
                    print(f"  QPS: {best['qps']:.2f}")
                    print(f"  召回率: {best['recall']*100:.1f}%")
                    
                    return best
                
                def adaptive_nprobe(self, query_priority):
                    """根据查询优先级自适应选择nprobe"""
                    # 高优先级: nprobe更大，召回率更高
                    # 低优先级: nprobe更小，速度更快
                    
                    nprobe_map = {
                        "high": 64,      # 高优先级
                        "medium": 32,    # 中优先级
                        "low": 16        # 低优先级
                    }
                    
                    return nprobe_map.get(query_priority, 32)
            
            # 使用nprobe优化器
            optimizer = NprobeOptimizer(collection)
            
            query_vector = [np.random.random() for _ in range(128)]
            nprobe_values = [8, 16, 32, 64, 128, 256]
            
            # 测试不同nprobe值
            results = optimizer.test_nprobe_values(query_vector, nprobe_values, num_queries=50)
            
            # 推荐最优配置
            best_config = optimizer.recommend_nprobe(results, min_recall=0.95)
            
            # 自适应nprobe示例
            print("\n自适应nprobe策略:")
            for priority in ["high", "medium", "low"]:
                nprobe = optimizer.adaptive_nprobe(priority)
                print(f"  {priority}优先级: nprobe={nprobe}")
            ---
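The benchmark above scores recall against the run with the largest nprobe, so it only shows relative differences between settings. A minimal sketch of measuring recall against exact results instead, using a brute-force L2 search over a small in-memory sample. `exact_top_k` and `recall_at_k` are hypothetical helper names, not pymilvus functions, and the toy vectors are made up for illustration.

```python
import heapq
import math

def exact_top_k(query, vectors, k):
    """Brute-force L2 search: return the ids (list positions) of the k nearest vectors."""
    scored = ((math.dist(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nsmallest(k, scored)]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate search also returned."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact & set(approx_ids)) / len(exact)

# Toy data (assumption): five 2-d vectors, ids are their list positions.
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [5.0, 5.0]]
query = [0.1, 0.1]

truth = exact_top_k(query, vectors, k=3)
approx = [0, 1, 3]  # stand-in for ids returned by an ANN search
print(truth, round(recall_at_k(approx, truth), 3))
```

In practice `approx` would hold the ids returned by `collection.search` for a given nprobe, and the exact ids would come from a FLAT index or a brute-force pass over a sample of the data.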
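    b.Tuning ef
        a.Description
            ef applies to HNSW indexes and sets the size of the candidate list kept during search. A larger ef raises recall and slows the query. ef must be at least limit (the number of results returned); 2-10x limit is a reasonable starting range. efConstruction is a build-time parameter, whereas ef is a search-time parameter, so ef can be changed per request without rebuilding the index. Choose the smallest ef that meets the recall target.
        b.Code example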
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 创建HNSW索引
            index_params = {
                "index_type": "HNSW",
                "metric_type": "L2",
                "params": {
                    "M": 16,
                    "efConstruction": 200
                }
            }
            
            collection.create_index(field_name="embedding", index_params=index_params)
            collection.load()
            
            # ef参数调优类
            class EfOptimizer:
                def __init__(self, collection):
                    self.collection = collection
                
                def test_ef_values(self, query_vector, ef_values, limit=10, num_queries=100):
                    """测试不同ef值的性能"""
                    results = []
                    
                    print(f"\nef参数调优测试 (limit={limit}, {num_queries}次查询):\n")
                    print(f"{'ef':>6s} {'平均延迟':>12s} {'P95延迟':>12s} {'QPS':>10s} {'召回率估计':>12s}")
                    print("-" * 58)
                    
                    # 获取基准结果
                    baseline_search_params = {
                        "metric_type": "L2",
                        "params": {"ef": max(ef_values)}
                    }
                    
                    baseline_results = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=baseline_search_params,
                        limit=limit
                    )
                    baseline_ids = set(hit.id for hit in baseline_results[0])
                    
                    for ef in ef_values:
                        if ef < limit:
                            print(f"{ef:6d} 跳过 (ef必须≥limit={limit})")
                            continue
                        
                        search_params = {
                            "metric_type": "L2",
                            "params": {"ef": ef}
                        }
                        
                        latencies = []
                        recall_sum = 0
                        
                        for _ in range(num_queries):
                            start = time.time()
                            search_results = self.collection.search(
                                data=[query_vector],
                                anns_field="embedding",
                                param=search_params,
                                limit=limit
                            )
                            latency = time.time() - start
                            latencies.append(latency)
                            
                            result_ids = set(hit.id for hit in search_results[0])
                            recall = len(result_ids & baseline_ids) / len(baseline_ids)
                            recall_sum += recall
                        
                        avg_latency = np.mean(latencies) * 1000
                        p95_latency = np.percentile(latencies, 95) * 1000
                        qps = 1 / np.mean(latencies)
                        avg_recall = recall_sum / num_queries
                        
                        results.append({
                            "ef": ef,
                            "avg_latency": avg_latency,
                            "p95_latency": p95_latency,
                            "qps": qps,
                            "recall": avg_recall
                        })
                        
                        print(f"{ef:6d} {avg_latency:11.2f}ms {p95_latency:11.2f}ms {qps:9.2f} {avg_recall*100:11.1f}%")
                    
                    return results
                
                def recommend_ef(self, results, limit, min_recall=0.95):
                    """推荐最优ef值"""
                    valid_results = [r for r in results if r["recall"] >= min_recall]
                    
                    if not valid_results:
                        print(f"\n警告: 没有配置满足{min_recall*100:.0f}%召回率要求")
                        return None
                    
                    best = min(valid_results, key=lambda x: x["avg_latency"])
                    
                    print(f"\n推荐配置 (limit={limit}, 召回率≥{min_recall*100:.0f}%):")
                    print(f"  ef: {best['ef']} (约{best['ef']/limit:.1f}倍limit)")
                    print(f"  平均延迟: {best['avg_latency']:.2f}ms")
                    print(f"  QPS: {best['qps']:.2f}")
                    print(f"  召回率: {best['recall']*100:.1f}%")
                    
                    return best
            
            # 使用ef优化器
            ef_optimizer = EfOptimizer(collection)
            
            query_vector = [np.random.random() for _ in range(128)]
            
            # 测试不同limit下的ef值
            for limit in [10, 50, 100]:
                ef_values = [limit, limit*2, limit*4, limit*8, limit*10]
                
                print(f"\n{'='*60}")
                print(f"测试limit={limit}")
                print(f"{'='*60}")
                
                results = ef_optimizer.test_ef_values(query_vector, ef_values, limit=limit, num_queries=50)
                ef_optimizer.recommend_ef(results, limit=limit, min_recall=0.95)
            
            print("\nef参数调优建议:")
            print("  1. ef ≥ limit (必须)")
            print("  2. ef = limit * 2-4 (平衡)")
            print("  3. ef = limit * 8-10 (高召回)")
            print("  4. 根据召回率要求调整")
            print("  5. 可以动态调整，无需重建索引")
            ---
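The rule that ef must be at least limit can be enforced when the search parameters are built. A minimal sketch, assuming a multiplier-based policy; `choose_ef` and `hnsw_search_params` are hypothetical helpers, and the `max_ef` upper bound is an assumption that should be checked against the limits of your Milvus version.

```python
def choose_ef(limit, multiplier=4, max_ef=32768):
    """Return roughly multiplier * limit, clamped to the range [limit, max_ef]."""
    if limit < 1:
        raise ValueError("limit must be >= 1")
    if limit > max_ef:
        raise ValueError("limit exceeds the assumed maximum ef")
    ef = int(limit * multiplier)
    ef = max(ef, limit)      # ef below limit is invalid for HNSW search
    return min(ef, max_ef)   # max_ef is an assumed upper bound

def hnsw_search_params(limit, multiplier=4, metric_type="L2"):
    """Build a search-parameter dict in the shape used by the examples above."""
    return {"metric_type": metric_type, "params": {"ef": choose_ef(limit, multiplier)}}

print(hnsw_search_params(10))
print(choose_ef(100, multiplier=0.5))
```

The resulting dict would be passed as `param=` to `collection.search` together with the same `limit`.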

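02.Batch query optimization
    a.Choosing the batch size
        a.Description
            Sending several query vectors in one search call raises throughput and cuts per-request network overhead. Batch size trades latency against throughput: an oversized batch lengthens each call, while an undersized one leaves the server's parallelism unused. 10-100 vectors per call is a reasonable starting range; tune it to the hardware and the workload, with larger batches for throughput-oriented jobs and smaller ones for latency-sensitive requests.
        b.Code example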
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            # 批量查询优化类
            class BatchQueryOptimizer:
                def __init__(self, collection):
                    self.collection = collection
                
                def test_batch_sizes(self, batch_sizes, total_queries=1000):
                    """测试不同批量大小的性能"""
                    search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": 16}
                    }
                    
                    print(f"\n批量大小优化测试 (总查询数={total_queries}):\n")
                    print(f"{'批量大小':>10s} {'总时间':>10s} {'吞吐量':>12s} {'平均延迟':>12s} {'P95延迟':>12s}")
                    print("-" * 62)
                    
                    results = []
                    
                    for batch_size in batch_sizes:
                        num_batches = total_queries // batch_size
                        
                        total_time = 0
                        latencies = []
                        
                        for _ in range(num_batches):
                            # 生成批量查询向量
                            query_vectors = [[np.random.random() for _ in range(128)] for _ in range(batch_size)]
                            
                            start = time.time()
                            results_batch = self.collection.search(
                                data=query_vectors,
                                anns_field="embedding",
                                param=search_params,
                                limit=10
                            )
                            elapsed = time.time() - start
                            
                            total_time += elapsed
                            latencies.append(elapsed)
                        
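                        # Count only the queries actually sent: total_queries // batch_size
                        # drops the remainder when batch_size does not divide total_queries.
                        throughput = (num_batches * batch_size) / total_time
                        # Latencies below are per search call (one whole batch), not per vector.
                        avg_latency = np.mean(latencies) * 1000
                        p95_latency = np.percentile(latencies, 95) * 1000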
                        
                        results.append({
                            "batch_size": batch_size,
                            "total_time": total_time,
                            "throughput": throughput,
                            "avg_latency": avg_latency,
                            "p95_latency": p95_latency
                        })
                        
                        print(f"{batch_size:10d} {total_time:9.2f}s {throughput:11.2f}qps {avg_latency:11.2f}ms {p95_latency:11.2f}ms")
                    
                    return results
                
                def recommend_batch_size(self, results, max_latency_ms=None):
                    """推荐最优批量大小"""
                    if max_latency_ms is not None:
                        # 满足延迟要求的最大吞吐量
                        valid_results = [r for r in results if r["avg_latency"] <= max_latency_ms]
                        
                        if not valid_results:
                            print(f"\n警告: 没有配置满足{max_latency_ms}ms延迟要求")
                            return None
                        
                        best = max(valid_results, key=lambda x: x["throughput"])
                        
                        print(f"\n推荐配置 (延迟≤{max_latency_ms}ms):")
                    else:
                        # 最大吞吐量
                        best = max(results, key=lambda x: x["throughput"])
                        
                        print(f"\n推荐配置 (最大吞吐量):")
                    
                    print(f"  批量大小: {best['batch_size']}")
                    print(f"  吞吐量: {best['throughput']:.2f} qps")
                    print(f"  平均延迟: {best['avg_latency']:.2f}ms")
                    print(f"  P95延迟: {best['p95_latency']:.2f}ms")
                    
                    return best
            
            # 使用批量查询优化器
            batch_optimizer = BatchQueryOptimizer(collection)
            
            batch_sizes = [1, 10, 20, 50, 100, 200]
            
            # 测试不同批量大小
            results = batch_optimizer.test_batch_sizes(batch_sizes, total_queries=1000)
            
            # 推荐最大吞吐量配置
            batch_optimizer.recommend_batch_size(results)
            
            # 推荐满足延迟要求的配置
            batch_optimizer.recommend_batch_size(results, max_latency_ms=50)
            
            print("\n批量查询优化建议:")
            print("  1. 高吞吐场景: 批量50-200")
            print("  2. 低延迟场景: 批量1-20")
            print("  3. 平衡场景: 批量20-50")
            print("  4. 监控延迟和吞吐量指标")
            print("  5. 根据硬件资源动态调整")
            ---
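The benchmark above runs `total_queries // batch_size` full batches, so any remainder is never sent. A minimal sketch of splitting a real query list into batches without losing the tail; `chunked` is a hypothetical helper, and the toy query vectors are made up for illustration.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of at most batch_size items, including a final partial slice."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Seven toy 2-d query vectors (assumption, for illustration only).
queries = [[float(i), float(i + 1)] for i in range(7)]

batches = list(chunked(queries, batch_size=3))
print([len(b) for b in batches])
```

Each yielded batch would be passed as `data=` to one `collection.search` call, and the per-batch results concatenated in order.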
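    b.Concurrency control
        a.Description
            Issuing queries concurrently raises throughput and keeps the hardware busy, but concurrency also affects latency and resource usage. Too many concurrent requests cause contention and higher latency; too few leave capacity unused. 2-4x the number of CPU cores is a common starting point, to be confirmed by measurement. Monitor system load to avoid overload, manage connections through a pool, and add request rate limiting and circuit breaking.
        b.Code example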
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            from concurrent.futures import ThreadPoolExecutor, as_completed
            
            collection = Collection("documents")
            collection.load()
            
            # 并发控制类
            class ConcurrencyController:
                def __init__(self, collection):
                    self.collection = collection
                
                def single_query(self, query_id):
                    """单个查询任务"""
                    query_vector = [[np.random.random() for _ in range(128)]]
                    search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": 16}
                    }
                    
                    start = time.time()
                    results = self.collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                    latency = time.time() - start
                    
                    return query_id, latency
                
                def test_concurrency(self, concurrency_levels, num_queries=1000):
                    """测试不同并发级别的性能"""
                    print(f"\n并发控制测试 (总查询数={num_queries}):\n")
                    print(f"{'并发数':>8s} {'总时间':>10s} {'吞吐量':>12s} {'平均延迟':>12s} {'P95延迟':>12s}")
                    print("-" * 60)
                    
                    results = []
                    
                    for concurrency in concurrency_levels:
                        latencies = []
                        
                        start = time.time()
                        
                        with ThreadPoolExecutor(max_workers=concurrency) as executor:
                            futures = [executor.submit(self.single_query, i) for i in range(num_queries)]
                            
                            for future in as_completed(futures):
                                query_id, latency = future.result()
                                latencies.append(latency)
                        
                        total_time = time.time() - start
                        throughput = num_queries / total_time
                        avg_latency = np.mean(latencies) * 1000
                        p95_latency = np.percentile(latencies, 95) * 1000
                        
                        results.append({
                            "concurrency": concurrency,
                            "total_time": total_time,
                            "throughput": throughput,
                            "avg_latency": avg_latency,
                            "p95_latency": p95_latency
                        })
                        
                        print(f"{concurrency:8d} {total_time:9.2f}s {throughput:11.2f}qps {avg_latency:11.2f}ms {p95_latency:11.2f}ms")
                    
                    return results
                
                def recommend_concurrency(self, results, max_latency_ms=None):
                    """推荐最优并发数"""
                    if max_latency_ms is not None:
                        valid_results = [r for r in results if r["p95_latency"] <= max_latency_ms]
                        
                        if not valid_results:
                            print(f"\n警告: 没有配置满足P95延迟≤{max_latency_ms}ms要求")
                            return None
                        
                        best = max(valid_results, key=lambda x: x["throughput"])
                        
                        print(f"\n推荐配置 (P95延迟≤{max_latency_ms}ms):")
                    else:
                        best = max(results, key=lambda x: x["throughput"])
                        
                        print(f"\n推荐配置 (最大吞吐量):")
                    
                    print(f"  并发数: {best['concurrency']}")
                    print(f"  吞吐量: {best['throughput']:.2f} qps")
                    print(f"  平均延迟: {best['avg_latency']:.2f}ms")
                    print(f"  P95延迟: {best['p95_latency']:.2f}ms")
                    
                    return best
            
            # 使用并发控制器
            concurrency_controller = ConcurrencyController(collection)
            
            concurrency_levels = [1, 2, 4, 8, 16, 32, 64]
            
            # 测试不同并发级别
            results = concurrency_controller.test_concurrency(concurrency_levels, num_queries=500)
            
            # 推荐最大吞吐量配置
            concurrency_controller.recommend_concurrency(results)
            
            # 推荐满足延迟要求的配置
            concurrency_controller.recommend_concurrency(results, max_latency_ms=100)
            
            print("\n并发控制建议:")
            print("  1. 并发数 = CPU核心数 * 2-4")
            print("  2. 监控CPU和内存使用率")
            print("  3. 避免过度并发导致资源竞争")
            print("  4. 实现请求限流机制")
            print("  5. 使用连接池管理连接")
            ---
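The description mentions request rate limiting, which the example above does not show. A minimal sketch of a client-side token-bucket limiter that could gate calls before they reach Milvus. `TokenBucket` is a hypothetical class, not part of pymilvus; the clock is injectable so the behaviour can be checked without sleeping.

```python
import threading
import time

class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()
        self._lock = threading.Lock()

    def try_acquire(self):
        """Take one token if available; return False instead of blocking."""
        with self._lock:
            now = self._clock()
            # Refill in proportion to elapsed time, never above capacity.
            self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
            self._last = now
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False

# Deterministic demo with a fake clock (assumption: 2 requests/s, burst of 2).
fake_now = [0.0]
bucket = TokenBucket(rate=2, capacity=2, clock=lambda: fake_now[0])

burst = [bucket.try_acquire() for _ in range(3)]
fake_now[0] = 0.5  # half a second later, one token has been refilled
after_wait = bucket.try_acquire()
print(burst, after_wait)
```

A caller would check `try_acquire()` before each search and reject or queue the request when it returns False; circuit breaking would need separate failure tracking and is not covered here.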

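8.3 Memory Optimization

01.Memory usage analysis
    a.Estimating memory footprint
        a.Description
            Memory is a critical resource for Milvus performance and needs to be estimated and managed deliberately. It holds vector data, index structures, and query caches. Footprint varies widely by index type: FLAT and IVF_FLAT keep the full float32 vectors, HNSW adds a graph on top of them and is usually the largest, and quantized indexes such as IVF_SQ8 and IVF_PQ are the smallest. Monitor memory usage to avoid OOM, reduce it with quantization or compression where the accuracy loss is acceptable, and configure memory limits and caching accordingly.
        b.Code example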
            ---
            from pymilvus import Collection
            
            collection = Collection("documents")
            
            # 内存分析类
            class MemoryAnalyzer:
                def __init__(self, collection):
                    self.collection = collection
                
                def estimate_memory_usage(self, vector_count, vector_dim, index_type):
                    """估算内存使用"""
                    # 单个向量大小（float32）
                    vector_size_bytes = vector_dim * 4
                    
                    # 原始数据大小
                    raw_data_mb = vector_count * vector_size_bytes / (1024**2)
                    
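                    # Rough memory factors relative to raw float32 data. These are
                    # illustrative rules of thumb; real usage depends on index parameters.
                    index_overhead = {
                        "FLAT": 1.0,        # raw vectors only
                        "IVF_FLAT": 1.1,    # about 10% extra for centroids and lists
                        "IVF_SQ8": 0.3,     # about 30% of raw size
                        "IVF_PQ": 0.1,      # about 10% of raw size
                        "HNSW": 1.3,        # about 30% extra for the graph
                    }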
                    
                    overhead_factor = index_overhead.get(index_type, 1.0)
                    total_memory_mb = raw_data_mb * overhead_factor
                    
                    return {
                        "vector_count": vector_count,
                        "vector_dim": vector_dim,
                        "index_type": index_type,
                        "raw_data_mb": raw_data_mb,
                        "overhead_factor": overhead_factor,
                        "total_memory_mb": total_memory_mb,
                        "total_memory_gb": total_memory_mb / 1024
                    }
                
                def print_memory_report(self, estimates):
                    """打印内存报告"""
                    print("\n内存使用估算:")
                    print(f"  向量数量: {estimates['vector_count']:,}")
                    print(f"  向量维度: {estimates['vector_dim']}")
                    print(f"  索引类型: {estimates['index_type']}")
                    print(f"  原始数据: {estimates['raw_data_mb']:.2f} MB ({estimates['raw_data_mb']/1024:.2f} GB)")
                    print(f"  开销系数: {estimates['overhead_factor']:.1f}x")
                    print(f"  总内存: {estimates['total_memory_mb']:.2f} MB ({estimates['total_memory_gb']:.2f} GB)")
                
                def compare_index_memory(self, vector_count, vector_dim):
                    """对比不同索引的内存占用"""
                    index_types = ["FLAT", "IVF_FLAT", "IVF_SQ8", "IVF_PQ", "HNSW"]
                    
                    print(f"\n索引内存对比 ({vector_count:,}个{vector_dim}维向量):\n")
                    print(f"{'Index type':>12s} {'Raw data':>12s} {'Total':>12s} {'Total/raw':>10s}")
                    print("-" * 50)
                    
                    for index_type in index_types:
                        est = self.estimate_memory_usage(vector_count, vector_dim, index_type)
                        compression = est['total_memory_mb'] / est['raw_data_mb']
                        
                        print(f"{index_type:>12s} {est['raw_data_mb']:11.2f}MB {est['total_memory_mb']:11.2f}MB {compression:9.1f}x")
            
            # 使用内存分析器
            analyzer = MemoryAnalyzer(collection)
            
            # 估算不同规模的内存需求
            scenarios = [
                (100000, 128, "IVF_FLAT"),
                (1000000, 128, "IVF_FLAT"),
                (10000000, 128, "IVF_SQ8"),
                (100000000, 128, "IVF_PQ")
            ]
            
            for vector_count, vector_dim, index_type in scenarios:
                estimates = analyzer.estimate_memory_usage(vector_count, vector_dim, index_type)
                analyzer.print_memory_report(estimates)
            
            # 对比索引内存
            analyzer.compare_index_memory(10000000, 128)
            
            print("\n内存优化建议:")
            print("  1. 使用量化索引(SQ8/PQ)降低内存")
            print("  2. 分区管理，按需加载")
            print("  3. 监控内存使用，设置限制")
            print("  4. 定期释放不用的分区")
            print("  5. 使用DiskANN处理超大规模数据")
            ---
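    b.Configuring memory limits
        a.Description
            Memory limits protect against OOM and keep the system stable. A cap can be set on QueryNode memory; once usage passes the threshold, requests to load more data, or queries, are rejected. Set the limit with headroom: too tight a cap hurts performance, and some reserve absorbs traffic spikes. Track memory utilization, adjust the configuration as data grows, and alert before the limit is reached.
        b.Code example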
            ---
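            # Memory limit settings (illustrative milvus.yaml fragment).
            # Key names and defaults differ between Milvus releases; check every
            # key against the milvus.yaml shipped with your version before use.
            memory_config = """
            queryNode:
              cache:
                memoryLimit: 2147483648  # 2 GB memory limit
                enabled: true
              
              loadMemoryUsageMaxLevel: 90  # stop loading when memory usage exceeds 90%
              
              gracefulStopTimeout: 30  # graceful shutdown timeout
            
            # Alert thresholds (illustrative; alerting is normally configured in the
            # external monitoring system rather than in milvus.yaml)
            monitoring:
              memory:
                warningThreshold: 0.8  # warn at 80%
                criticalThreshold: 0.9  # critical at 90%
            """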
            
            print("内存限制配置示例:")
            print(memory_config)
            
            print("\n内存限制策略:")
            print("  1. 设置QueryNode内存上限")
            print("  2. 配置内存使用率阈值")
            print("  3. 实现内存告警机制")
            print("  4. 优雅降级，拒绝新请求")
            print("  5. 定期清理缓存和临时数据")
            ---
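The warning and critical thresholds in the fragment above can be applied with a small check on the client or monitoring side. A minimal sketch, assuming the used and limit byte counts come from your own monitoring; `memory_alert_level` is a hypothetical helper, not a Milvus API.

```python
def memory_alert_level(used_bytes, limit_bytes, warning=0.8, critical=0.9):
    """Classify memory utilization as 'ok', 'warning', or 'critical'."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    ratio = used_bytes / limit_bytes
    if ratio >= critical:
        return "critical"
    if ratio >= warning:
        return "warning"
    return "ok"

GIB = 1024 ** 3
limit = 2 * GIB  # matches the 2 GB limit used in the fragment above

for used_gib in (1.0, 1.7, 1.9):
    level = memory_alert_level(int(used_gib * GIB), limit)
    print(f"{used_gib:.1f} GiB of 2.0 GiB -> {level}")
```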

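02.Quantization and compression
    a.Scalar quantization (SQ8)
        a.Description
            Scalar quantization stores each float32 component as an 8-bit integer, cutting vector memory by roughly 75%. SQ8 uses linear quantization, so the accuracy loss is small. It suits deployments that are short on memory and can tolerate a slight drop in accuracy. Because less data is scanned, queries tend to be somewhat faster than with the unquantized IVF_FLAT index, while recall is slightly lower, typically above 95%. Quantization is selected when the index is built and is lossy: the original values cannot be recovered exactly from the quantized codes.
        b.Code example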
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 创建IVF_SQ8索引
            sq8_index = {
                "index_type": "IVF_SQ8",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
            
            print("创建IVF_SQ8索引（标量量化）...")
            start = time.time()
            collection.create_index(field_name="embedding", index_params=sq8_index)
            build_time = time.time() - start
            
            print(f"索引构建时间: {build_time:.2f}s")
            
            collection.load()
            
            # 测试查询性能
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            start = time.time()
            for _ in range(100):
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
            query_time = time.time() - start
            
            print(f"\n100次查询总时间: {query_time:.2f}s")
            print(f"平均查询延迟: {query_time/100*1000:.2f}ms")
            print(f"QPS: {100/query_time:.2f}")
            
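            print("\nSQ8 quantization summary:")
            print("  Compression: 4x (float32 -> int8)")
            print("  Memory saved: ~75%")
            print("  Recall: ~95%")
            print("  Query speed: somewhat faster than IVF_FLAT")
            print("  Use when: memory is tight and a small accuracy loss is acceptable")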
            ---
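To make the linear quantization step concrete, here is a minimal sketch that maps floats to 8-bit codes and back. It is a simplification: it derives the value range from a single vector, whereas a real index learns its ranges from the training data, and `sq8_encode` / `sq8_decode` are hypothetical names rather than Milvus functions.

```python
def sq8_encode(vector):
    """Linearly map floats to 8-bit codes using this vector's own min and max."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # avoid a zero step when all values are equal
    codes = bytes(round((x - lo) / scale) for x in vector)
    return codes, lo, scale

def sq8_decode(codes, lo, scale):
    """Reconstruct approximate floats from the 8-bit codes."""
    return [lo + c * scale for c in codes]

vector = [0.12, -0.53, 0.98, 0.05]  # toy values (assumption)
codes, lo, scale = sq8_encode(vector)
restored = sq8_decode(codes, lo, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(vector, restored))
print(f"{len(codes)} bytes instead of {4 * len(vector)}")
print(f"max error {max_error:.5f}, half step {scale / 2:.5f}")
```

The reconstruction is close but not exact, which is the source of the small recall loss described above.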
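    b.Product quantization (PQ)
        a.Description
            Product quantization splits each vector into m sub-vectors and quantizes each one separately, which gives a higher compression ratio than SQ8: memory can drop to 10% of the original or less. The m parameter sets the number of sub-vectors and therefore the trade-off between compression and accuracy; the vector dimension must be divisible by m. PQ suits very large datasets under severe memory pressure. Queries are fast, but recall is lower than with SQ8, typically 85-90%, so memory savings have to be weighed against accuracy.
        b.Code example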
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 创建IVF_PQ索引
            pq_index = {
                "index_type": "IVF_PQ",
                "metric_type": "L2",
                "params": {
                    "nlist": 1024,
                    "m": 16,        # number of sub-vectors; the vector dimension must be divisible by m
                    "nbits": 8      # bits per sub-vector code
                }
            }
            
            print("创建IVF_PQ索引（乘积量化）...")
            start = time.time()
            collection.create_index(field_name="embedding", index_params=pq_index)
            build_time = time.time() - start
            
            print(f"索引构建时间: {build_time:.2f}s")
            
            collection.load()
            
            # 测试查询性能
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            start = time.time()
            for _ in range(100):
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
            query_time = time.time() - start
            
            print(f"\n100次查询总时间: {query_time:.2f}s")
            print(f"平均查询延迟: {query_time/100*1000:.2f}ms")
            print(f"QPS: {100/query_time:.2f}")
            
            print("\nPQ量化特点:")
            print("  压缩率: 10-40x (取决于m和nbits)")
            print("  内存节省: 90-97%")
            print("  召回率: ~85-90%")
            print("  查询速度: 较快")
            print("  适用: 超大规模数据，内存严重受限")
            print("  参数: m必须能整除向量维度")
            ---
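The compression ratio follows from the parameters: each vector is stored as m codes of nbits bits each. A minimal sketch of that arithmetic, counting only the per-vector codes and ignoring codebooks and IVF list overhead; `pq_code_bytes` and `pq_compression_ratio` are hypothetical helper names.

```python
def pq_code_bytes(dim, m, nbits=8):
    """Bytes used by one PQ-encoded vector: m sub-vector codes of nbits bits each."""
    if dim % m != 0:
        raise ValueError(f"dim={dim} is not divisible by m={m}")
    return m * nbits / 8

def pq_compression_ratio(dim, m, nbits=8):
    """Raw float32 size divided by PQ code size (codebooks and IVF lists ignored)."""
    raw_bytes = dim * 4
    return raw_bytes / pq_code_bytes(dim, m, nbits)

# Parameters from the index definition above: 128-d vectors, m=16, nbits=8.
print(pq_code_bytes(128, 16), pq_compression_ratio(128, 16))
```

With these parameters a 512-byte vector shrinks to 16 bytes of codes, a 32x reduction, which falls inside the 10-40x range quoted above.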

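8.4 Concurrency Control

01.Connection pool management
    a.Connection pool configuration
        a.Description
            A connection pool reuses connections and avoids the cost of establishing a new one per request. Sizing matters: too small a pool makes callers wait for a connection, too large a pool wastes resources. 1-2x the expected concurrency is a reasonable starting size. Configure connect and idle timeouts, health-check connections and reconnect automatically, and monitor pool usage so the size can be adjusted.
        b.Code example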
            ---
            from pymilvus import connections, Collection
            import threading
            import time
            
            # 连接池配置
            class ConnectionPool:
                def __init__(self, alias_prefix="conn", pool_size=10):
                    self.alias_prefix = alias_prefix
                    self.pool_size = pool_size
                    self.connections = []
                    self.lock = threading.Lock()
                    self.init_pool()
                
                def init_pool(self):
                    """初始化连接池"""
                    print(f"初始化连接池，大小: {self.pool_size}")
                    
                    for i in range(self.pool_size):
                        alias = f"{self.alias_prefix}_{i}"
                        
                        connections.connect(
                            alias=alias,
                            host="localhost",
                            port="19530",
                            timeout=30
                        )
                        
                        self.connections.append({
                            "alias": alias,
                            "in_use": False,
                            "last_used": time.time()
                        })
                    
                    print(f"连接池初始化完成")
                
                def acquire(self, timeout=10):
                    """获取连接"""
                    start = time.time()
                    
                    while time.time() - start < timeout:
                        with self.lock:
                            for conn in self.connections:
                                if not conn["in_use"]:
                                    conn["in_use"] = True
                                    conn["last_used"] = time.time()
                                    return conn["alias"]
                        
                        time.sleep(0.01)
                    
                    raise TimeoutError("获取连接超时")
                
                def release(self, alias):
                    """释放连接"""
                    with self.lock:
                        for conn in self.connections:
                            if conn["alias"] == alias:
                                conn["in_use"] = False
                                conn["last_used"] = time.time()
                                break
                
                def get_stats(self):
                    """获取连接池统计"""
                    with self.lock:
                        total = len(self.connections)
                        in_use = sum(1 for conn in self.connections if conn["in_use"])
                        available = total - in_use
                        
                        return {
                            "total": total,
                            "in_use": in_use,
                            "available": available,
                            "usage_rate": in_use / total if total > 0 else 0
                        }
                
                def close_all(self):
                    """关闭所有连接"""
                    print("关闭连接池...")
                    
                    for conn in self.connections:
                        try:
                            connections.disconnect(conn["alias"])
                        except Exception:
                            # Ignore disconnect failures so the remaining connections still get closed
                            pass
                    
                    self.connections.clear()
                    print("连接池已关闭")
            
            # 使用连接池
            pool = ConnectionPool(pool_size=5)
            
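            def worker_task(task_id, pool):
                """Worker thread task."""
                alias = None
                try:
                    # Borrow a connection from the pool
                    alias = pool.acquire(timeout=5)
                    print(f"Task {task_id}: acquired connection {alias}")
                    
                    # Run the query on the borrowed connection
                    collection = Collection("documents", using=alias)
                    
                    # Simulate a query
                    time.sleep(0.1)
                    
                    print(f"Task {task_id}: query finished")
                
                except Exception as e:
                    print(f"Task {task_id}: failed - {e}")
                
                finally:
                    # Always return the connection, even if the query raised,
                    # otherwise the slot stays marked as in use forever
                    if alias is not None:
                        pool.release(alias)
                        print(f"Task {task_id}: released connection {alias}")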
            
            # 创建多个工作线程
            threads = []
            for i in range(10):
                thread = threading.Thread(target=worker_task, args=(i, pool))
                threads.append(thread)
                thread.start()
            
            # 等待所有线程完成
            for thread in threads:
                thread.join()
            
            # 打印连接池统计
            stats = pool.get_stats()
            print(f"\n连接池统计:")
            print(f"  总连接数: {stats['total']}")
            print(f"  使用中: {stats['in_use']}")
            print(f"  可用: {stats['available']}")
            print(f"  使用率: {stats['usage_rate']*100:.1f}%")
            
            # 关闭连接池
            pool.close_all()
            
            print("\n连接池配置建议:")
            print("  1. 连接池大小 = 并发数 * 1-2")
            print("  2. 配置连接超时和空闲超时")
            print("  3. 实现连接健康检查")
            print("  4. 监控连接池使用率")
            print("  5. 动态调整连接池大小")
            ---
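
The acquire/release pattern above can also be wrapped in a context manager so that a connection is returned even when the caller raises. This is a minimal sketch under stated assumptions: `AliasPool` and `borrow` are hypothetical names, the pool holds plain alias strings so it runs without a Milvus server, and a blocking `queue.Queue` replaces the polling loop.

```python
import queue
from contextlib import contextmanager


class AliasPool:
    """Hands out connection aliases; blocks instead of polling when empty."""

    def __init__(self, aliases):
        self._free = queue.Queue()
        for alias in aliases:
            self._free.put(alias)

    def free_count(self):
        return self._free.qsize()

    @contextmanager
    def borrow(self, timeout=5.0):
        try:
            alias = self._free.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("no free connection alias") from None
        try:
            yield alias
        finally:
            # Runs on normal exit and on exceptions, so aliases never leak.
            self._free.put(alias)


pool = AliasPool([f"conn_{i}" for i in range(2)])

with pool.borrow() as alias:
    # In real code: Collection("documents", using=alias)
    print(f"using {alias}")

try:
    with pool.borrow() as alias:
        raise RuntimeError("query failed")
except RuntimeError:
    pass

print(f"free aliases after failure: {pool.free_count()}")
```

In a real pool each alias would first be registered with `connections.connect(alias=...)`, as in the class above.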
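    b.Request Rate Limiting
        a.Description
            Rate limiting protects the system from overload and keeps the service stable. Limits can apply to QPS, concurrency, or request size. Common algorithms include token bucket, leaky bucket, fixed window, and sliding window. Set the thresholds from measured system capacity. When a limit is exceeded, either reject the request with an error or queue it. Different users can be given different limits, and graceful degradation keeps core functions available under load.
        b.Code Example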
            ---
            import time
            import threading
            from collections import deque
            
            # 令牌桶限流器
            class TokenBucketLimiter:
                def __init__(self, rate, capacity):
                    """
                    rate: 每秒生成的令牌数
                    capacity: 桶容量
                    """
                    self.rate = rate
                    self.capacity = capacity
                    self.tokens = capacity
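                    # Use a monotonic clock: wall-clock adjustments (e.g. NTP) must not affect refill
                    self.last_update = time.monotonic()
                    self.lock = threading.Lock()
                
                def acquire(self, tokens=1):
                    """Try to take tokens without blocking."""
                    with self.lock:
                        now = time.monotonic()
                        
                        # Refill tokens for the elapsed time, capped at capacity
                        elapsed = now - self.last_update
                        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
                        self.last_update = now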
                        
                        # 尝试获取令牌
                        if self.tokens >= tokens:
                            self.tokens -= tokens
                            return True
                        else:
                            return False
                
                def wait_acquire(self, tokens=1, timeout=10):
                    """等待获取令牌"""
                    start = time.time()
                    
                    while time.time() - start < timeout:
                        if self.acquire(tokens):
                            return True
                        time.sleep(0.01)
                    
                    return False
            
            # 滑动窗口限流器
            class SlidingWindowLimiter:
                def __init__(self, max_requests, window_seconds):
                    """
                    max_requests: 窗口内最大请求数
                    window_seconds: 窗口大小（秒）
                    """
                    self.max_requests = max_requests
                    self.window_seconds = window_seconds
                    self.requests = deque()
                    self.lock = threading.Lock()
                
                def acquire(self):
                    """尝试获取许可"""
                    with self.lock:
                        now = time.time()
                        
                        # 移除过期请求
                        while self.requests and self.requests[0] < now - self.window_seconds:
                            self.requests.popleft()
                        
                        # 检查是否超过限制
                        if len(self.requests) < self.max_requests:
                            self.requests.append(now)
                            return True
                        else:
                            return False
                
                def get_current_rate(self):
                    """获取当前请求率"""
                    with self.lock:
                        now = time.time()
                        
                        # 移除过期请求
                        while self.requests and self.requests[0] < now - self.window_seconds:
                            self.requests.popleft()
                        
                        return len(self.requests) / self.window_seconds
            
            # 并发限流器
            class ConcurrencyLimiter:
                def __init__(self, max_concurrent):
                    """
                    max_concurrent: 最大并发数
                    """
                    self.max_concurrent = max_concurrent
                    self.current = 0
                    self.lock = threading.Lock()
                
                def acquire(self):
                    """获取并发许可"""
                    with self.lock:
                        if self.current < self.max_concurrent:
                            self.current += 1
                            return True
                        else:
                            return False
                
                def release(self):
                    """释放并发许可"""
                    with self.lock:
                        if self.current > 0:
                            self.current -= 1
                
                def get_current(self):
                    """获取当前并发数"""
                    with self.lock:
                        return self.current
            
            # 测试限流器
            print("测试令牌桶限流器:")
            token_limiter = TokenBucketLimiter(rate=10, capacity=20)
            
            success_count = 0
            for i in range(50):
                if token_limiter.acquire():
                    success_count += 1
            
            print(f"  尝试50次请求，成功{success_count}次")
            
            print("\n测试滑动窗口限流器:")
            window_limiter = SlidingWindowLimiter(max_requests=100, window_seconds=1)
            
            success_count = 0
            for i in range(150):
                if window_limiter.acquire():
                    success_count += 1
            
            print(f"  尝试150次请求，成功{success_count}次")
            print(f"  当前请求率: {window_limiter.get_current_rate():.2f} qps")
            
            print("\n测试并发限流器:")
            concurrency_limiter = ConcurrencyLimiter(max_concurrent=10)
            
            acquired = 0
            for i in range(20):
                if concurrency_limiter.acquire():
                    acquired += 1
            
            print(f"  尝试获取20个并发，成功{acquired}个")
            print(f"  当前并发数: {concurrency_limiter.get_current()}")
            
            print("\n限流策略建议:")
            print("  1. 令牌桶: 允许突发流量，平滑限流")
            print("  2. 滑动窗口: 精确控制时间窗口内请求数")
            print("  3. 并发限流: 控制同时执行的请求数")
            print("  4. 组合使用: QPS + 并发双重限流")
            print("  5. 分级限流: 不同用户不同限制")
            ---
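
The description also mentions fixed-window limiting, which the example above does not cover. Below is a minimal sketch of that variant; `FixedWindowLimiter` and its injectable `clock` parameter are assumptions made for illustration, and the fake clock exists only to keep the demo deterministic.

```python
import threading
import time


class FixedWindowLimiter:
    """Allows at most max_requests per window; the counter resets each window."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._window_start = clock()
        self._count = 0
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            now = self._clock()
            if now - self._window_start >= self.window_seconds:
                # A new window has begun: reset the counter.
                self._window_start = now
                self._count = 0
            if self._count < self.max_requests:
                self._count += 1
                return True
            return False


# Deterministic demo with a fake clock instead of real sleeps.
fake_now = [0.0]
limiter = FixedWindowLimiter(max_requests=3, window_seconds=1.0,
                             clock=lambda: fake_now[0])

first_window = [limiter.acquire() for _ in range(5)]
fake_now[0] = 1.0  # advance past the window boundary
second_window = [limiter.acquire() for _ in range(2)]

print(first_window)
print(second_window)
```

A fixed window is cheaper than a sliding window because it stores one counter instead of one timestamp per request, but it can admit close to twice the limit when a burst straddles a window boundary.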

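02.Resource Isolation
    a.Resource Group Configuration
        a.Description
            Resource groups provide multi-tenant isolation so that workloads do not interfere with each other. Each business line can be given its own QueryNodes. In Milvus a resource group is sized by node count (the `node_num` values under `requests` and `limits` in the configuration), so the memory and CPU available to a group are those of the nodes assigned to it rather than a separate quota. Group configuration can be adjusted at runtime, for example by transferring nodes between groups. Giving core workloads dedicated nodes keeps them from competing with lower-priority traffic. This suits multi-tenant and multi-business deployments, and the allocation should be planned against the total number of QueryNodes in the cluster.
        b.Code Example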
            ---
            from pymilvus import utility
            
            # 资源组管理类
            class ResourceGroupManager:
                @staticmethod
                def create_resource_group(name, config=None):
                    """创建资源组"""
                    if config is None:
                        config = {
                            "requests": {"node_num": 1},
                            "limits": {"node_num": 2}
                        }
                    
                    try:
                        utility.create_resource_group(name, config=config)
                        print(f"创建资源组: {name}")
                        print(f"  配置: {config}")
                    except Exception as e:
                        print(f"创建资源组失败: {e}")
                
                @staticmethod
                def list_resource_groups():
                    """列出所有资源组"""
                    try:
                        groups = utility.list_resource_groups()
                        print("\n资源组列表:")
                        for group in groups:
                            print(f"  - {group}")
                        return groups
                    except Exception as e:
                        print(f"列出资源组失败: {e}")
                        return []
                
                @staticmethod
                def describe_resource_group(name):
                    """查看资源组详情"""
                    try:
                        info = utility.describe_resource_group(name)
                        print(f"\n资源组详情: {name}")
                        print(f"  {info}")
                        return info
                    except Exception as e:
                        print(f"查看资源组失败: {e}")
                        return None
                
                @staticmethod
                def transfer_node(source_group, target_group, num_nodes=1):
                    """在资源组间转移节点"""
                    try:
                        utility.transfer_node(source_group, target_group, num_nodes)
                        print(f"转移{num_nodes}个节点: {source_group} -> {target_group}")
                    except Exception as e:
                        print(f"转移节点失败: {e}")
                
                @staticmethod
                def drop_resource_group(name):
                    """删除资源组"""
                    try:
                        utility.drop_resource_group(name)
                        print(f"删除资源组: {name}")
                    except Exception as e:
                        print(f"删除资源组失败: {e}")
            
            # 使用资源组管理器
            manager = ResourceGroupManager()
            
            # 创建资源组
            print("创建资源组:")
            manager.create_resource_group("business_a", config={"requests": {"node_num": 2}})
            manager.create_resource_group("business_b", config={"requests": {"node_num": 1}})
            manager.create_resource_group("business_c", config={"requests": {"node_num": 1}})
            
            # 列出资源组
            groups = manager.list_resource_groups()
            
            # 查看资源组详情
            for group in groups:
                manager.describe_resource_group(group)
            
            # 资源组使用示例
            print("\n资源组使用场景:")
            print("  1. 多租户隔离: 每个租户独立资源组")
            print("  2. 业务隔离: 核心业务和非核心业务分离")
            print("  3. 环境隔离: 生产、测试、开发环境分离")
            print("  4. 优先级保证: 高优先级业务独享资源")
            print("  5. 资源弹性: 动态调整资源分配")
            ---
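    b.Query Priority
        a.Description
            Query priority ensures that important queries run first. Each query is assigned a priority level; high-priority queries get resources and execute ahead of others, while low-priority queries may be delayed or rejected when resources are tight. This suits multi-business deployments where core workloads have an SLA to meet. The example below implements the idea on the client side with a priority queue and a pool of worker threads. Choose the priority policy deliberately and monitor query performance per priority level.
        b.Code Example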
            ---
            import time
            import threading
            from queue import PriorityQueue
            from pymilvus import Collection
            import numpy as np
            
            # 优先级查询管理器
            class PriorityQueryManager:
                def __init__(self, collection, max_workers=4):
                    self.collection = collection
                    self.max_workers = max_workers
                    self.query_queue = PriorityQueue()
                    self.workers = []
                    self.running = False
                    self.stats = {
                        "high": {"count": 0, "total_latency": 0},
                        "medium": {"count": 0, "total_latency": 0},
                        "low": {"count": 0, "total_latency": 0}
                    }
                    self.stats_lock = threading.Lock()
                
                def start(self):
                    """启动工作线程"""
                    self.running = True
                    
                    for i in range(self.max_workers):
                        worker = threading.Thread(target=self._worker, args=(i,))
                        worker.daemon = True
                        worker.start()
                        self.workers.append(worker)
                    
                    print(f"启动{self.max_workers}个工作线程")
                
                def stop(self):
                    """停止工作线程"""
                    self.running = False
                    
                    for worker in self.workers:
                        worker.join()
                    
                    print("所有工作线程已停止")
                
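                def _worker(self, worker_id):
                    """Worker thread loop."""
                    from queue import Empty  # raised by get() when the timeout expires
                    
                    while self.running:
                        try:
                            # Fetch the next task (smaller number = higher priority)
                            priority, query_id, query_vector, priority_name = self.query_queue.get(timeout=0.1)
                        except Empty:
                            continue  # queue is idle; re-check self.running
                        
                        try:
                            # Run the query
                            start = time.time()
                            
                            search_params = {
                                "metric_type": "L2",
                                "params": {"nprobe": 16}
                            }
                            
                            results = self.collection.search(
                                data=[query_vector],
                                anns_field="embedding",
                                param=search_params,
                                limit=10
                            )
                            
                            latency = time.time() - start
                            
                            # Update statistics
                            with self.stats_lock:
                                self.stats[priority_name]["count"] += 1
                                self.stats[priority_name]["total_latency"] += latency
                            
                            print(f"Worker {worker_id}: finished query {query_id} (priority: {priority_name}, latency: {latency*1000:.2f}ms)")
                        
                        except Exception as e:
                            # Report failures instead of silently swallowing them
                            print(f"Worker {worker_id}: query {query_id} failed - {e}")
                        
                        finally:
                            # Always mark the task done, otherwise query_queue.join() blocks forever
                            self.query_queue.task_done()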
                
                def submit_query(self, query_vector, priority="medium"):
                    """提交查询"""
                    # 优先级映射（数字越小优先级越高）
                    priority_map = {
                        "high": 0,
                        "medium": 1,
                        "low": 2
                    }
                    
                    priority_value = priority_map.get(priority, 1)
                    query_id = f"{priority}_{int(time.time()*1000000)}"
                    
                    self.query_queue.put((priority_value, query_id, query_vector, priority))
                    
                    return query_id
                
                def get_stats(self):
                    """获取统计信息"""
                    with self.stats_lock:
                        stats_copy = {}
                        
                        for priority, data in self.stats.items():
                            if data["count"] > 0:
                                avg_latency = data["total_latency"] / data["count"]
                            else:
                                avg_latency = 0
                            
                            stats_copy[priority] = {
                                "count": data["count"],
                                "avg_latency": avg_latency * 1000  # ms
                            }
                        
                        return stats_copy
            
            # 使用优先级查询管理器
            collection = Collection("documents")
            collection.load()
            
            manager = PriorityQueryManager(collection, max_workers=4)
            manager.start()
            
            # 提交不同优先级的查询
            print("\n提交查询:")
            
            for i in range(10):
                query_vector = [np.random.random() for _ in range(128)]
                
                if i < 3:
                    priority = "high"
                elif i < 7:
                    priority = "medium"
                else:
                    priority = "low"
                
                query_id = manager.submit_query(query_vector, priority=priority)
                print(f"  提交查询{query_id} (优先级:{priority})")
                
                time.sleep(0.1)
            
            # 等待所有查询完成
            manager.query_queue.join()
            
            # 打印统计
            stats = manager.get_stats()
            
            print(f"\n查询统计:")
            print(f"{'优先级':>10s} {'数量':>8s} {'平均延迟':>12s}")
            print("-" * 35)
            
            for priority in ["high", "medium", "low"]:
                if priority in stats:
                    print(f"{priority:>10s} {stats[priority]['count']:8d} {stats[priority]['avg_latency']:11.2f}ms")
            
            # 停止管理器
            manager.stop()
            
            print("\n优先级策略建议:")
            print("  1. 核心业务: 高优先级")
            print("  2. 常规业务: 中优先级")
            print("  3. 批量任务: 低优先级")
            print("  4. 监控不同优先级的性能")
            print("  5. 动态调整优先级策略")
            ---
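
One detail of priority queues is worth isolating: entries with equal priority should come out in submission order, and the queue should never have to compare the payloads themselves. A common approach, sketched here with hypothetical task names and no Milvus calls, is to add a monotonically increasing sequence number as the second tuple element.

```python
import itertools
from queue import PriorityQueue

PRIORITY = {"high": 0, "medium": 1, "low": 2}  # smaller number is served first

tasks = PriorityQueue()
sequence = itertools.count()  # monotonically increasing tie-breaker


def submit(name, priority="medium"):
    # (priority, sequence) is always unique, so the payload is never compared.
    tasks.put((PRIORITY[priority], next(sequence), name))


for name, priority in [("batch-1", "low"), ("api-1", "high"),
                       ("report-1", "medium"), ("api-2", "high")]:
    submit(name, priority)

order = []
while not tasks.empty():
    _, _, name = tasks.get()
    order.append(name)

print(order)  # ['api-1', 'api-2', 'report-1', 'batch-1']
```

The same tuple layout would work in `submit_query` above, with the query vector as the payload in place of the task name.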

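8.5 Caching Strategies

01.Query Cache
    a.Caching Mechanism
        a.Description
            A query cache stores the results of hot queries to avoid repeated computation. When an identical query vector arrives with the same search parameters and limit, the cached result is returned directly, and a hit can cut latency substantially. This suits workloads with repetitive query patterns, such as recommendation systems. The cache needs a size limit and an expiration policy, and it consumes extra memory, so weigh the benefit against that cost. Also plan for cache warm-up and for invalidation when the underlying data changes.
        b.Code Example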
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            import hashlib
            import json
            
            collection = Collection("documents")
            collection.load()
            
            # 查询缓存类
            class QueryCache:
                def __init__(self, max_size=1000, ttl=300):
                    """
                    max_size: 最大缓存条目数
                    ttl: 缓存过期时间（秒）
                    """
                    self.max_size = max_size
                    self.ttl = ttl
                    self.cache = {}
                    self.access_count = {}
                    self.hit_count = 0
                    self.miss_count = 0
                
                def _generate_key(self, query_vector, search_params, limit):
                    """生成缓存键"""
                    # 将查询参数序列化为字符串
                    key_data = {
                        "vector": [round(v, 6) for v in query_vector],  # 保留6位小数
                        "params": search_params,
                        "limit": limit
                    }
                    
                    key_str = json.dumps(key_data, sort_keys=True)
                    key_hash = hashlib.md5(key_str.encode()).hexdigest()
                    
                    return key_hash
                
                def get(self, query_vector, search_params, limit):
                    """从缓存获取结果"""
                    key = self._generate_key(query_vector, search_params, limit)
                    
                    if key in self.cache:
                        entry = self.cache[key]
                        
                        # 检查是否过期
                        if time.time() - entry["timestamp"] < self.ttl:
                            self.hit_count += 1
                            self.access_count[key] = self.access_count.get(key, 0) + 1
                            return entry["results"]
                        else:
                            # 过期，删除缓存
                            del self.cache[key]
                            if key in self.access_count:
                                del self.access_count[key]
                    
                    self.miss_count += 1
                    return None
                
                def put(self, query_vector, search_params, limit, results):
                    """将结果放入缓存"""
                    key = self._generate_key(query_vector, search_params, limit)
                    
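                    # Evict only when inserting a new key into a full cache
                    if key not in self.cache and len(self.cache) >= self.max_size:
                        # LFU eviction: drop the entry with the fewest hits
                        # (this is least-frequently-used, not LRU)
                        if self.access_count:
                            lfu_key = min(self.access_count, key=self.access_count.get)
                            del self.cache[lfu_key]
                            del self.access_count[lfu_key]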
                    
                    self.cache[key] = {
                        "results": results,
                        "timestamp": time.time()
                    }
                    
                    self.access_count[key] = 0
                
                def get_stats(self):
                    """获取缓存统计"""
                    total_requests = self.hit_count + self.miss_count
                    hit_rate = self.hit_count / total_requests if total_requests > 0 else 0
                    
                    return {
                        "cache_size": len(self.cache),
                        "max_size": self.max_size,
                        "hit_count": self.hit_count,
                        "miss_count": self.miss_count,
                        "hit_rate": hit_rate,
                        "total_requests": total_requests
                    }
                
                def clear(self):
                    """清空缓存"""
                    self.cache.clear()
                    self.access_count.clear()
                    self.hit_count = 0
                    self.miss_count = 0
            
            # 带缓存的查询类
            class CachedSearch:
                def __init__(self, collection, cache):
                    self.collection = collection
                    self.cache = cache
                
                def search(self, query_vector, search_params, limit=10):
                    """带缓存的查询"""
                    # 尝试从缓存获取
                    cached_results = self.cache.get(query_vector, search_params, limit)
                    
                    if cached_results is not None:
                        return cached_results, True  # 缓存命中
                    
                    # 缓存未命中，执行实际查询
                    results = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=limit
                    )
                    
                    # 将结果放入缓存
                    self.cache.put(query_vector, search_params, limit, results[0])
                    
                    return results[0], False  # 缓存未命中
            
            # 使用查询缓存
            cache = QueryCache(max_size=100, ttl=60)
            cached_search = CachedSearch(collection, cache)
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 生成一些查询向量
            query_vectors = [[np.random.random() for _ in range(128)] for _ in range(10)]
            
            print("测试查询缓存:\n")
            
            # 第一轮查询（缓存未命中）
            print("第一轮查询（缓存未命中）:")
            for i, query_vector in enumerate(query_vectors):
                start = time.time()
                results, hit = cached_search.search(query_vector, search_params)
                latency = time.time() - start
                
                print(f"  查询{i+1}: {'命中' if hit else '未命中'}, 延迟: {latency*1000:.2f}ms")
            
            # 第二轮查询（缓存命中）
            print("\n第二轮查询（缓存命中）:")
            for i, query_vector in enumerate(query_vectors):
                start = time.time()
                results, hit = cached_search.search(query_vector, search_params)
                latency = time.time() - start
                
                print(f"  查询{i+1}: {'命中' if hit else '未命中'}, 延迟: {latency*1000:.2f}ms")
            
            # 打印缓存统计
            stats = cache.get_stats()
            
            print(f"\n缓存统计:")
            print(f"  缓存大小: {stats['cache_size']}/{stats['max_size']}")
            print(f"  命中次数: {stats['hit_count']}")
            print(f"  未命中次数: {stats['miss_count']}")
            print(f"  命中率: {stats['hit_rate']*100:.1f}%")
            print(f"  总请求数: {stats['total_requests']}")
            
            print("\n查询缓存建议:")
            print("  1. 适合查询模式重复的场景")
            print("  2. 配置合适的缓存大小和TTL")
            print("  3. 使用LRU等淘汰策略")
            print("  4. 监控缓存命中率")
            print("  5. 数据更新时及时失效缓存")
            ---
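
The suggestions above name LRU as an eviction strategy, whereas the `QueryCache` class evicts by access count. For comparison, here is a minimal sketch of an LRU cache with TTL built on `collections.OrderedDict`. `LRUTTLCache` and its injectable `clock` are assumptions made for illustration; keys are plain strings standing in for the hashed query key.

```python
import time
from collections import OrderedDict


class LRUTTLCache:
    """LRU cache with per-entry TTL; the clock is injectable for testing."""

    def __init__(self, max_size, ttl, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl
        self._clock = clock
        self._data = OrderedDict()  # key -> (value, stored_at), oldest first

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at >= self.ttl:
            del self._data[key]      # expired
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        elif len(self._data) >= self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
        self._data[key] = (value, self._clock())

    def __len__(self):
        return len(self._data)


fake_now = [0.0]
cache = LRUTTLCache(max_size=2, ttl=60, clock=lambda: fake_now[0])

cache.put("q1", "r1")
cache.put("q2", "r2")
cache.get("q1")          # q1 becomes the most recently used entry
cache.put("q3", "r3")    # cache is full, so q2 is evicted

print(cache.get("q2"))   # None
print(cache.get("q1"))   # r1

fake_now[0] = 61.0       # move past the TTL
print(cache.get("q3"))   # None
```

LRU favors recent queries while the access-count approach favors historically popular ones, so the better choice depends on how quickly the set of hot queries shifts.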
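    b.Cache Warm-up
        a.Description
            Cache warm-up loads hot data into the cache when the system starts, which avoids a burst of cache misses after a cold start. Hot queries can be identified from historical query logs. Warm-up can noticeably improve performance in the first minutes of operation, but its duration has to be weighed against that benefit. It can run asynchronously so that service startup is not blocked, and incrementally so that data is loaded in stages. Monitor the effect of warm-up and refine the strategy accordingly.
        b.Code Example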
            ---
            from pymilvus import Collection
            import numpy as np
            import time
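            import json
            
            # Note: QueryCache and CachedSearch are the classes defined in the
            # query cache example above; define or import them before running this block.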
            
            collection = Collection("documents")
            collection.load()
            
            # 缓存预热类
            class CacheWarmer:
                def __init__(self, cached_search):
                    self.cached_search = cached_search
                
                def warm_from_queries(self, query_list):
                    """从查询列表预热缓存"""
                    print(f"\n开始缓存预热，共{len(query_list)}个查询...")
                    
                    start = time.time()
                    
                    for i, query_info in enumerate(query_list):
                        query_vector = query_info["vector"]
                        search_params = query_info["params"]
                        limit = query_info.get("limit", 10)
                        
                        # 执行查询，填充缓存
                        self.cached_search.search(query_vector, search_params, limit)
                        
                        if (i + 1) % 10 == 0:
                            print(f"  已预热 {i+1}/{len(query_list)} 个查询")
                    
                    elapsed = time.time() - start
                    
                    print(f"缓存预热完成，耗时: {elapsed:.2f}s")
                    
                    return elapsed
                
                def warm_from_log(self, log_file, top_n=100):
                    """从查询日志预热缓存"""
                    print(f"\n从查询日志预热缓存（Top {top_n}）...")
                    
                    # 读取查询日志
                    try:
                        with open(log_file, 'r') as f:
                            logs = json.load(f)
                        
                        # 统计查询频率
                        query_freq = {}
                        for log in logs:
                            query_key = json.dumps(log, sort_keys=True)
                            query_freq[query_key] = query_freq.get(query_key, 0) + 1
                        
                        # 选择Top N热点查询
                        top_queries = sorted(query_freq.items(), key=lambda x: x[1], reverse=True)[:top_n]
                        
                        # 预热
                        query_list = [json.loads(q[0]) for q in top_queries]
                        elapsed = self.warm_from_queries(query_list)
                        
                        print(f"预热了{len(query_list)}个热点查询")
                        
                        return elapsed
                    
                    except Exception as e:
                        print(f"从日志预热失败: {e}")
                        return 0
                
                def warm_async(self, query_list, callback=None):
                    """异步预热缓存"""
                    import threading
                    
                    def warm_task():
                        elapsed = self.warm_from_queries(query_list)
                        
                        if callback:
                            callback(elapsed)
                    
                    thread = threading.Thread(target=warm_task, daemon=True)
                    thread.start()
                    
                    print("异步预热已启动")
                    
                    return thread
            
            # 使用缓存预热
            cache = QueryCache(max_size=100, ttl=300)
            cached_search = CachedSearch(collection, cache)
            warmer = CacheWarmer(cached_search)
            
            # 准备预热查询列表
            warm_queries = []
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            for i in range(20):
                warm_queries.append({
                    "vector": [np.random.random() for _ in range(128)],
                    "params": search_params,
                    "limit": 10
                })
            
            # 同步预热
            warmer.warm_from_queries(warm_queries)
            
            # 验证预热效果
            stats = cache.get_stats()
            print(f"\n预热后缓存统计:")
            print(f"  缓存大小: {stats['cache_size']}")
            
            # 异步预热示例
            def on_warm_complete(elapsed):
                print(f"\n异步预热完成回调: 耗时{elapsed:.2f}s")
            
            warmer.warm_async(warm_queries[:10], callback=on_warm_complete)
            
            print("\n缓存预热建议:")
            print("  1. 启动时预热热点查询")
            print("  2. 从历史日志识别热点")
            print("  3. 异步预热，不阻塞启动")
            print("  4. 增量预热，逐步加载")
            print("  5. 监控预热效果，优化策略")
            ---
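            The log-based warm-up above counts identical queries and replays the most frequent ones. The sketch below isolates that selection step using only the standard library. The function name select_hot_queries, the min_count threshold, and the sample log entries are illustrative assumptions, not part of pymilvus. Raw query vectors rarely repeat exactly, so the sample keys on filter expression and limit; a real log may need a coarser key.

```python
import json
from collections import Counter


def select_hot_queries(logs, top_n=10, min_count=2):
    """Return (query, count) pairs for the most frequent logged queries."""
    # Serialize with sorted keys so that equal queries map to the same string
    counts = Counter(json.dumps(entry, sort_keys=True) for entry in logs)
    hot = [(key, n) for key, n in counts.most_common(top_n) if n >= min_count]
    return [(json.loads(key), n) for key, n in hot]


# Hypothetical log entries: a filter expression plus a result limit
sample_logs = (
    [{"expr": "category == 'book'", "limit": 10}] * 5
    + [{"limit": 10, "expr": "category == 'book'"}] * 2  # same query, other key order
    + [{"expr": "price < 100", "limit": 20}] * 3
    + [{"expr": "id in [1, 2]", "limit": 5}]  # seen once, below min_count
)

for query, count in select_hot_queries(sample_logs, top_n=5):
    print(count, query)
```

            Queries seen fewer than min_count times are skipped, which keeps one-off queries from taking up cache slots during warm-up.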

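02.Data caching
    a.Collection cache
        a.Description
            A Collection cache keeps frequently used Collections loaded, so that requests do not pay for repeated load() and release() calls. Both calls act on the Milvus server: the vectors live in Query Node memory and the client holds only a lightweight handle, so releasing a Collection affects every client that uses it. The cache suits deployments with many Collections, where only the hot ones should stay loaded. Cap the number of cached Collections to avoid exhausting Query Node memory, evict with an LRU policy, track access frequency to tune the cap, and preload Collections that are expected to be used.
        b.Code example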
            ---
            from pymilvus import Collection
            import time
            from collections import OrderedDict
            
            # Collection缓存管理器
            class CollectionCacheManager:
                def __init__(self, max_cached=10):
                    """
                    max_cached: 最大缓存Collection数量
                    """
                    self.max_cached = max_cached
                    self.cache = OrderedDict()
                    self.access_count = {}
                    self.hit_count = 0
                    self.miss_count = 0
                
                def get_collection(self, collection_name):
                    """获取Collection（带缓存）"""
                    if collection_name in self.cache:
                        # 缓存命中
                        self.hit_count += 1
                        self.access_count[collection_name] = self.access_count.get(collection_name, 0) + 1
                        
                        # 移到最后（LRU）
                        self.cache.move_to_end(collection_name)
                        
                        return self.cache[collection_name]
                    else:
                        # 缓存未命中
                        self.miss_count += 1
                        
                        # 加载Collection
                        collection = Collection(collection_name)
                        
                        # 检查缓存大小
                        if len(self.cache) >= self.max_cached:
                            # 淘汰最久未使用的
                            evicted_name, evicted_collection = self.cache.popitem(last=False)
                            
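                            # Release the evicted Collection from Query Node memory
                            try:
                                evicted_collection.release()
                                print(f"  Evicted Collection: {evicted_name}")
                            except Exception as e:
                                # Report the failure instead of silently swallowing it
                                print(f"  Failed to release {evicted_name}: {e}")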
                        
                        # 加载并缓存
                        collection.load()
                        self.cache[collection_name] = collection
                        self.access_count[collection_name] = 1
                        
                        print(f"  加载Collection: {collection_name}")
                        
                        return collection
                
                def preload_collections(self, collection_names):
                    """预加载Collection"""
                    print(f"\n预加载{len(collection_names)}个Collection...")
                    
                    for name in collection_names:
                        self.get_collection(name)
                    
                    print("预加载完成")
                
                def get_stats(self):
                    """获取缓存统计"""
                    total_requests = self.hit_count + self.miss_count
                    hit_rate = self.hit_count / total_requests if total_requests > 0 else 0
                    
                    return {
                        "cached_collections": len(self.cache),
                        "max_cached": self.max_cached,
                        "hit_count": self.hit_count,
                        "miss_count": self.miss_count,
                        "hit_rate": hit_rate,
                        "access_count": self.access_count.copy()
                    }
                
                def clear(self):
                    """清空缓存"""
                    for name, collection in self.cache.items():
                        try:
                            collection.release()
                        except Exception as e:
                            print(f"  Failed to release {name}: {e}")
                    
                    self.cache.clear()
                    self.access_count.clear()
                    print("缓存已清空")
            
            # 使用Collection缓存管理器
            cache_manager = CollectionCacheManager(max_cached=5)
            
            # 模拟访问多个Collection
            collection_names = ["coll_1", "coll_2", "coll_3", "coll_4", "coll_5", "coll_6"]
            
            print("测试Collection缓存:\n")
            
            # 第一轮访问
            print("第一轮访问:")
            for name in collection_names:
                try:
                    collection = cache_manager.get_collection(name)
                except Exception as e:
                    print(f"  Failed to load {name} (the Collection may not exist): {e}")
            
            # 第二轮访问（部分命中）
            print("\n第二轮访问:")
            for name in collection_names[:3]:
                try:
                    collection = cache_manager.get_collection(name)
                except Exception as e:
                    print(f"  Failed to load {name}: {e}")
            
            # 打印统计
            stats = cache_manager.get_stats()
            
            print(f"\n缓存统计:")
            print(f"  缓存Collection数: {stats['cached_collections']}/{stats['max_cached']}")
            print(f"  命中次数: {stats['hit_count']}")
            print(f"  未命中次数: {stats['miss_count']}")
            print(f"  命中率: {stats['hit_rate']*100:.1f}%")
            
            print(f"\n访问频率:")
            for name, count in sorted(stats['access_count'].items(), key=lambda x: x[1], reverse=True):
                print(f"  {name}: {count}次")
            
            # 清空缓存
            cache_manager.clear()
            
            print("\nCollection缓存建议:")
            print("  1. 缓存热点Collection")
            print("  2. 使用LRU淘汰策略")
            print("  3. 配置合适的缓存大小")
            print("  4. 预加载预期使用的Collection")
            print("  5. 监控访问频率，动态调整")
            ---
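            The LRU behaviour of the manager above can be checked without a running Milvus instance by injecting the load and release steps. This is a minimal sketch: the class name LRUHandleCache and its callbacks are hypothetical, and the lambdas stand in for Collection(name).load() and release().

```python
from collections import OrderedDict


class LRUHandleCache:
    """LRU cache of loaded resources with injected load/release callbacks."""

    def __init__(self, max_cached, load, release):
        self.max_cached = max_cached
        self._load = load
        self._release = release
        self._cache = OrderedDict()

    def get(self, name):
        if name in self._cache:
            self._cache.move_to_end(name)  # mark as most recently used
            return self._cache[name]
        handle = self._load(name)  # load first: a failed load evicts nothing
        while self._cache and len(self._cache) >= self.max_cached:
            old_name, old_handle = self._cache.popitem(last=False)
            self._release(old_name, old_handle)
        self._cache[name] = handle
        return handle

    def names(self):
        """Cached names, least recently used first."""
        return list(self._cache)


released = []
cache = LRUHandleCache(
    max_cached=2,
    load=lambda name: f"handle:{name}",  # stand-in for Collection(name) + load()
    release=lambda name, handle: released.append(name),  # stand-in for release()
)
cache.get("coll_1")
cache.get("coll_2")
cache.get("coll_1")  # coll_1 becomes the most recently used entry
cache.get("coll_3")  # cache is full, so coll_2 is evicted
print(cache.names(), released)
```

            Loading before evicting briefly exceeds the cap by one entry, in exchange for not losing a cached entry when the load fails. When memory is tight, evicting first (as the manager above does) is the safer order.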
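    b.Result cache
        a.Description
            A result cache stores query results so that identical queries are not recomputed. It pays off most when results are large (for example, when many vectors are returned) or when intermediate results such as recall candidates or ranked lists can be reused. Cached entries must be invalidated when the underlying data changes; otherwise stale results are served. A tiered layout combines an in-memory L1 cache with an on-disk L2 cache. Monitor hit rate and memory usage, and weigh the benefit of caching against its maintenance cost.
        b.Code example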
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            import pickle
            import os
            
            # 分层结果缓存
            class TieredResultCache:
                def __init__(self, l1_max_size=100, l2_cache_dir="/tmp/milvus_cache"):
                    """
                    l1_max_size: L1内存缓存大小
                    l2_cache_dir: L2磁盘缓存目录
                    """
                    self.l1_max_size = l1_max_size
                    self.l2_cache_dir = l2_cache_dir
                    self.l1_cache = {}  # 内存缓存
                    self.l1_hit = 0
                    self.l2_hit = 0
                    self.miss = 0
                    
                    # 创建L2缓存目录
                    os.makedirs(l2_cache_dir, exist_ok=True)
                
                def _get_cache_path(self, key):
                    """获取L2缓存文件路径"""
                    return os.path.join(self.l2_cache_dir, f"{key}.pkl")
                
                def get(self, key):
                    """获取缓存结果"""
                    # L1缓存查找
                    if key in self.l1_cache:
                        self.l1_hit += 1
                        return self.l1_cache[key], "L1"
                    
                    # L2缓存查找
                    cache_path = self._get_cache_path(key)
                    if os.path.exists(cache_path):
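                        try:
                            with open(cache_path, 'rb') as f:
                                # pickle can run code while loading: only read files this process wrote
                                results = pickle.load(f)
                            
                            self.l2_hit += 1
                            
                            # Promote to the L1 cache
                            self._put_l1(key, results)
                            
                            return results, "L2"
                        except Exception:
                            # Unreadable or corrupt file: fall through and count a miss
                            pass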
                    
                    # 缓存未命中
                    self.miss += 1
                    return None, None
                
                def _put_l1(self, key, results):
                    """放入L1缓存"""
                    # 检查L1缓存大小
                    if len(self.l1_cache) >= self.l1_max_size:
                        # 淘汰一个（简单FIFO）
                        evicted_key = next(iter(self.l1_cache))
                        evicted_results = self.l1_cache.pop(evicted_key)
                        
                        # 写入L2缓存
                        self._put_l2(evicted_key, evicted_results)
                    
                    self.l1_cache[key] = results
                
                def _put_l2(self, key, results):
                    """放入L2缓存"""
                    cache_path = self._get_cache_path(key)
                    
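                    try:
                        with open(cache_path, 'wb') as f:
                            pickle.dump(results, f)
                    except Exception as e:
                        # The entry is dropped from the cache; report why
                        print(f"  Failed to write L2 cache entry {key}: {e}")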
                
                def put(self, key, results):
                    """放入缓存"""
                    self._put_l1(key, results)
                
                def get_stats(self):
                    """获取统计信息"""
                    total = self.l1_hit + self.l2_hit + self.miss
                    
                    return {
                        "l1_size": len(self.l1_cache),
                        "l1_hit": self.l1_hit,
                        "l2_hit": self.l2_hit,
                        "miss": self.miss,
                        "total": total,
                        "hit_rate": (self.l1_hit + self.l2_hit) / total if total > 0 else 0
                    }
                
                def clear(self):
                    """清空缓存"""
                    self.l1_cache.clear()
                    
                    # 清空L2缓存
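                    for filename in os.listdir(self.l2_cache_dir):
                        # Only delete files written by this cache
                        if not filename.endswith(".pkl"):
                            continue
                        filepath = os.path.join(self.l2_cache_dir, filename)
                        try:
                            os.remove(filepath)
                        except OSError as e:
                            print(f"  Failed to remove {filepath}: {e}")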
            
            # 使用分层缓存
            tiered_cache = TieredResultCache(l1_max_size=5)
            
            print("测试分层结果缓存:\n")
            
            # 模拟查询和缓存
            for i in range(10):
                key = f"query_{i}"
                
                # 尝试从缓存获取
                results, source = tiered_cache.get(key)
                
                if results is None:
                    # 缓存未命中，生成结果
                    results = [np.random.random() for _ in range(100)]
                    tiered_cache.put(key, results)
                    print(f"  {key}: 未命中，生成结果")
                else:
                    print(f"  {key}: 命中（{source}缓存）")
            
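            # Access the first few queries again
            print("\nSecond pass:")
            for i in range(5):
                key = f"query_{i}"
                results, source = tiered_cache.get(key)
                # Compare with None: an empty result list is still a hit
                if results is not None:
                    print(f"  {key}: hit ({source} cache)")
                else:
                    print(f"  {key}: miss")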
            
            # 打印统计
            stats = tiered_cache.get_stats()
            
            print(f"\n缓存统计:")
            print(f"  L1缓存大小: {stats['l1_size']}")
            print(f"  L1命中: {stats['l1_hit']}")
            print(f"  L2命中: {stats['l2_hit']}")
            print(f"  未命中: {stats['miss']}")
            print(f"  总命中率: {stats['hit_rate']*100:.1f}%")
            
            # 清空缓存
            tiered_cache.clear()
            
            print("\n结果缓存建议:")
            print("  1. 缓存大结果，避免重复计算")
            print("  2. 分层缓存，平衡速度和容量")
            print("  3. L1内存缓存热点，L2磁盘缓存冷数据")
            print("  4. 数据更新时及时失效缓存")
            print("  5. 监控缓存命中率和大小")
            ---
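            The tiered cache above never invalidates entries, although the description calls for invalidation when data changes. One common way to get that is to include a per-Collection version number in the cache key and bump it on every write. The sketch below illustrates the idea; the class VersionedResultCache and the method mark_updated are hypothetical names, and the caller is assumed to invoke mark_updated after each insert, delete, or upsert.

```python
import hashlib
import json


class VersionedResultCache:
    """Result cache whose keys include a per-collection write version."""

    def __init__(self):
        self._versions = {}  # collection name -> write counter
        self._results = {}  # cache key -> cached results

    def _key(self, collection_name, query):
        version = self._versions.get(collection_name, 0)
        payload = json.dumps([collection_name, version, query], sort_keys=True)
        # The hex digest is also safe to use as a file name for a disk tier
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, collection_name, query):
        return self._results.get(self._key(collection_name, query))

    def put(self, collection_name, query, results):
        self._results[self._key(collection_name, query)] = results

    def mark_updated(self, collection_name):
        """Bump the version so that entries cached earlier are no longer found."""
        self._versions[collection_name] = self._versions.get(collection_name, 0) + 1


cache = VersionedResultCache()
query = {"vector": [0.1, 0.2], "limit": 10}

cache.put("products", query, ["id_1", "id_2"])
print(cache.get("products", query))  # served from the cache

cache.mark_updated("products")  # e.g. right after an insert
print(cache.get("products", query))  # None: the old entry is unreachable
```

            Entries stored under an old version stay in memory until they are evicted, so this scheme still needs a size limit or TTL like the caches shown earlier.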

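9 Cluster deployment

9.1 Distributed architecture

01.Architecture components
    a.Component roles
        a.Description
            Milvus uses a distributed architecture that separates storage from compute. Its main parts are the Proxy access layer, the Coordinators, the Worker Nodes, and the storage layer. The Coordinators are Root Coord, Data Coord, Query Coord, and Index Coord (more recent releases fold index coordination into Data Coord, so check the component list for your version). The Worker Nodes are Query Node, Data Node, and Index Node. In the storage layer, MinIO or S3 holds vector data and index files, etcd holds metadata, and Pulsar or Kafka serves as the message queue. Each component scales independently, and the Worker Nodes scale horizontally.
        b.Code example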
            ---
            # Milvus分布式架构组件说明
            
            architecture = {
                "coordinators": {
                    "root_coord": {
                        "role": "全局协调器",
                        "responsibilities": [
                            "DDL操作（创建/删除Collection）",
                            "分配时间戳",
                            "管理数据通道"
                        ],
                        "count": 1  # 单实例
                    },
                    "data_coord": {
                        "role": "数据协调器",
                        "responsibilities": [
                            "管理数据分段",
                            "分配数据写入任务",
                            "触发数据持久化"
                        ],
                        "count": 1
                    },
                    "query_coord": {
                        "role": "查询协调器",
                        "responsibilities": [
                            "管理查询节点",
                            "分配查询任务",
                            "负载均衡"
                        ],
                        "count": 1
                    },
                    "index_coord": {
                        "role": "索引协调器",
                        "responsibilities": [
                            "管理索引构建",
                            "分配索引任务",
                            "监控索引进度"
                        ],
                        "count": 1
                    }
                },
                "workers": {
                    "query_node": {
                        "role": "查询节点",
                        "responsibilities": [
                            "执行向量检索",
                            "加载数据到内存",
                            "处理查询请求"
                        ],
                        "scalable": True,  # 可水平扩展
                        "recommended_count": "2-10"
                    },
                    "data_node": {
                        "role": "数据节点",
                        "responsibilities": [
                            "接收数据写入",
                            "数据持久化",
                            "数据合并"
                        ],
                        "scalable": True,
                        "recommended_count": "1-5"
                    },
                    "index_node": {
                        "role": "索引节点",
                        "responsibilities": [
                            "构建向量索引",
                            "索引优化",
                            "索引持久化"
                        ],
                        "scalable": True,
                        "recommended_count": "1-5"
                    }
                },
                "storage": {
                    "object_storage": {
                        "type": "MinIO/S3",
                        "stores": "向量数据、索引文件",
                        "required": True
                    },
                    "meta_storage": {
                        "type": "etcd",
                        "stores": "元数据、配置信息",
                        "required": True
                    },
                    "message_queue": {
                        "type": "Pulsar/Kafka",
                        "purpose": "数据流、事件通知",
                        "required": True
                    }
                }
            }
            
            print("Milvus分布式架构组件:\n")
            
            print("协调器组件:")
            for name, info in architecture["coordinators"].items():
                print(f"  {name}:")
                print(f"    角色: {info['role']}")
                print(f"    职责: {', '.join(info['responsibilities'])}")
                print(f"    实例数: {info['count']}")
            
            print("\n工作节点:")
            for name, info in architecture["workers"].items():
                print(f"  {name}:")
                print(f"    角色: {info['role']}")
                print(f"    职责: {', '.join(info['responsibilities'])}")
                print(f"    可扩展: {'是' if info['scalable'] else '否'}")
                print(f"    推荐数量: {info['recommended_count']}")
            
            print("\n存储层:")
            for name, info in architecture["storage"].items():
                print(f"  {name}:")
                print(f"    类型: {info['type']}")
                print(f"    存储内容: {info.get('stores', info.get('purpose'))}")
                print(f"    必需: {'是' if info['required'] else '否'}")
            
            print("\n架构特点:")
            print("  1. 存储计算分离，独立扩展")
            print("  2. 无状态Worker，易于水平扩展")
            print("  3. 协调器单点，通过主备保证高可用")
            print("  4. 统一存储层，支持多种存储后端")
            print("  5. 消息队列解耦，异步处理")
            ---
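    b.Data flow
        a.Description
            Data in Milvus passes through four stages: write, persistence, indexing, and query. Inserted rows first enter the message queue; Data Nodes consume them and persist them to object storage. Once a segment is persisted, an index build is triggered and an Index Node builds the index. Query Nodes bring data and indexes from the storage layer into memory when a Collection is loaded, then serve searches from memory. The message queue decouples writers from consumers, so these steps run asynchronously. Data is managed in segments, which supports incremental updates, and small segments are merged periodically in a design similar to an LSM-tree.
        b.Code example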
            ---
            # Milvus数据流转流程
            
            data_flow = {
                "write_path": [
                    {
                        "step": 1,
                        "component": "SDK/Client",
                        "action": "发送insert请求",
                        "data": "向量数据 + 标量字段"
                    },
                    {
                        "step": 2,
                        "component": "Proxy",
                        "action": "路由请求到Data Coord",
                        "data": "分配时间戳和数据通道"
                    },
                    {
                        "step": 3,
                        "component": "Data Coord",
                        "action": "分配数据段和Data Node",
                        "data": "segment分配信息"
                    },
                    {
                        "step": 4,
                        "component": "Message Queue",
                        "action": "写入消息队列",
                        "data": "数据消息"
                    },
                    {
                        "step": 5,
                        "component": "Data Node",
                        "action": "消费消息，缓存数据",
                        "data": "内存缓冲区"
                    },
                    {
                        "step": 6,
                        "component": "Data Node",
                        "action": "达到阈值后持久化",
                        "data": "写入对象存储（S3/MinIO）"
                    },
                    {
                        "step": 7,
                        "component": "Data Coord",
                        "action": "触发索引构建",
                        "data": "索引任务"
                    },
                    {
                        "step": 8,
                        "component": "Index Node",
                        "action": "构建索引并持久化",
                        "data": "索引文件写入对象存储"
                    }
                ],
                "query_path": [
                    {
                        "step": 1,
                        "component": "SDK/Client",
                        "action": "发送search请求",
                        "data": "查询向量 + 参数"
                    },
                    {
                        "step": 2,
                        "component": "Proxy",
                        "action": "路由到Query Coord",
                        "data": "查询请求"
                    },
                    {
                        "step": 3,
                        "component": "Query Coord",
                        "action": "分配Query Node",
                        "data": "负载均衡分配"
                    },
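                    {
                        "step": 4,
                        "component": "Query Node",
                        "action": "Check that the Collection is loaded",
                        "data": "Data and indexes held in memory"
                    },
                    {
                        "step": 5,
                        "component": "Query Node",
                        "action": "If it is not loaded, the search fails; call load() beforehand",
                        "data": "load() brings data and indexes from object storage into memory"
                    },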
                    {
                        "step": 6,
                        "component": "Query Node",
                        "action": "执行向量检索",
                        "data": "使用索引进行ANN搜索"
                    },
                    {
                        "step": 7,
                        "component": "Query Node",
                        "action": "返回结果",
                        "data": "Top-K结果"
                    },
                    {
                        "step": 8,
                        "component": "Proxy",
                        "action": "合并多个Query Node结果",
                        "data": "全局Top-K结果"
                    }
                ]
            }
            
            print("Milvus数据流转:\n")
            
            print("写入路径:")
            for step_info in data_flow["write_path"]:
                print(f"  步骤{step_info['step']}: {step_info['component']}")
                print(f"    操作: {step_info['action']}")
                print(f"    数据: {step_info['data']}")
            
            print("\n查询路径:")
            for step_info in data_flow["query_path"]:
                print(f"  步骤{step_info['step']}: {step_info['component']}")
                print(f"    操作: {step_info['action']}")
                print(f"    数据: {step_info['data']}")
            
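            print("\nKey properties:")
            print("  1. Asynchronous writes: decoupled through the message queue")
            print("  2. Batched persistence: raises write throughput")
            print("  3. Deferred indexing: data is searchable first, indexed later")
            print("  4. Explicit loading: Query Nodes serve a Collection only after load()")
            print("  5. Result merging: the Proxy merges results from multiple Query Nodes")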
            ---
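            The last step of the query path, where the Proxy merges per-node results into a global Top-K, can be sketched as a k-way merge. This is an illustration of the technique rather than Milvus's actual implementation. It assumes each Query Node returns (distance, id) pairs already sorted by ascending distance, as with the L2 metric; for similarity metrics where larger is better, the order would be reversed. The function name merge_top_k and the sample hits are made up for the example.

```python
import heapq
from itertools import islice


def merge_top_k(partial_results, k):
    """Merge hit lists, each sorted by ascending distance, into a global top-k."""
    # heapq.merge walks the sorted inputs lazily, so only k items are consumed
    merged = heapq.merge(*partial_results, key=lambda hit: hit[0])
    return list(islice(merged, k))


# Hypothetical per-node results: (distance, primary key)
node_a = [(0.12, "id_7"), (0.40, "id_2"), (0.95, "id_9")]
node_b = [(0.05, "id_4"), (0.33, "id_1")]
node_c = [(0.20, "id_5"), (0.21, "id_8")]

print(merge_top_k([node_a, node_b, node_c], k=4))
```

            Each node only needs to return its own best k hits, because no item outside a node's local Top-K can appear in the global Top-K.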

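02.Deployment modes
    a.Standalone mode
        a.Description
            Standalone mode runs all Milvus components in a single process, which makes it a good fit for development and testing. It needs few resources and is simple to deploy, but it supports neither horizontal scaling nor high availability, and data volume and QPS are bounded by one machine. Typical uses are prototyping, functional testing, and small applications; production deployments should use distributed (cluster) mode. Docker provides the quickest way to start a standalone instance.
        b.Code example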
            ---
            # 单机模式部署（Docker）
            
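            docker_standalone = """
            # Optional: pull the Milvus image ahead of time (same version as the Compose file)
            docker pull milvusdb/milvus:v2.3.0
            
            # Download the Compose file
            wget https://github.com/milvus-io/milvus/releases/download/v2.3.0/milvus-standalone-docker-compose.yml -O docker-compose.yml
            
            # Start Milvus
            docker-compose up -d
            
            # Check status
            docker-compose ps
            
            # Follow the logs (logs takes the service name, not the container name)
            docker-compose logs -f standalone
            
            # Stop the services
            docker-compose down
            """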
            
            print("Milvus单机模式部署:\n")
            print(docker_standalone)
            
            # 单机模式配置示例
            standalone_config = {
                "deployment": {
                    "mode": "standalone",
                    "components": "all-in-one",
                    "process_count": 1
                },
                "resources": {
                    "cpu": "4 cores",
                    "memory": "8 GB",
                    "disk": "100 GB SSD"
                },
                "limitations": {
                    "max_vectors": "~10M",
                    "max_qps": "~1000",
                    "scalability": "不支持",
                    "high_availability": "不支持"
                },
                "use_cases": [
                    "开发测试",
                    "功能验证",
                    "小规模应用",
                    "原型开发"
                ]
            }
            
            print("\n单机模式特点:")
            print(f"  部署模式: {standalone_config['deployment']['mode']}")
            print(f"  组件: {standalone_config['deployment']['components']}")
            print(f"  进程数: {standalone_config['deployment']['process_count']}")
            
            print(f"\n资源需求:")
            print(f"  CPU: {standalone_config['resources']['cpu']}")
            print(f"  内存: {standalone_config['resources']['memory']}")
            print(f"  磁盘: {standalone_config['resources']['disk']}")
            
            print(f"\n限制:")
            print(f"  最大向量数: {standalone_config['limitations']['max_vectors']}")
            print(f"  最大QPS: {standalone_config['limitations']['max_qps']}")
            print(f"  可扩展性: {standalone_config['limitations']['scalability']}")
            print(f"  高可用: {standalone_config['limitations']['high_availability']}")
            
            print(f"\n适用场景:")
            for use_case in standalone_config['use_cases']:
                print(f"  - {use_case}")
            ---
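            After `docker-compose up -d` returns, the containers may still be starting. A small readiness check avoids connecting too early. The sketch below only tests whether the gRPC port (19530 in the Compose file) accepts TCP connections; an open port does not guarantee that Milvus has finished initializing. The helper name wait_for_port and its defaults are assumptions for illustration.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while the port is closed or unreachable
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

            Typical use is a call such as `wait_for_port("localhost", 19530, timeout=120)` before `connections.connect(...)`, followed by a retry around the first real request.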
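    b.Cluster mode
        a.Description
            In cluster mode each component is deployed separately. Coordinators and Worker Nodes are separated, and Worker Nodes scale horizontally on their own. High availability is supported, with active-standby failover for the Coordinators. This mode targets production workloads with large data volumes and high concurrency. It depends on etcd, MinIO or S3, and Pulsar or Kafka, all of which must be deployed as well. Kubernetes is the recommended way to deploy and manage a cluster, and node counts can be scaled up or down with load.
        b.Code example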
            ---
            # 集群模式架构配置
            
            cluster_config = {
                "deployment": {
                    "mode": "cluster",
                    "components": "分布式部署",
                    "coordinators": {
                        "root_coord": {"count": 1, "ha": "主备"},
                        "data_coord": {"count": 1, "ha": "主备"},
                        "query_coord": {"count": 1, "ha": "主备"},
                        "index_coord": {"count": 1, "ha": "主备"}
                    },
                    "workers": {
                        "query_node": {"count": "2-10", "scalable": True},
                        "data_node": {"count": "1-5", "scalable": True},
                        "index_node": {"count": "1-5", "scalable": True}
                    }
                },
                "dependencies": {
                    "etcd": {
                        "purpose": "元数据存储",
                        "ha": "3节点集群",
                        "required": True
                    },
                    "minio_s3": {
                        "purpose": "对象存储",
                        "ha": "分布式部署",
                        "required": True
                    },
                    "pulsar_kafka": {
                        "purpose": "消息队列",
                        "ha": "集群部署",
                        "required": True
                    }
                },
                "resources": {
                    "coordinator": {
                        "cpu": "2 cores",
                        "memory": "4 GB"
                    },
                    "query_node": {
                        "cpu": "8 cores",
                        "memory": "32 GB"
                    },
                    "data_node": {
                        "cpu": "4 cores",
                        "memory": "16 GB"
                    },
                    "index_node": {
                        "cpu": "8 cores",
                        "memory": "16 GB"
                    }
                },
                "capabilities": {
                    "max_vectors": "100M+",
                    "max_qps": "10000+",
                    "scalability": "水平扩展",
                    "high_availability": "支持"
                },
                "use_cases": [
                    "生产环境",
                    "大规模应用",
                    "高并发场景",
                    "企业级应用"
                ]
            }
            
            print("Milvus集群模式配置:\n")
            
            print("协调器部署:")
            for name, info in cluster_config["deployment"]["coordinators"].items():
                print(f"  {name}: {info['count']}个实例, 高可用: {info['ha']}")
            
            print("\n工作节点部署:")
            for name, info in cluster_config["deployment"]["workers"].items():
                scalable = "支持" if info['scalable'] else "不支持"
                print(f"  {name}: {info['count']}个实例, 水平扩展: {scalable}")
            
            print("\n依赖组件:")
            for name, info in cluster_config["dependencies"].items():
                print(f"  {name}:")
                print(f"    用途: {info['purpose']}")
                print(f"    高可用: {info['ha']}")
                print(f"    必需: {'是' if info['required'] else '否'}")
            
            print("\n资源配置:")
            for component, resources in cluster_config["resources"].items():
                print(f"  {component}:")
                print(f"    CPU: {resources['cpu']}")
                print(f"    内存: {resources['memory']}")
            
            print("\n能力:")
            print(f"  最大向量数: {cluster_config['capabilities']['max_vectors']}")
            print(f"  最大QPS: {cluster_config['capabilities']['max_qps']}")
            print(f"  可扩展性: {cluster_config['capabilities']['scalability']}")
            print(f"  高可用: {cluster_config['capabilities']['high_availability']}")
            
            print("\n适用场景:")
            for use_case in cluster_config['use_cases']:
                print(f"  - {use_case}")
            
            print("\n集群模式优势:")
            print("  1. 水平扩展: Worker节点按需扩展")
            print("  2. 高可用: Coordinator主备，Worker多副本")
            print("  3. 资源隔离: 不同组件独立资源")
            print("  4. 弹性伸缩: 根据负载动态调整")
            print("  5. 故障隔离: 单个节点故障不影响整体")
            ---
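            The node counts in the configuration above depend mainly on how much vector data the Query Nodes must hold in memory. The following is a rough sizing sketch, assuming float32 vectors (4 bytes per dimension). The index_overhead factor of 1.5 and the usable_fraction of 0.7 are placeholder assumptions, not Milvus figures; real overhead depends on the index type and its parameters, so measure on a sample before committing to hardware.

```python
import math


def raw_vector_bytes(num_vectors, dim, bytes_per_value=4):
    """Memory for the raw vectors alone (float32 by default)."""
    return num_vectors * dim * bytes_per_value


def estimate_query_nodes(num_vectors, dim, node_memory_gb,
                         index_overhead=1.5, usable_fraction=0.7, replicas=1):
    """Rough Query Node count; index_overhead and usable_fraction are guesses."""
    needed = raw_vector_bytes(num_vectors, dim) * index_overhead * replicas
    usable_per_node = node_memory_gb * 1024**3 * usable_fraction
    return max(1, math.ceil(needed / usable_per_node))


# 100M vectors of dimension 128 on 32 GB Query Nodes (values from the config above)
raw_gib = raw_vector_bytes(100_000_000, 128) / 1024**3
nodes = estimate_query_nodes(100_000_000, 128, node_memory_gb=32)
print(f"{raw_gib:.1f} GiB raw, about {nodes} Query Nodes")
```

            The estimate covers vector data only; scalar fields, growing segments, and query working memory add to it.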

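9.2 Docker Compose

01.Compose configuration
    a.Service definitions
        a.Description
            Docker Compose simplifies a Milvus deployment by defining every service in one YAML file, including the etcd, MinIO, and Pulsar dependencies. The file also declares networks, volumes, and environment variables. The whole stack starts and stops with a single command, resource settings are easy to adjust, and Compose handles service orchestration and start-up dependencies. This approach suits development, testing, and small production environments.
        b.Code example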
            ---
            # docker-compose.yml完整示例
            
            version: '3.5'
            
            services:
              etcd:
                container_name: milvus-etcd
                image: quay.io/coreos/etcd:v3.5.5
                environment:
                  - ETCD_AUTO_COMPACTION_MODE=revision
                  - ETCD_AUTO_COMPACTION_RETENTION=1000
                  - ETCD_QUOTA_BACKEND_BYTES=4294967296
                  - ETCD_SNAPSHOT_COUNT=50000
                volumes:
                  - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
                command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
                networks:
                  - milvus
            
              minio:
                container_name: milvus-minio
                image: minio/minio:RELEASE.2023-03-20T20-16-18Z
                environment:
                  MINIO_ACCESS_KEY: minioadmin
                  MINIO_SECRET_KEY: minioadmin
                volumes:
                  - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
                command: minio server /minio_data --console-address ":9001"
                ports:
                  - "9000:9000"
                  - "9001:9001"
                healthcheck:
                  test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
                  interval: 30s
                  timeout: 20s
                  retries: 3
                networks:
                  - milvus
            
              pulsar:
                container_name: milvus-pulsar
                image: apachepulsar/pulsar:2.8.2
                volumes:
                  - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/pulsar:/pulsar/data
                environment:
                  # Mapping form, so that the quotes are not stored as part of the value
                  PULSAR_MEM: "-Xms512m -Xmx512m -XX:MaxDirectMemorySize=1g"
                command: |
                  bash -c "bin/apply-config-from-env.py conf/standalone.conf && bin/pulsar standalone"
                networks:
                  - milvus
            
              standalone:
                container_name: milvus-standalone
                image: milvusdb/milvus:v2.3.0
                command: ["milvus", "run", "standalone"]
                environment:
                  ETCD_ENDPOINTS: etcd:2379
                  MINIO_ADDRESS: minio:9000
                  PULSAR_ADDRESS: pulsar://pulsar:6650
                volumes:
                  - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
                ports:
                  - "19530:19530"
                  - "9091:9091"
                depends_on:
                  - "etcd"
                  - "minio"
                  - "pulsar"
                networks:
                  - milvus
            
            networks:
              milvus:
                name: milvus
            
            volumes:
              etcd:
              minio:
              pulsar:
              milvus:
            
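            # Usage:
            # 1. Start all services: docker-compose up -d
            # 2. Check service status: docker-compose ps
            # 3. Follow logs: docker-compose logs -f standalone
            # 4. Stop services: docker-compose down
            # 5. Stop and delete data: docker-compose down -v removes named volumes only;
            #    the bind-mounted data above also needs: rm -rf ${DOCKER_VOLUME_DIRECTORY:-.}/volumes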
            ---
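            Once the stack is up, a plain TCP probe is a quick way to confirm that the published ports accept connections. The following is a minimal Python sketch using only the standard library; the helper name check_ports and the 2-second timeout are assumptions for illustration, not part of Milvus or Docker Compose.

```python
import socket


def check_ports(host, ports, timeout=2.0):
    """Return {port: True/False} depending on whether a TCP connect succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal or timeout
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


# Example call for the ports published above (Milvus gRPC, Milvus health,
# MinIO API, MinIO console):
#   check_ports("localhost", [19530, 9091, 9000, 9001])
```

            An open port only shows that something is listening; the /healthz endpoint on port 9091 remains the better readiness signal for Milvus itself.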
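    b.Resource configuration
        a.Description
            Compose can set resource limits for each service. CPU and memory limits keep services from competing for resources. Health checks report the state of a service, and a restart policy restarts containers that exit. depends_on defines the start order (in its short form it does not wait for a dependency to become healthy). Environment variables can override default settings, and mounted volumes persist configuration files and data. Because each service here sets a fixed container_name, it cannot be scaled to several replicas with Compose; high availability is covered in the Kubernetes sections.
        b.Code example
            ---
            # docker-compose.yml with resource settings
            # (the etcd and pulsar services that standalone depends on are omitted here;
            #  reuse their definitions from the basic configuration)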
            
            version: '3.5'
            
            services:
              standalone:
                container_name: milvus-standalone
                image: milvusdb/milvus:v2.3.0
                command: ["milvus", "run", "standalone"]
                environment:
                  ETCD_ENDPOINTS: etcd:2379
                  MINIO_ADDRESS: minio:9000
                  PULSAR_ADDRESS: pulsar://pulsar:6650
                  # Performance tuning parameters (illustrative names; verify each against milvus.yaml for your version)
                  QUERY_NODE_GRACEFUL_STOP_TIMEOUT: 30
                  QUERY_NODE_SEARCH_TIMEOUT: 30
                  DATA_NODE_FLUSH_INSERT_BUFFER_SIZE: 16777216
                volumes:
                  - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
                  - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus/logs:/var/log/milvus
                ports:
                  - "19530:19530"
                  - "9091:9091"
                depends_on:
                  - "etcd"
                  - "minio"
                  - "pulsar"
                deploy:
                  resources:
                    limits:
                      cpus: '4.0'
                      memory: 8G
                    reservations:
                      cpus: '2.0'
                      memory: 4G
                restart: always
                healthcheck:
                  test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
                  interval: 30s
                  timeout: 10s
                  retries: 3
                  start_period: 40s
                logging:
                  driver: "json-file"
                  options:
                    max-size: "100m"
                    max-file: "3"
                networks:
                  - milvus
            
              minio:
                container_name: milvus-minio
                image: minio/minio:RELEASE.2023-03-20T20-16-18Z
                environment:
                  MINIO_ACCESS_KEY: minioadmin
                  MINIO_SECRET_KEY: minioadmin
                volumes:
                  - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
                command: minio server /minio_data --console-address ":9001"
                ports:
                  - "9000:9000"
                  - "9001:9001"
                deploy:
                  resources:
                    limits:
                      cpus: '2.0'
                      memory: 4G
                restart: always
                healthcheck:
                  test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
                  interval: 30s
                  timeout: 20s
                  retries: 3
                networks:
                  - milvus
            
            networks:
              milvus:
                name: milvus
                driver: bridge
            
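            # Notes on the resource settings:
            # - limits: maximum resources the container may use
            # - reservations: resources guaranteed to the container
            # - deploy.resources is applied by Docker Compose v2; docker-compose v1 ignores it unless run with --compatibility
            # - restart: always restarts the container whenever it exits
            # - healthcheck: health check settings
            # - logging: log settings that cap the log file size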
            ---
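            A quick consistency check on the deploy.resources values can catch mistakes such as a reservation larger than its limit. This is a minimal sketch, assuming Compose-style byte suffixes (b, k, m, g, optionally followed by b); the function names parse_memory and check_resources are hypothetical.

```python
import re

_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_memory(value):
    """Convert a Compose-style size such as '8G' or '512m' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([bkmg])b?", value.strip().lower())
    if not match:
        raise ValueError(f"unrecognised memory size: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def check_resources(resources):
    """Return a list of problems where a reservation exceeds its limit."""
    problems = []
    limits = resources.get("limits", {})
    reservations = resources.get("reservations", {})
    if "memory" in limits and "memory" in reservations:
        if parse_memory(reservations["memory"]) > parse_memory(limits["memory"]):
            problems.append("memory reservation exceeds limit")
    if "cpus" in limits and "cpus" in reservations:
        if float(reservations["cpus"]) > float(limits["cpus"]):
            problems.append("cpu reservation exceeds limit")
    return problems


# Values copied from the standalone service above
standalone = {
    "limits": {"cpus": "4.0", "memory": "8G"},
    "reservations": {"cpus": "2.0", "memory": "4G"},
}
print(check_resources(standalone))
```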

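02.Deployment practice
    a.Quick deployment
        a.Description
            Deploy Milvus quickly with the official docker-compose.yml: download the file and start every service with one command. Compose pulls the required images and creates the network and volumes automatically. This setup suits quick evaluation and functional testing; the defaults cover basic needs and can be adjusted as required. Data is persisted on the host, so it survives restarts.
        b.Code example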
            ---
            #!/bin/bash
            # Quick deployment script for Milvus
            
            set -e
            
            echo "=========================================="
            echo "Milvus quick deployment script"
            echo "=========================================="
            
            # Check Docker and Docker Compose
            echo "Checking environment..."
            if ! command -v docker &> /dev/null; then
                echo "Error: Docker is not installed"
                exit 1
            fi
            
            if ! command -v docker-compose &> /dev/null; then
                echo "Error: Docker Compose is not installed"
                exit 1
            fi
            
            # Download the docker-compose file
            echo "Downloading docker-compose file..."
            wget https://github.com/milvus-io/milvus/releases/download/v2.3.0/milvus-standalone-docker-compose.yml -O docker-compose.yml
            
            # Create data directories
            echo "Creating data directories..."
            mkdir -p volumes/etcd volumes/minio volumes/pulsar volumes/milvus
            
            # Start Milvus
            echo "Starting Milvus services..."
            docker-compose up -d
            
            # Wait for the services to start
            echo "Waiting for services to start (about 30 seconds)..."
            sleep 30
            
            # Check service status
            echo ""
            echo "Service status:"
            docker-compose ps
            
            # Check Milvus health; fail the script if it never becomes healthy
            echo ""
            echo "Checking Milvus health..."
            healthy=false
            for i in {1..10}; do
                if curl -s http://localhost:9091/healthz | grep -q "OK"; then
                    echo "✓ Milvus is healthy"
                    healthy=true
                    break
                else
                    echo "Waiting for Milvus to become ready... ($i/10)"
                    sleep 5
                fi
            done
            if [ "$healthy" != true ]; then
                echo "✗ Milvus did not become healthy in time"
                exit 1
            fi
            
            echo ""
            echo "=========================================="
            echo "Milvus deployment complete!"
            echo "=========================================="
            echo "Connection info:"
            echo "  - Milvus address: localhost:19530"
            echo "  - Milvus health endpoint: http://localhost:9091/healthz"
            echo "  - MinIO console: http://localhost:9001"
            echo "    Username: minioadmin"
            echo "    Password: minioadmin"
            echo ""
            echo "Common commands:"
            echo "  - View logs: docker-compose logs -f standalone"
            echo "  - Stop services: docker-compose down"
            echo "  - Restart services: docker-compose restart"
            echo "=========================================="
            
            # Test the connection
            echo ""
            echo "Testing connection..."
            python3 << 'PYTHON'
            from pymilvus import connections, utility
            import time
            
            max_retries = 5
            for i in range(max_retries):
                try:
                    connections.connect(host="localhost", port="19530")
                    print(f"✓ Connected! Milvus version: {utility.get_server_version()}")
                    connections.disconnect("default")
                    break
                except Exception as e:
                    if i < max_retries - 1:
                        print(f"Connection failed, retrying... ({i+1}/{max_retries})")
                        time.sleep(5)
                    else:
                        print(f"✗ Connection failed: {e}")
                        raise SystemExit(1)  # Report the failure to the calling shell
            PYTHON
            ---
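            The health-check loop in the script above can also be written as a small reusable helper. This is a minimal Python sketch using only the standard library; the function names milvus_is_healthy and wait_for are hypothetical, and the URL assumes the default 9091 port mapping.

```python
import http.client
import time
import urllib.request


def milvus_is_healthy(url="http://localhost:9091/healthz", timeout=3.0):
    """Return True when the endpoint answers HTTP 200 with 'OK' in the body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200 and b"OK" in response.read()
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False


def wait_for(check, attempts=10, delay=5.0, sleep=time.sleep):
    """Call check() up to `attempts` times.

    Returns the 1-based attempt number that succeeded, or 0 if none did.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return 0


# Typical use after `docker-compose up -d`:
#   if not wait_for(milvus_is_healthy):
#       raise SystemExit("Milvus did not become healthy in time")
```

            Passing sleep as a parameter keeps the helper easy to test without real waiting.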
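    b.Production deployment
        a.Description
            Production environments need more thorough configuration and monitoring: resource limits and health checks; log collection and persistence; a backup and restore strategy; external storage to avoid data loss; monitoring and alerting to surface problems early; network security that restricts access; and regular updates and maintenance.
        b.Code example
            ---
            # Production docker-compose.yml
            # (the pulsar service that standalone depends on is omitted here;
            #  reuse its definition from the basic configuration)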
            
            version: '3.5'
            
            services:
              etcd:
                container_name: milvus-etcd
                image: quay.io/coreos/etcd:v3.5.5
                environment:
                  - ETCD_AUTO_COMPACTION_MODE=revision
                  - ETCD_AUTO_COMPACTION_RETENTION=1000
                  - ETCD_QUOTA_BACKEND_BYTES=4294967296
                  - ETCD_SNAPSHOT_COUNT=50000
                  - ETCD_HEARTBEAT_INTERVAL=500
                  - ETCD_ELECTION_TIMEOUT=2500
                volumes:
                  - /data/milvus/etcd:/etcd
                command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
                deploy:
                  resources:
                    limits:
                      cpus: '2.0'
                      memory: 4G
                restart: always
                logging:
                  driver: "json-file"
                  options:
                    max-size: "200m"
                    max-file: "5"
                networks:
                  - milvus
            
              minio:
                container_name: milvus-minio
                image: minio/minio:RELEASE.2023-03-20T20-16-18Z
                environment:
                  MINIO_ACCESS_KEY: ${MINIO_ACCESS_KEY:-minioadmin}
                  MINIO_SECRET_KEY: ${MINIO_SECRET_KEY:-minioadmin}
                  MINIO_PROMETHEUS_AUTH_TYPE: public
                volumes:
                  - /data/milvus/minio:/minio_data
                command: minio server /minio_data --console-address ":9001"
                ports:
                  - "9000:9000"
                  - "9001:9001"
                deploy:
                  resources:
                    limits:
                      cpus: '4.0'
                      memory: 8G
                restart: always
                healthcheck:
                  test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
                  interval: 30s
                  timeout: 20s
                  retries: 3
                logging:
                  driver: "json-file"
                  options:
                    max-size: "200m"
                    max-file: "5"
                networks:
                  - milvus
            
              standalone:
                container_name: milvus-standalone
                image: milvusdb/milvus:v2.3.0
                command: ["milvus", "run", "standalone"]
                environment:
                  ETCD_ENDPOINTS: etcd:2379
                  MINIO_ADDRESS: minio:9000
                  MINIO_ACCESS_KEY_ID: ${MINIO_ACCESS_KEY:-minioadmin}
                  MINIO_SECRET_ACCESS_KEY: ${MINIO_SECRET_KEY:-minioadmin}
                  PULSAR_ADDRESS: pulsar://pulsar:6650
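                  # Performance tuning (illustrative names; verify each against milvus.yaml for your version)
                  QUERY_NODE_GRACEFUL_STOP_TIMEOUT: 30
                  QUERY_NODE_SEARCH_TIMEOUT: 30
                  DATA_NODE_FLUSH_INSERT_BUFFER_SIZE: 16777216
                  # Log level
                  LOG_LEVEL: info
                volumes:
                  - /data/milvus/data:/var/lib/milvus
                  - /data/milvus/logs:/var/log/milvus
                  # The host directory must already contain milvus.yaml; an empty directory hides the image defaults
                  - /data/milvus/config:/milvus/configs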
                ports:
                  - "19530:19530"
                  - "9091:9091"
                depends_on:
                  - "etcd"
                  - "minio"
                  - "pulsar"
                deploy:
                  resources:
                    limits:
                      cpus: '8.0'
                      memory: 16G
                    reservations:
                      cpus: '4.0'
                      memory: 8G
                restart: always
                healthcheck:
                  test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
                  interval: 30s
                  timeout: 10s
                  retries: 3
                  start_period: 60s
                logging:
                  driver: "json-file"
                  options:
                    max-size: "200m"
                    max-file: "10"
                networks:
                  - milvus
            
            networks:
              milvus:
                name: milvus
                driver: bridge
            
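            # Production deployment script
            # #!/bin/bash
            # 
            # # Set environment variables
            # export MINIO_ACCESS_KEY="your-access-key"
            # export MINIO_SECRET_KEY="your-secret-key"
            # 
            # # Create data directories
            # mkdir -p /data/milvus/{etcd,minio,pulsar,data,logs,config}
            # 
            # # Set permissions
            # chmod 755 /data/milvus
            # 
            # # Start services
            # docker-compose up -d
            # 
            # # Example backup script (created before the cron entry that calls it)
            # mkdir -p /opt/scripts
            # cat > /opt/scripts/backup-milvus.sh << 'EOF'
            # #!/bin/bash
            # BACKUP_DIR="/backup/milvus/$(date +%Y%m%d)"
            # mkdir -p "$BACKUP_DIR"
            # 
            # # Back up data (file-level copies taken while services are running may be inconsistent)
            # tar -czf "$BACKUP_DIR/milvus-data.tar.gz" /data/milvus/data
            # tar -czf "$BACKUP_DIR/milvus-etcd.tar.gz" /data/milvus/etcd
            # tar -czf "$BACKUP_DIR/milvus-minio.tar.gz" /data/milvus/minio
            # 
            # # Keep the last 7 days of backups (only dated subdirectories, never the root itself)
            # find /backup/milvus -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +
            # EOF
            # 
            # chmod +x /opt/scripts/backup-milvus.sh
            # 
            # # Schedule the backup (appends to the existing crontab instead of replacing it)
            # (crontab -l 2>/dev/null; echo "0 2 * * * /opt/scripts/backup-milvus.sh") | crontab -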
            ---
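            The 7-day retention rule in the backup script can be checked in isolation before anything is deleted. This is a minimal sketch, assuming backup directories are named by date as YYYYMMDD (as produced by date +%Y%m%d above); the function name expired_backups is hypothetical.

```python
from datetime import date, datetime, timedelta


def expired_backups(names, today, keep_days=7):
    """Return the YYYYMMDD-named entries older than keep_days.

    Names that are not valid dates are ignored, so unrelated directories
    are never selected for deletion.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if len(name) != 8 or not name.isdigit():
            continue
        try:
            taken = datetime.strptime(name, "%Y%m%d").date()
        except ValueError:
            continue  # eight digits, but not a real calendar date
        if taken < cutoff:
            expired.append(name)
    return sorted(expired)


listing = ["20240101", "20240110", "20240115", "lost+found"]
print(expired_backups(listing, today=date(2024, 1, 15)))
```

            Selecting by directory name rather than by modification time keeps the result stable even if a backup directory is touched after it was written.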

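9.3 Kubernetes deployment

01.Helm deployment
    a.Helm Chart
        a.Description
            Helm charts simplify deployment on Kubernetes. The official chart is fully configurable and deploys a Milvus cluster together with all of its dependencies in one command. It supports rolling updates and rollbacks, makes replica counts and resource settings easy to adjust, and lets configuration be kept under version control, which suits large production deployments.
        b.Code example
            ---
            # Deploy Milvus to Kubernetes with Helm
            
            # 1. Add the Milvus Helm repository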
            # (the older https://milvus-io.github.io/milvus-helm/ repository is archived)
            helm repo add milvus https://zilliztech.github.io/milvus-helm/
            helm repo update
            
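            # 2. List available versions
            helm search repo milvus
            
            # 3. Create the namespace
            kubectl create namespace milvus
            
            # 4. Deploy Milvus with the default configuration
            #    (use either step 4 or step 5; installing the same release name twice fails)
            helm install milvus-release milvus/milvus --namespace milvus
            
            # 5. Or deploy with a custom configuration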
            cat > values-custom.yaml <<EOF
            cluster:
              enabled: true
            
            image:
              all:
                repository: milvusdb/milvus
                tag: v2.3.0
                pullPolicy: IfNotPresent
            
            queryNode:
              replicas: 3
              resources:
                limits:
                  cpu: "4"
                  memory: "16Gi"
                requests:
                  cpu: "2"
                  memory: "8Gi"
            
            dataNode:
              replicas: 2
              resources:
                limits:
                  cpu: "2"
                  memory: "8Gi"
                requests:
                  cpu: "1"
                  memory: "4Gi"
            
            indexNode:
              replicas: 2
              resources:
                limits:
                  cpu: "4"
                  memory: "8Gi"
                requests:
                  cpu: "2"
                  memory: "4Gi"
            
            minio:
              mode: distributed
              replicas: 4
              resources:
                limits:
                  cpu: "2"
                  memory: "4Gi"
            
            pulsar:
              enabled: true
              broker:
                replicaCount: 3
            
            etcd:
              replicaCount: 3
              resources:
                limits:
                  cpu: "1"
                  memory: "2Gi"
            
            service:
              type: LoadBalancer
              port: 19530
            EOF
            
            helm install milvus-release milvus/milvus -f values-custom.yaml --namespace milvus
            
            # 6. Check deployment status
            kubectl get pods -n milvus
            kubectl get svc -n milvus
            
            # 7. View details
            kubectl describe pod <pod-name> -n milvus
            
            # 8. Upgrade the release
            helm upgrade milvus-release milvus/milvus -f values-custom.yaml --namespace milvus
            
            # 9. Roll back to the previous revision
            helm rollback milvus-release --namespace milvus
            
            # 10. View release history
            helm history milvus-release --namespace milvus
            
            # 11. Uninstall
            helm uninstall milvus-release --namespace milvus
            
            # 12. Delete the namespace
            kubectl delete namespace milvus
            ---
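            Individual settings from a values file can also be passed on the command line with Helm's --set key.path=value syntax. The sketch below flattens a nested dictionary into such pairs and builds a helm upgrade --install argument list. The helper names are hypothetical, and values containing commas or keys containing dots would need extra escaping that this sketch does not handle.

```python
def flatten_values(values, prefix=""):
    """Flatten a nested dict into Helm --set style 'a.b.c=value' strings."""
    pairs = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            pairs.extend(flatten_values(value, path))
        elif isinstance(value, bool):
            # YAML booleans are lowercase
            pairs.append(f"{path}={'true' if value else 'false'}")
        else:
            pairs.append(f"{path}={value}")
    return pairs


def helm_upgrade_command(release, chart, namespace, values):
    """Build an argument list suitable for subprocess.run()."""
    cmd = ["helm", "upgrade", "--install", release, chart, "--namespace", namespace]
    for pair in flatten_values(values):
        cmd += ["--set", pair]
    return cmd


overrides = {"cluster": {"enabled": True}, "queryNode": {"replicas": 3}}
print(" ".join(helm_upgrade_command("milvus-release", "milvus/milvus", "milvus", overrides)))
```

            For more than a handful of settings, a values file passed with -f stays easier to review than a long list of --set flags.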
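    b.Configuration tuning
        a.Description
            Tune the Kubernetes configuration to the workload: set Pod resource requests and limits; use node affinity and Pod anti-affinity to control placement; configure persistent volumes to keep data safe; enable autoscaling with the HPA; choose a QoS class through the relation between requests and limits; manage configuration with ConfigMaps and Secrets; and define a rolling update strategy.
        b.Code example
            ---
            # Advanced Kubernetes configuration example (values.yaml)
            # (key names are illustrative; check them against the values.yaml of your chart version)
            
            # Query Node settings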
            queryNode:
              replicas: 3
              resources:
                requests:
                  cpu: "2"
                  memory: "8Gi"
                limits:
                  cpu: "4"
                  memory: "16Gi"
              
              # Pod anti-affinity: make sure Pods are spread across different nodes
              affinity:
                podAntiAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                  - labelSelector:
                      matchExpressions:
                      - key: app.kubernetes.io/name
                        operator: In
                        values:
                        - milvus
                      - key: app.kubernetes.io/component
                        operator: In
                        values:
                        - querynode
                    topologyKey: kubernetes.io/hostname
              
                # Node affinity: prefer high-performance nodes
                # (nodeAffinity is a child of affinity, alongside podAntiAffinity)
                nodeAffinity:
                  preferredDuringSchedulingIgnoredDuringExecution:
                  - weight: 100
                    preference:
                      matchExpressions:
                      - key: node-type
                        operator: In
                        values:
                        - high-performance
              
              # Tolerations: allow scheduling onto nodes with a specific taint
              tolerations:
              - key: "milvus"
                operator: "Equal"
                value: "querynode"
                effect: "NoSchedule"
              
              # Update strategy
              strategy:
                type: RollingUpdate
                rollingUpdate:
                  maxSurge: 1
                  maxUnavailable: 0
              
              # Health checks
              livenessProbe:
                httpGet:
                  path: /healthz
                  port: 9091
                initialDelaySeconds: 60
                periodSeconds: 30
                timeoutSeconds: 10
                failureThreshold: 3
              
              readinessProbe:
                httpGet:
                  path: /healthz
                  port: 9091
                initialDelaySeconds: 30
                periodSeconds: 10
                timeoutSeconds: 5
                failureThreshold: 3
            
            # HPA autoscaling
            autoscaling:
              enabled: true
              minReplicas: 2
              maxReplicas: 10
              targetCPUUtilizationPercentage: 70
              targetMemoryUtilizationPercentage: 80
            
            # Persistent storage
            persistence:
              enabled: true
              storageClass: "fast-ssd"
              accessMode: ReadWriteOnce
              size: 500Gi
            
            # Monitoring
            metrics:
              enabled: true
              serviceMonitor:
                enabled: true
                interval: 30s
            
            # Logging
            log:
              level: info
              format: json
              persistence:
                enabled: true
                size: 100Gi
            
            # Security
            securityContext:
              runAsNonRoot: true
              runAsUser: 1000
              fsGroup: 1000
            ---
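            For intuition about how the HPA reacts to a CPU target such as 70%, the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler is desired = ceil(current × currentUtilization / targetUtilization), skipped when the ratio is within a tolerance (0.1 by default). The following is a minimal sketch of that arithmetic. The function name and sample numbers are illustrative, and the real controller also applies stabilization windows and readiness handling that are omitted here.

```python
import math


def desired_replicas(current, current_util, target_util,
                     min_replicas, max_replicas, tolerance=0.1):
    """Replica count the HPA formula would ask for, clamped to the bounds."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: no scaling
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 3 replicas at 90% CPU against the 70% target and the 2..10 bounds used above
print(desired_replicas(3, 90, 70, min_replicas=2, max_replicas=10))
```

            With the bounds above, a sustained 90% average on 3 replicas asks for a fourth, while 72% stays inside the tolerance band and changes nothing.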

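02.Operations
    a.Rolling updates
        a.Description
            Kubernetes supports rolling updates for upgrades without downtime. Pods are replaced gradually so the service stays available, and the update strategy controls the pace. Readiness checks halt a rollout whose new Pods never become ready; Kubernetes itself does not roll back automatically, so use rollout undo, or run helm upgrade with --atomic to roll back on failure. A rollout can be paused and resumed, and canary or gradual releases are possible with additional traffic management. Monitor the rollout so problems are caught early.
        b.Code example
            ---
            # Rolling update guide
            
            # 1. Check the current version
            #    (Deployment names depend on the Helm release name; use the names printed by the first command)
            kubectl get deployment -n milvus
            kubectl describe deployment milvus-querynode -n milvus | grep Image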
            
            # 2. Upgrade to a new version
            helm upgrade milvus-release milvus/milvus \
              --set image.all.tag=v2.3.1 \
              --namespace milvus
            
            # 3. Monitor the rollout
            kubectl rollout status deployment/milvus-querynode -n milvus
            
            # 4. View rollout history
            kubectl rollout history deployment/milvus-querynode -n milvus
            
            # 5. Pause the rollout
            kubectl rollout pause deployment/milvus-querynode -n milvus
            
            # 6. Resume the rollout
            kubectl rollout resume deployment/milvus-querynode -n milvus
            
            # 7. Roll back to the previous revision
            kubectl rollout undo deployment/milvus-querynode -n milvus
            
            # 8. Roll back to a specific revision
            kubectl rollout undo deployment/milvus-querynode -n milvus --to-revision=2
            
            # 9. Watch Pod status
            kubectl get pods -n milvus -w
            
            # 10. View events
            kubectl get events -n milvus --sort-by='.lastTimestamp'
            
            # Canary release example (using Istio)
            # Create a VirtualService that splits traffic
            # (the v1 and v2 subsets must be defined in a matching DestinationRule, not shown here)
            cat <<EOF | kubectl apply -f -
            apiVersion: networking.istio.io/v1beta1
            kind: VirtualService
            metadata:
              name: milvus-canary
              namespace: milvus
            spec:
              hosts:
              - milvus
              http:
              - match:
                - headers:
                    canary:
                      exact: "true"
                route:
                - destination:
                    host: milvus
                    subset: v2
                  weight: 100
              - route:
                - destination:
                    host: milvus
                    subset: v1
                  weight: 90
                - destination:
                    host: milvus
                    subset: v2
                  weight: 10
            EOF
            ---
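            How fast a rolling update proceeds is bounded by the strategy's maxSurge and maxUnavailable. The sketch below works out the resulting Pod-count bounds, following the Kubernetes rule that a percentage maxSurge rounds up and a percentage maxUnavailable rounds down. The function names are hypothetical.

```python
import math


def resolve(value, replicas, round_up):
    """Resolve an absolute number or a percentage string such as '25%'."""
    if isinstance(value, str) and value.endswith("%"):
        exact = replicas * int(value[:-1]) / 100
        return math.ceil(exact) if round_up else math.floor(exact)
    return int(value)


def rollout_bounds(replicas, max_surge, max_unavailable):
    """Upper bound on total Pods and lower bound on available Pods during a rollout."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


# maxSurge: 1 with maxUnavailable: 0 on 3 replicas
print(rollout_bounds(3, 1, 0))
```

            With maxUnavailable set to 0 the rollout never dips below the desired replica count, at the cost of needing spare capacity for the extra Pod.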
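    b.Failure recovery
        a.Description
            Kubernetes recovers from many failures automatically. Failed containers are restarted to keep the service available, and when a node fails its Pods are rescheduled onto healthy nodes. Health checks detect problems early, and the restart policy should be configured to avoid restart loops. Running multiple replicas raises availability. Monitor the cluster state so that anomalies are handled promptly.
        b.Code example
            ---
            # Kubernetes failure recovery guide
            
            # 1. View Pod status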
            kubectl get pods -n milvus
            kubectl get pods -n milvus -o wide
            
            # 2. List Pods that are not running
            kubectl get pods -n milvus --field-selector=status.phase!=Running
            
            # 3. View Pod logs
            kubectl logs <pod-name> -n milvus
            kubectl logs <pod-name> -n milvus --previous  # Logs from the previous run
            kubectl logs <pod-name> -n milvus --tail=100 -f  # Start at the last 100 lines and follow
            
            # 4. View Pod details
            kubectl describe pod <pod-name> -n milvus
            
            # 5. View Pod events
            kubectl get events -n milvus --field-selector involvedObject.name=<pod-name>
            
            # 6. Open a shell in the Pod for debugging
            kubectl exec -it <pod-name> -n milvus -- /bin/bash
            
            # 7. Force-delete a Pod (triggers re-creation)
            kubectl delete pod <pod-name> -n milvus --force --grace-period=0
            
            # 8. Restart a Deployment
            kubectl rollout restart deployment/milvus-querynode -n milvus
            
            # 9. View node status
            kubectl get nodes
            kubectl describe node <node-name>
            
            # 10. Drain Pods from a node (node maintenance)
            kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
            
            # 11. Return the node to service
            kubectl uncordon <node-name>
            
            # 12. View resource usage
            kubectl top nodes
            kubectl top pods -n milvus
            
            # Troubleshooting script
            cat > troubleshoot.sh <<'EOF'
            #!/bin/bash
            
            NAMESPACE="milvus"
            
            echo "========== Pod status =========="
            kubectl get pods -n $NAMESPACE
            
            echo ""
            echo "========== Pods not running =========="
            kubectl get pods -n $NAMESPACE --field-selector=status.phase!=Running
            
            echo ""
            echo "========== Recent events =========="
            kubectl get events -n $NAMESPACE --sort-by='.lastTimestamp' | tail -20
            
            echo ""
            echo "========== Resource usage =========="
            kubectl top pods -n $NAMESPACE
            
            echo ""
            echo "========== Node status =========="
            kubectl get nodes
            
            echo ""
            echo "========== PVC status =========="
            kubectl get pvc -n $NAMESPACE
            
            echo ""
            echo "========== Service status =========="
            kubectl get svc -n $NAMESPACE
            EOF
            
            chmod +x troubleshoot.sh
            ./troubleshoot.sh
            ---
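            When many Pods are involved, the JSON output of kubectl get pods -n milvus -o json can be filtered programmatically. This is a minimal sketch that reads the standard Pod fields (status.phase, containerStatuses[].ready, containerStatuses[].restartCount); the function name unhealthy_pods and the restart threshold of 3 are assumptions.

```python
def unhealthy_pods(pod_list, restart_threshold=3):
    """Return (pod name, reason) tuples for Pods that need attention.

    `pod_list` is the parsed JSON document printed by
    `kubectl get pods -o json`.
    """
    findings = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase == "Succeeded":
            continue  # completed Jobs are not failures
        if phase != "Running":
            findings.append((name, f"phase={phase}"))
            continue
        for container in status.get("containerStatuses", []):
            if not container.get("ready", False):
                findings.append((name, f"container {container['name']} not ready"))
            elif container.get("restartCount", 0) >= restart_threshold:
                findings.append(
                    (name, f"container {container['name']} restarted "
                           f"{container['restartCount']} times"))
    return findings
```

            One way to use it is to pipe the kubectl output into a script that calls unhealthy_pods(json.load(sys.stdin)) and prints each finding.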

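9.4 High availability configuration

01.Component high availability
    a.Coordinator high availability
        a.Description
            Coordinators achieve high availability in active-standby mode, with leader election handled through etcd. When the active instance fails, a standby takes over automatically, typically within seconds. This requires running more than one instance of each Coordinator with active-standby enabled. The odd-member recommendation applies to the etcd cluster behind the election (3 nodes is typical); the Coordinators themselves only need a standby, although this example runs 3 instances of each. Monitor which instance is active so that problems are caught early.
        b.Code example
            ---
            # Coordinator high availability (Kubernetes Helm values.yaml)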
            
            rootCoordinator:
              replicas: 3  # Run 3 instances
              activeStandby:
                enabled: true  # Needed when running more than one replica
              resources:
                limits:
                  cpu: "2"
                  memory: "4Gi"
                requests:
                  cpu: "1"
                  memory: "2Gi"
              affinity:
                podAntiAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                  - labelSelector:
                      matchLabels:
                        component: rootcoord
                    topologyKey: kubernetes.io/hostname
            
            dataCoordinator:
              replicas: 3
              activeStandby:
                enabled: true
              resources:
                limits:
                  cpu: "2"
                  memory: "4Gi"
            
            queryCoordinator:
              replicas: 3
              activeStandby:
                enabled: true
              resources:
                limits:
                  cpu: "2"
                  memory: "4Gi"
            
            indexCoordinator:
              replicas: 3
              activeStandby:
                enabled: true
              resources:
                limits:
                  cpu: "2"
                  memory: "4Gi"
            
            # etcd high availability
            etcd:
              replicaCount: 3  # 3-node cluster
              resources:
                limits:
                  cpu: "1"
                  memory: "2Gi"
              persistence:
                enabled: true
                size: 10Gi
            
            # Monitor Coordinator status
            # kubectl get pods -n milvus | grep coord
            # kubectl logs -f <rootcoord-pod> -n milvus
            ---
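            The odd-member recommendation comes from majority quorum in the Raft-based etcd cluster that backs leader election: a cluster of n members needs floor(n/2)+1 of them to agree, so a fourth member tolerates no more failures than three do. A minimal sketch of that arithmetic (the function names are illustrative):

```python
def quorum(members):
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def fault_tolerance(members):
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


for n in range(1, 6):
    print(f"{n} members: quorum={quorum(n)}, tolerated failures={fault_tolerance(n)}")
```

            Going from 3 to 4 members raises the quorum from 2 to 3 without raising the number of tolerated failures, which is why 3 or 5 members are the usual choices.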
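    b.Worker high availability
        a.Description
            Worker nodes achieve high availability through multiple replicas. Each Worker type runs several instances, so a single failure does not interrupt the service. Query Coord reassigns work to healthy nodes automatically. Workers can be scaled out or in with load, work is balanced across nodes to avoid hot spots, and failed nodes are detected through health monitoring and removed automatically.
        b.Code example
            ---
            # Worker high availability
            
            queryNode:
              replicas: 5  # Multiple replicas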
              resources:
                limits:
                  cpu: "4"
                  memory: "16Gi"
              # Pod anti-affinity: spread Pods across different nodes
              affinity:
                podAntiAffinity:
                  preferredDuringSchedulingIgnoredDuringExecution:
                  - weight: 100
                    podAffinityTerm:
                      labelSelector:
                        matchLabels:
                          component: querynode
                      topologyKey: kubernetes.io/hostname
              # Health checks
              livenessProbe:
                httpGet:
                  path: /healthz
                  port: 9091
                initialDelaySeconds: 60
                periodSeconds: 30
                failureThreshold: 3
              readinessProbe:
                httpGet:
                  path: /healthz
                  port: 9091
                initialDelaySeconds: 30
                periodSeconds: 10
                failureThreshold: 3
            
            dataNode:
              replicas: 3
              resources:
                limits:
                  cpu: "2"
                  memory: "8Gi"
            
            indexNode:
              replicas: 3
              resources:
                limits:
                  cpu: "4"
                  memory: "8Gi"
            
            # Test failover
            # 1. Delete one QueryNode Pod
            # kubectl delete pod <querynode-pod> -n milvus
            # 
            # 2. Watch it being re-created
            # kubectl get pods -n milvus -w
            # 
            # 3. Verify the service is available
            # python3 test_connection.py
            ---
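            The failover test above can be quantified by polling the service while a Pod is deleted. In this minimal sketch, probe is any callable that returns True when a request succeeds; the helper name measure_downtime and the polling parameters are assumptions, and the commented probe relies on the pymilvus calls used elsewhere in this document.

```python
import time


def measure_downtime(probe, checks=60, interval=1.0, sleep=time.sleep):
    """Poll probe() `checks` times, `interval` seconds apart.

    Returns the longest run of consecutive failures, expressed in seconds.
    """
    longest = current = 0
    for i in range(checks):
        try:
            ok = bool(probe())
        except Exception:  # any client error counts as unavailable
            ok = False
        current = 0 if ok else current + 1
        longest = max(longest, current)
        if i < checks - 1:
            sleep(interval)
    return longest * interval


# Example probe, assuming pymilvus is installed and Milvus is reachable on
# localhost:19530:
#
#   from pymilvus import connections, utility
#
#   def probe():
#       connections.connect(host="localhost", port="19530")
#       utility.list_collections()
#       return True
#
#   print(measure_downtime(probe), "seconds of unavailability")
```

            The result is only as fine-grained as the polling interval, so a shorter interval gives a tighter estimate at the cost of more requests.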

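02.Data high availability
    a.Storage high availability
        a.Description
            Distributed storage keeps data highly available. MinIO runs in distributed mode and protects objects with erasure coding across nodes. etcd runs as a 3-node cluster, with the Raft protocol providing consistency. Pulsar keeps multiple copies of each message so that messages are not lost. Persistent volumes store the data durably, regular backups guard against data loss, and storage health should be monitored.
        b.Code example
            ---
            # Storage high availability
            
            # MinIO distributed mode
            minio:
              mode: distributed
              replicas: 4  # 4-node distributed deployment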
              drivesPerNode: 1
              resources:
                limits:
                  cpu: "2"
                  memory: "4Gi"
              persistence:
                enabled: true
                storageClass: "fast-ssd"
                size: 500Gi
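              # Erasure coding (illustrative key; check the MinIO chart for the actual setting)
              erasureCodingParity: 2  # Reads survive 2 failed nodes; writes need 3 of the 4 nodes
            
            # etcd cluster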
            etcd:
              replicaCount: 3
              persistence:
                enabled: true
                storageClass: "fast-ssd"
                size: 10Gi
              resources:
                limits:
                  cpu: "1"
                  memory: "2Gi"
              # Automatic revision compaction (this is not a backup; see the backup script for snapshots)
              autoCompactionMode: revision
              autoCompactionRetention: "1000"
            
            # Pulsar cluster
            pulsar:
              enabled: true
              broker:
                replicaCount: 3
                resources:
                  limits:
                    cpu: "2"
                    memory: "4Gi"
              bookkeeper:
                replicaCount: 3
                persistence:
                  enabled: true
                  size: 100Gi
              zookeeper:
                replicaCount: 3
            
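            # Backup settings (hypothetical keys for illustration; verify whether your chart
            # version offers anything like this, otherwise schedule the script below yourself)
            backup:
              enabled: true
              schedule: "0 2 * * *"  # Back up daily at 02:00
              retention: 7  # Keep 7 days
              destination: "s3://backup-bucket/milvus"
            
            # Example backup script
            # (etcd-0 and the milvus-minio mc alias are placeholders; actual names depend on the release)
            cat > backup.sh <<'EOF'
            #!/bin/bash
            
            BACKUP_DIR="/backup/milvus/$(date +%Y%m%d)"
            mkdir -p "$BACKUP_DIR"
            
            # Back up etcd
            kubectl exec -n milvus etcd-0 -- etcdctl snapshot save /tmp/snapshot.db
            kubectl cp milvus/etcd-0:/tmp/snapshot.db "$BACKUP_DIR/etcd-snapshot.db"
            
            # Back up MinIO (with the mc tool)
            mc mirror milvus-minio/milvus-bucket "$BACKUP_DIR/minio-data"
            
            # Upload to S3
            aws s3 sync "$BACKUP_DIR" "s3://backup-bucket/milvus/$(date +%Y%m%d)"
            
            # Remove local backups older than 7 days (only dated subdirectories, never the root itself)
            find /backup/milvus -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +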
            EOF
            ---
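            The trade-off behind the parity setting can be worked out by hand. The sketch below assumes a single erasure set spanning all drives and the quorum rules described in the MinIO documentation: reads need as many drives as there are data shards, and writes need the same, plus one extra drive when parity equals half the drives. The function name is hypothetical, and the capacity figure ignores metadata overhead.

```python
def erasure_summary(nodes, drives_per_node, drive_gib, parity):
    """Usable capacity and failure tolerance for one erasure set."""
    total = nodes * drives_per_node
    if not 0 < parity <= total // 2:
        raise ValueError("parity must be between 1 and half the number of drives")
    data = total - parity
    read_quorum = data
    # With parity at exactly half the drives, writes need one extra drive
    write_quorum = data + 1 if parity * 2 == total else data
    return {
        "usable_gib": data * drive_gib,
        "read_failures_tolerated": total - read_quorum,
        "write_failures_tolerated": total - write_quorum,
    }


# 4 nodes x 1 drive x 500 GiB with parity 2, as in the configuration above
print(erasure_summary(nodes=4, drives_per_node=1, drive_gib=500, parity=2))
```

            Under these assumptions the 4-node layout keeps data readable with two nodes down but accepts writes with only one node down, and half of the raw capacity is usable.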
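    b.Disaster recovery
        a.Description
            Prepare a disaster recovery plan for extreme situations: back up data and configuration regularly; test the restore procedure to confirm it works; set up cross-region redundancy to survive a regional outage; configure monitoring and alerting to detect problems promptly; document the recovery steps for a fast response; and rehearse regularly to improve recovery capability.
        b.Code example
            ---
            # Disaster recovery guide
            
            # 1. Data restore procedure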
            
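            # Step 1: Stop the Milvus components but keep etcd and MinIO running
            # (helm uninstall would also remove the etcd Pod that step 2 needs).
            # The label selector is an assumption; check it with:
            #   kubectl get deployment -n milvus --show-labels
            kubectl scale deployment -n milvus -l app.kubernetes.io/name=milvus --replicas=0
            
            # Step 2: Restore etcd data
            # Copy the snapshot into the Pod, then restore it
            kubectl cp /backup/etcd-snapshot.db milvus/etcd-0:/tmp/etcd-snapshot.db
            kubectl exec -n milvus etcd-0 -- etcdctl snapshot restore /tmp/etcd-snapshot.db \
              --data-dir=/var/lib/etcd-restore
            # etcd must then be restarted with its data directory pointing at the restored copy
            
            # Step 3: Restore MinIO data
            # Download the backup from S3, then mirror it into MinIO with mc
            aws s3 sync s3://backup-bucket/milvus/20240115/minio-data /tmp/minio-restore
            mc mirror /tmp/minio-restore milvus-minio/milvus-bucket
            
            # Step 4: Restore Pulsar data
            # Pulsar data usually does not need restoring because it is a message queue
            
            # Step 5: Bring Milvus back with the original values
            helm upgrade milvus-release milvus/milvus -f values.yaml -n milvus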
            
            # 步骤6: 验证数据完整性
            python3 <<'PYTHON'
            from pymilvus import connections, Collection, utility
            
            connections.connect(host="milvus.example.com", port="19530")
            
            # 检查Collection
            collections = utility.list_collections()
            print(f"Collections: {collections}")
            
            # 检查数据量
            for coll_name in collections:
                collection = Collection(coll_name)
                count = collection.num_entities
                print(f"{coll_name}: {count} entities")
            
            connections.disconnect("default")
            PYTHON
            
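            # 2. Cross-region disaster recovery configuration
            # Note: bucketReplication and readOnly are illustrative keys; verify them against
            # your chart version, or configure bucket replication in MinIO/S3 directly.
            
            # Primary region (values-primary.yaml)
            global:
              region: us-east-1
            
            minio:
              mode: distributed
              replicas: 4
              # Cross-region replication
              bucketReplication:
                enabled: true
                destination: "s3://milvus-backup-us-west-1"
            
            # Secondary region (values-secondary.yaml)
            global:
              region: us-west-1
            
            # Read-only mode, synchronized from the primary region
            readOnly: true
            
            # 3. Failover procedure
            
            # (1) Detect the primary-region failure
            # (2) Switch DNS to the secondary region
            # (3) Switch the secondary region to read-write mode
            # (4) Verify service availability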
            
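            # 4. Recovery checklist
            cat > recovery-checklist.md <<'EOF'
            # Milvus disaster recovery checklist
            
            ## Before recovery
            - [ ] Confirm that a usable backup exists
            - [ ] Assess the extent of data loss
            - [ ] Notify stakeholders
            - [ ] Prepare the recovery environment
            
            ## During recovery
            - [ ] Stop the running services
            - [ ] Restore etcd data
            - [ ] Restore MinIO data
            - [ ] Redeploy Milvus
            - [ ] Verify component status
            
            ## After recovery
            - [ ] Verify data integrity
            - [ ] Test queries
            - [ ] Test writes
            - [ ] Monitor system status
            - [ ] Announce that recovery is complete
            - [ ] Write the incident report
            
            ## RTO/RPO targets
            - RTO (recovery time objective): 2 hours
            - RPO (recovery point objective): 24 hours
            EOF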
            ---
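            The integrity check in step 6 is more useful when the restored entity counts are compared against counts recorded at backup time, and when the data-loss window is compared against the RPO. The following is a minimal sketch under those assumptions: count_mismatches and meets_rpo are hypothetical helpers, and the dictionaries stand in for values that would come from collection.num_entities before and after the restore.

```python
from datetime import datetime, timedelta


def count_mismatches(expected, actual):
    """Compare per-collection entity counts; return {name: (expected, actual)} where they differ."""
    mismatches = {}
    for name in sorted(set(expected) | set(actual)):
        exp, act = expected.get(name), actual.get(name)
        if exp != act:
            mismatches[name] = (exp, act)
    return mismatches


def meets_rpo(last_backup, failure_time, rpo=timedelta(hours=24)):
    """True when the data-loss window (failure time minus last backup) is within the RPO."""
    return failure_time - last_backup <= rpo


if __name__ == "__main__":
    recorded = {"products": 100000, "users": 5000}   # saved alongside the backup
    restored = {"products": 100000, "users": 4800}   # read after the restore
    print(count_mismatches(recorded, restored))
    print(meets_rpo(datetime(2024, 1, 15, 2, 0), datetime(2024, 1, 15, 20, 0)))
```

            A collection missing on either side shows up with None in the corresponding position, which distinguishes a dropped collection from a partial restore.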

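9.5 Scaling out and in

01.Manual scaling
    a.Worker scale-out
        a.Description
            Scale the number of worker nodes manually according to load. Adding Query Nodes raises query concurrency, adding Data Nodes raises write throughput, and adding Index Nodes speeds up index building. Change the replica count with Helm or kubectl. New pods join the cluster automatically, with no restart of the existing nodes. Monitor resource usage so that you scale out before nodes saturate.
        b.Code example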
            ---
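            # Manual worker scale-out
            
            # Option 1: helm upgrade
            # --reuse-values keeps the release's existing settings; without it, --set alone
            # resets every other value to the chart defaults
            helm upgrade milvus-release milvus/milvus \
              --reuse-values \
              --set queryNode.replicas=5 \
              --set dataNode.replicas=3 \
              --set indexNode.replicas=3 \
              --namespace milvus
            
            # Option 2: kubectl scale
            # Deployment names depend on the Helm release name; list them with
            # "kubectl get deployment -n milvus" and substitute the actual names
            kubectl scale deployment milvus-querynode --replicas=5 -n milvus
            kubectl scale deployment milvus-datanode --replicas=3 -n milvus
            kubectl scale deployment milvus-indexnode --replicas=3 -n milvus
            
            # Option 3: edit values.yaml and redeploy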
            cat > values-scale.yaml <<EOF
            queryNode:
              replicas: 5
              resources:
                limits:
                  cpu: "4"
                  memory: "16Gi"
            
            dataNode:
              replicas: 3
              resources:
                limits:
                  cpu: "2"
                  memory: "8Gi"
            
            indexNode:
              replicas: 3
              resources:
                limits:
                  cpu: "4"
                  memory: "8Gi"
            EOF
            
            helm upgrade milvus-release milvus/milvus -f values-scale.yaml -n milvus
            
            # 验证扩容结果
            kubectl get pods -n milvus | grep -E "querynode|datanode|indexnode"
            
            # 监控新节点状态
            kubectl get pods -n milvus -w
            
            # 查看负载分布
            kubectl top pods -n milvus
            
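            # Sizing guidance (rough figures: real capacity depends on hardware, index type,
            # vector dimension and search parameters, so benchmark your own workload):
            # - Query Node: scale with QPS demand, roughly 1000-5000 QPS per node
            # - Data Node: scale with write throughput, roughly 10000-50000 vectors/s per node
            # - Index Node: scale with index build time; more nodes build indexes in parallel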
            ---
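            The sizing guidance can be turned into a quick replica estimate. This is a back-of-the-envelope sketch, not a Milvus formula: replicas_needed is a hypothetical helper, the per-node capacity should come from your own benchmark, and the 30% headroom is an assumed safety margin.

```python
import math


def replicas_needed(target_load, per_node_capacity, headroom=0.3, min_replicas=2):
    """Replicas required to serve target_load while keeping spare headroom on every node."""
    if per_node_capacity <= 0 or not 0 <= headroom < 1:
        raise ValueError("per_node_capacity must be > 0 and headroom in [0, 1)")
    usable = per_node_capacity * (1 - headroom)
    return max(min_replicas, math.ceil(target_load / usable))


if __name__ == "__main__":
    # Example: 8000 QPS target, 2000 QPS measured per Query Node
    print(replicas_needed(8000, 2000))
```

            The result can be passed to queryNode.replicas or kubectl scale; the same calculation applies to Data Nodes with vectors per second as the load unit.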
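    b.Worker scale-in
        a.Description
            When load drops, reduce the number of worker nodes to save resources. Before scaling in, confirm that the remaining nodes have enough capacity. Kubernetes terminates the removed pods gracefully: a Query Node stops accepting new requests and exits after finishing in-flight ones. Monitor system load after scaling in, and avoid scaling in so far that performance degrades.
        b.Code example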
            ---
            # Worker节点手动缩容
            
            # 缩容前检查当前负载
            kubectl top pods -n milvus
            
            # 查看当前副本数
            kubectl get deployment -n milvus
            
            # 缩容Query Node
            kubectl scale deployment milvus-querynode --replicas=3 -n milvus
            
            # 缩容Data Node
            kubectl scale deployment milvus-datanode --replicas=2 -n milvus
            
            # 缩容Index Node
            kubectl scale deployment milvus-indexnode --replicas=2 -n milvus
            
            # Or use Helm (--reuse-values keeps the release's other settings)
            helm upgrade milvus-release milvus/milvus \
              --reuse-values \
              --set queryNode.replicas=3 \
              --set dataNode.replicas=2 \
              --set indexNode.replicas=2 \
              --namespace milvus
            
            # 监控缩容过程
            kubectl get pods -n milvus -w
            
            # 验证服务可用性
            python3 <<'PYTHON'
            from pymilvus import connections, Collection
            import numpy as np
            import time
            
            connections.connect(host="milvus.example.com", port="19530")
            
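            collection = Collection("test_collection")
            collection.load()  # search requires the collection to be loaded
            
            # Test query (assumes a 128-dimensional "embedding" field)
            query_vector = [np.random.random(128).tolist()]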
            
            for i in range(10):
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param={"metric_type": "L2", "params": {"nprobe": 16}},
                    limit=10
                )
                latency = time.time() - start
                print(f"查询{i+1}: {latency*1000:.2f}ms")
            
            connections.disconnect("default")
            PYTHON
            
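            # Scale-in notes:
            # - Make sure the remaining capacity is sufficient
            # - Monitor performance after scaling in
            # - Avoid scaling in frequently
            # - Keep a minimum replica count (at least 2)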
            ---
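            The latencies printed by the verification loop are easier to judge as a percentile compared against a baseline taken before scaling in. The sketch below assumes you collected two lists of latencies in milliseconds; percentile and scale_in_is_safe are hypothetical helpers, and the 20% allowed regression is an arbitrary example threshold.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers (pct in 1..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def scale_in_is_safe(before_ms, after_ms, max_regression=0.2, pct=95):
    """True when the chosen percentile latency grew by at most max_regression."""
    return percentile(after_ms, pct) <= percentile(before_ms, pct) * (1 + max_regression)


if __name__ == "__main__":
    before = [10, 12, 11, 13, 50]
    after = [11, 13, 12, 14, 55]
    print(percentile(before, 95), percentile(after, 95))
    print(scale_in_is_safe(before, after))
```

            With only ten samples, as in the loop above, the percentile is dominated by the slowest query, so a longer run gives a steadier comparison.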

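02.Autoscaling
    a.HPA configuration
        a.Description
            The Horizontal Pod Autoscaler (HPA) adjusts the replica count automatically based on metrics. It can scale on CPU, memory, or custom metrics, within a configured minimum and maximum replica count and against a target utilization threshold. It suits workloads whose load fluctuates widely. CPU and memory metrics require metrics-server, and utilization targets are measured against each pod's resource requests, so requests must be set.
        b.Code example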
            ---
            # HPA自动扩缩容配置
            
            # 1. 确保metrics-server已安装
            kubectl get deployment metrics-server -n kube-system
            
            # 2. 在values.yaml中启用HPA
            queryNode:
              replicas: 3
              resources:
                requests:
                  cpu: "2"
                  memory: "8Gi"
                limits:
                  cpu: "4"
                  memory: "16Gi"
              autoscaling:
                enabled: true
                minReplicas: 2
                maxReplicas: 10
                targetCPUUtilizationPercentage: 70
                targetMemoryUtilizationPercentage: 80
            
            dataNode:
              replicas: 2
              autoscaling:
                enabled: true
                minReplicas: 2
                maxReplicas: 5
                targetCPUUtilizationPercentage: 70
            
            indexNode:
              replicas: 2
              autoscaling:
                enabled: true
                minReplicas: 2
                maxReplicas: 5
                targetCPUUtilizationPercentage: 80
            
            # 3. 部署或更新
            helm upgrade milvus-release milvus/milvus -f values.yaml -n milvus
            
            # 4. 查看HPA状态
            kubectl get hpa -n milvus
            
            # 5. 查看HPA详细信息
            kubectl describe hpa milvus-querynode -n milvus
            
            # 6. 手动创建HPA（如果Helm不支持）
            cat <<EOF | kubectl apply -f -
            apiVersion: autoscaling/v2
            kind: HorizontalPodAutoscaler
            metadata:
              name: milvus-querynode-hpa
              namespace: milvus
            spec:
              scaleTargetRef:
                apiVersion: apps/v1
                kind: Deployment
                name: milvus-querynode
              minReplicas: 2
              maxReplicas: 10
              metrics:
              - type: Resource
                resource:
                  name: cpu
                  target:
                    type: Utilization
                    averageUtilization: 70
              - type: Resource
                resource:
                  name: memory
                  target:
                    type: Utilization
                    averageUtilization: 80
              behavior:
                scaleDown:
                  stabilizationWindowSeconds: 300
                  policies:
                  - type: Percent
                    value: 50
                    periodSeconds: 60
                scaleUp:
                  stabilizationWindowSeconds: 60
                  policies:
                  - type: Percent
                    value: 100
                    periodSeconds: 60
                  - type: Pods
                    value: 2
                    periodSeconds: 60
                  selectPolicy: Max
            EOF
            
            # 7. 监控HPA行为
            kubectl get hpa -n milvus -w
            
            # 8. 查看扩缩容事件
            kubectl get events -n milvus | grep -i "scaled"
            
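            # HPA settings:
            # - minReplicas: minimum replica count
            # - maxReplicas: maximum replica count
            # - targetCPUUtilizationPercentage: target CPU utilization
            # - targetMemoryUtilizationPercentage: target memory utilization
            # - stabilizationWindowSeconds: stabilization window that damps rapid scale changes
            # - scaleDown/scaleUp policies: limits on how fast replicas are removed or added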
            ---
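            The replica count the HPA aims for follows the rule documented by Kubernetes: desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue), skipped when the ratio is within a tolerance of 1.0 (0.1 by default) and clamped to minReplicas and maxReplicas. The sketch below reproduces only that core rule; desired_replicas is a hypothetical helper, and stabilization windows, scaling policies, and pod readiness handling are deliberately left out.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Core HPA rule: scale proportionally to current/target, within [min, max]."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Query Node example from the manifest: target 70% CPU, 2..10 replicas
    for cpu in (20, 72, 90, 350):
        print(cpu, desired_replicas(3, cpu, 70, min_replicas=2, max_replicas=10))
```

            Running it with the values above shows how a target of 70% with three replicas reacts to light load, near-target load, moderate overload, and a spike that hits maxReplicas.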
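    b.Custom metrics
        a.Description
            Besides CPU and memory, the HPA can scale on custom metrics such as QPS, query latency, or queue length. This requires Prometheus and Prometheus Adapter, plus adapter rules that define how each custom metric is computed. Because these metrics track the workload directly, scaling decisions follow business demand more closely than resource utilization does.
        b.Code example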
            ---
            # HPA configuration based on custom metrics
            
            # 1. Install Prometheus and Prometheus Adapter
            helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
            helm repo update
            helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace
            
            # prometheus-adapter is published in the same prometheus-community repository
            helm install prometheus-adapter prometheus-community/prometheus-adapter -n monitoring
            
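            # 2. Configure Prometheus Adapter custom metric rules
            # milvus_query_qps and milvus_query_latency_ms are placeholder names; substitute
            # metrics that your Milvus version exports. The adapter must also be pointed at
            # your Prometheus service through the chart's prometheus.url and prometheus.port values.
            cat > prometheus-adapter-values.yaml <<'EOF'
            rules:
              custom:
              # The QPS series is treated as a gauge and summed directly; for a request
              # counter, use sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) instead
              - seriesQuery: 'milvus_query_qps{namespace="milvus"}'
                resources:
                  overrides:
                    namespace: {resource: "namespace"}
                    pod: {resource: "pod"}
                name:
                  matches: "^(.*)_qps"
                  as: "milvus_query_qps"
                metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
              
              - seriesQuery: 'milvus_query_latency_ms{namespace="milvus"}'
                resources:
                  overrides:
                    namespace: {resource: "namespace"}
                    pod: {resource: "pod"}
                name:
                  matches: "^(.*)_latency_ms"
                  as: "milvus_query_latency"
                metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
            EOF
            
            helm upgrade prometheus-adapter prometheus-community/prometheus-adapter \
              -f prometheus-adapter-values.yaml -n monitoring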
            
            # 3. 验证自定义指标
            kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
            
            # 4. 创建基于自定义指标的HPA
            cat <<EOF | kubectl apply -f -
            apiVersion: autoscaling/v2
            kind: HorizontalPodAutoscaler
            metadata:
              name: milvus-querynode-custom-hpa
              namespace: milvus
            spec:
              scaleTargetRef:
                apiVersion: apps/v1
                kind: Deployment
                name: milvus-querynode
              minReplicas: 2
              maxReplicas: 10
              metrics:
              # 基于QPS扩缩容
              - type: Pods
                pods:
                  metric:
                    name: milvus_query_qps
                  target:
                    type: AverageValue
                    averageValue: "1000"  # 每个Pod处理1000 QPS
              # 基于查询延迟扩缩容
              - type: Pods
                pods:
                  metric:
                    name: milvus_query_latency
                  target:
                    type: AverageValue
                    averageValue: "50"  # 平均延迟50ms
              behavior:
                scaleDown:
                  stabilizationWindowSeconds: 300
                scaleUp:
                  stabilizationWindowSeconds: 60
            EOF
            
            # 5. 监控自定义指标HPA
            kubectl get hpa milvus-querynode-custom-hpa -n milvus -w
            
            # 6. 查看指标值
            kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/milvus/pods/*/milvus_query_qps" | jq .
            
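            # Examples of custom metrics:
            # - QPS: queries per second
            # - Query latency: average query latency
            # - Queue length: number of requests waiting to be processed
            # - Error rate: share of failed queries
            # - Resource usage: GPU utilization and similar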
            
            # 测试自动扩缩容
            python3 <<'PYTHON'
            from pymilvus import connections, Collection
            import numpy as np
            import time
            import threading
            
            connections.connect(host="milvus.example.com", port="19530")
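            collection = Collection("test_collection")
            collection.load()  # search requires the collection to be loaded
            
            def query_worker():
                """Query continuously to drive load up and trigger scale-out."""
                query_vector = [np.random.random(128).tolist()]
                
                while True:
                    try:
                        collection.search(
                            data=query_vector,
                            anns_field="embedding",
                            param={"metric_type": "L2", "params": {"nprobe": 16}},
                            limit=10
                        )
                    except Exception:
                        pass  # ignore transient search errors during the load test
                    time.sleep(0.001)  # high query rate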
            
            # 启动多个线程模拟高负载
            threads = []
            for i in range(50):
                t = threading.Thread(target=query_worker, daemon=True)
                t.start()
                threads.append(t)
            
            print("高负载测试运行中，观察HPA扩容...")
            print("kubectl get hpa -n milvus -w")
            
            time.sleep(300)  # 运行5分钟
            PYTHON
            ---
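            When an HPA lists several metrics, as the QPS and latency example does, Kubernetes computes a proposed replica count for each metric and uses the largest. For a Pods metric with an AverageValue target, the proposal reduces to the total across pods divided by the per-pod target. The sketch below illustrates just that arithmetic; both function names are hypothetical, and tolerance, readiness handling, and behavior policies are ignored.

```python
import math


def replicas_for_average_value(pod_values, target_average):
    """Pods metric, AverageValue target: total metric divided by the per-pod target."""
    return math.ceil(sum(pod_values) / target_average)


def combine_proposals(proposals, min_replicas, max_replicas):
    """The HPA takes the largest per-metric proposal, clamped to [min, max]."""
    return max(min_replicas, min(max_replicas, max(proposals)))


if __name__ == "__main__":
    qps_per_pod = [1500, 1400, 1600]      # target: 1000 QPS per pod
    latency_ms_per_pod = [40, 45, 50]     # target: 50 ms average
    by_qps = replicas_for_average_value(qps_per_pod, 1000)
    by_latency = replicas_for_average_value(latency_ms_per_pod, 50)
    print(by_qps, by_latency, combine_proposals([by_qps, by_latency], 2, 10))
```

            The arithmetic treats latency like a quantity that is shared out across pods, which is only a rough model: adding pods does not lower latency proportionally, so QPS is usually the steadier scaling signal of the two.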

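10 AI framework integration

10.1 LangChain integration

01.Basic integration
    a.Installation and configuration
        a.Description
            LangChain is a popular framework for building LLM applications, and Milvus integrates with it as a vector store backend. The integration covers the full pipeline of document loading, splitting, embedding, and retrieval. The Milvus vector store class (langchain_community.vectorstores.Milvus) wraps the Milvus operations and supports similarity search and MMR (maximal marginal relevance) retrieval. Combined with an LLM, it can serve as the retrieval layer of a RAG application. Installing langchain, langchain-community, and pymilvus is enough to get started.
        b.Code example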
            ---
            # 安装依赖
            # pip install langchain langchain-community pymilvus openai
            
            from langchain_community.vectorstores import Milvus
            from langchain_community.embeddings import OpenAIEmbeddings
            from langchain.text_splitter import RecursiveCharacterTextSplitter
            from langchain_community.document_loaders import TextLoader
            
            # 1. 加载文档
            loader = TextLoader("document.txt")
            documents = loader.load()
            
            # 2. 分割文档
            text_splitter = RecursiveCharacterTextSplitter(
                chunk_size=1000,
                chunk_overlap=200
            )
            docs = text_splitter.split_documents(documents)
            
            # 3. 创建嵌入模型
            embeddings = OpenAIEmbeddings()
            
            # 4. 创建Milvus向量存储
            vector_store = Milvus.from_documents(
                docs,
                embeddings,
                connection_args={
                    "host": "localhost",
                    "port": "19530"
                },
                collection_name="langchain_docs",
                drop_old=True
            )
            
            # 5. 相似度搜索
            query = "What is machine learning?"
            results = vector_store.similarity_search(query, k=3)
            
            for i, doc in enumerate(results):
                print(f"\n结果 {i+1}:")
                print(f"内容: {doc.page_content[:200]}...")
                print(f"元数据: {doc.metadata}")
            
            # 6. 带分数的搜索
            results_with_scores = vector_store.similarity_search_with_score(query, k=3)
            
            for doc, score in results_with_scores:
                print(f"\n分数: {score}")
                print(f"内容: {doc.page_content[:200]}...")
            
            # 7. MMR检索（最大边际相关性）
            mmr_results = vector_store.max_marginal_relevance_search(
                query,
                k=3,
                fetch_k=10
            )
            
            print(f"\nMMR检索结果: {len(mmr_results)}个")
            ---
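            MMR retrieval first fetches fetch_k candidates by similarity and then picks k of them one at a time, trading relevance to the query against similarity to documents already picked. The following is a pure-Python sketch of that selection step, not LangChain's implementation: cosine and mmr_select are illustrative names, doc_vecs stands for the fetch_k candidate embeddings, and lambda_mult plays the same role as the retriever parameter (1.0 means relevance only).

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return the indices of k documents chosen by maximal marginal relevance."""
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    docs = [[1.0, 0.1], [1.0, 0.11], [0.8, -0.6]]  # docs 0 and 1 are near-duplicates
    print(mmr_select(query, docs, k=2, lambda_mult=0.5))
    print(mmr_select(query, docs, k=2, lambda_mult=1.0))
```

            In the example the second pick changes with lambda_mult: with diversity weighted in, the near-duplicate is skipped in favor of the less similar document.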
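    b.Retriever configuration
        a.Description
            LangChain's Retriever abstraction provides a uniform retrieval interface, and a Milvus vector store can be converted into a Retriever with as_retriever(). Retrieval modes include similarity, MMR, and score-threshold filtering, with parameters such as top-k and the score threshold set through search_kwargs. A Retriever can be chained with an LLM to build question answering, summarization, and similar applications, and custom retrieval logic is also supported.
        b.Code example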
            ---
            from langchain_community.vectorstores import Milvus
            from langchain_community.embeddings import OpenAIEmbeddings
            from langchain.chains import RetrievalQA
            from langchain_community.llms import OpenAI
            
            # 创建向量存储
            embeddings = OpenAIEmbeddings()
            vector_store = Milvus(
                embeddings,
                connection_args={"host": "localhost", "port": "19530"},
                collection_name="langchain_docs"
            )
            
            # 1. 转换为Retriever（相似度模式）
            retriever = vector_store.as_retriever(
                search_type="similarity",
                search_kwargs={"k": 3}
            )
            
            # Test retrieval (invoke() replaces the deprecated get_relevant_documents())
            docs = retriever.invoke("What is deep learning?")
            print(f"Retrieved {len(docs)} documents")
            
            # 2. MMR模式Retriever
            mmr_retriever = vector_store.as_retriever(
                search_type="mmr",
                search_kwargs={
                    "k": 3,
                    "fetch_k": 10,
                    "lambda_mult": 0.5
                }
            )
            
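            # 3. Score-threshold Retriever (needs a vector store that implements relevance
            #    scores; check that your langchain-community version supports this for Milvus)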
            threshold_retriever = vector_store.as_retriever(
                search_type="similarity_score_threshold",
                search_kwargs={
                    "score_threshold": 0.8,
                    "k": 5
                }
            )
            
            # 4. 与LLM结合使用
            llm = OpenAI(temperature=0)
            
            qa_chain = RetrievalQA.from_chain_type(
                llm=llm,
                chain_type="stuff",
                retriever=retriever,
                return_source_documents=True
            )
            
            # Run the question answering chain
            query = "What are the main types of machine learning?"
            result = qa_chain.invoke({"query": query})
            
            print(f"\n问题: {query}")
            print(f"\n答案: {result['result']}")
            print(f"\n来源文档数: {len(result['source_documents'])}")
            
            for i, doc in enumerate(result['source_documents']):
                print(f"\n来源 {i+1}:")
                print(doc.page_content[:200])
            ---
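            Score-threshold retrieval amounts to a filter followed by a top-k cut. The sketch below shows that logic on plain (text, score) pairs; filter_by_score is a hypothetical helper, and it assumes relevance scores where higher is better. A raw L2 distance runs the other way (lower is better) and would need converting first.

```python
def filter_by_score(scored_docs, score_threshold=0.8, k=5):
    """Keep at most k (text, score) pairs whose relevance score is >= score_threshold."""
    kept = [pair for pair in scored_docs if pair[1] >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


if __name__ == "__main__":
    hits = [
        ("intro to neural networks", 0.91),
        ("history of computing", 0.42),
        ("backpropagation explained", 0.86),
        ("gradient descent basics", 0.80),
    ]
    for text, score in filter_by_score(hits, score_threshold=0.8, k=2):
        print(f"{score:.2f}  {text}")
```

            A threshold can return fewer than k documents, or none, so downstream code should handle an empty context.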

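02.RAG applications
    a.Question answering system
        a.Description
            Build a question answering system with retrieval-augmented generation (RAG). Milvus stores the knowledge base vectors; when a user asks a question, the relevant documents are retrieved and passed to the LLM as context for generating the answer. Several chain types are available (stuff, map_reduce, refine), the prompt template can be customized, and returning the source documents lets answers cite their sources, which improves trust. Streaming output is supported.
        b.Code example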
            ---
            from langchain_community.vectorstores import Milvus
            from langchain_community.embeddings import OpenAIEmbeddings
            from langchain.chains import RetrievalQA
            from langchain_community.llms import OpenAI
            from langchain.prompts import PromptTemplate
            
            # 创建向量存储
            embeddings = OpenAIEmbeddings()
            vector_store = Milvus(
                embeddings,
                connection_args={"host": "localhost", "port": "19530"},
                collection_name="knowledge_base"
            )
            
            # 自定义提示词模板
            prompt_template = """使用以下上下文回答问题。如果不知道答案，就说不知道，不要编造答案。
            
            上下文:
            {context}
            
            问题: {question}
            
            答案:"""
            
            PROMPT = PromptTemplate(
                template=prompt_template,
                input_variables=["context", "question"]
            )
            
            # 创建QA链
            llm = OpenAI(temperature=0)
            qa_chain = RetrievalQA.from_chain_type(
                llm=llm,
                chain_type="stuff",
                retriever=vector_store.as_retriever(search_kwargs={"k": 3}),
                return_source_documents=True,
                chain_type_kwargs={"prompt": PROMPT}
            )
            
            # 问答示例
            questions = [
                "What is the capital of France?",
                "Explain quantum computing in simple terms.",
                "What are the benefits of exercise?"
            ]
            
            for query in questions:
                result = qa_chain.invoke({"query": query})
                
                print(f"\n{'='*60}")
                print(f"问题: {query}")
                print(f"\n答案: {result['result']}")
                
                print(f"\n参考来源:")
                for i, doc in enumerate(result['source_documents']):
                    print(f"\n[{i+1}] {doc.metadata.get('source', 'Unknown')}")
                    print(f"    {doc.page_content[:150]}...")
            
            # 使用map_reduce处理长文档
            qa_chain_mr = RetrievalQA.from_chain_type(
                llm=llm,
                chain_type="map_reduce",
                retriever=vector_store.as_retriever(search_kwargs={"k": 5}),
                return_source_documents=True
            )
            
            result = qa_chain_mr.invoke({"query": "Summarize the main points about AI safety."})
            print(f"\n摘要: {result['result']}")
            ---
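            The difference between the stuff and map_reduce chain types is how the retrieved chunks reach the model. The sketch below mimics the two strategies with a stand-in model so that it runs offline; it is not LangChain code, and stuff_prompt, map_reduce_answer, and fake_llm are hypothetical names introduced for illustration.

```python
def stuff_prompt(question, docs, template):
    """'stuff': put every retrieved chunk into a single prompt (one model call)."""
    return template.format(context="\n\n".join(docs), question=question)


def map_reduce_answer(question, docs, llm):
    """'map_reduce': answer once per chunk, then combine the partial answers."""
    partials = [llm(f"Context:\n{doc}\n\nQuestion: {question}") for doc in docs]
    combined = "\n".join(partials)
    return llm(f"Combine these partial answers:\n{combined}\n\nQuestion: {question}")


def fake_llm(prompt):
    """Stand-in for a real model: reports the prompt size instead of generating text."""
    return f"<answer from {len(prompt)} chars>"


if __name__ == "__main__":
    template = "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"
    chunks = ["Milvus stores vectors.", "HNSW is a graph index.", "IVF partitions vectors."]
    print(fake_llm(stuff_prompt("What is HNSW?", chunks, template)))
    print(map_reduce_answer("What is HNSW?", chunks, fake_llm))
```

            stuff costs one call but is bounded by the context window; map_reduce costs one call per chunk plus one to combine, which is why it suits longer document sets.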
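    b.Conversational system
        a.Description
            Build a conversational system with memory. ConversationalRetrievalChain handles multi-turn dialogue: Milvus stores the knowledge base and the LLM generates the replies. The chain manages the conversation history and uses it to refine retrieval, so follow-up questions receive context-aware answers. Streaming responses are supported, and the chain can be placed behind a chat interface.
        b.Code example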
            ---
            from langchain_community.vectorstores import Milvus
            from langchain_community.embeddings import OpenAIEmbeddings
            from langchain.chains import ConversationalRetrievalChain
            from langchain_community.llms import OpenAI
            from langchain.memory import ConversationBufferMemory
            
            # 创建向量存储
            embeddings = OpenAIEmbeddings()
            vector_store = Milvus(
                embeddings,
                connection_args={"host": "localhost", "port": "19530"},
                collection_name="chat_knowledge"
            )
            
            # 创建对话记忆
            memory = ConversationBufferMemory(
                memory_key="chat_history",
                return_messages=True,
                output_key="answer"
            )
            
            # 创建对话链
            llm = OpenAI(temperature=0.7)
            conversation_chain = ConversationalRetrievalChain.from_llm(
                llm=llm,
                retriever=vector_store.as_retriever(search_kwargs={"k": 3}),
                memory=memory,
                return_source_documents=True
            )
            
            # 多轮对话示例
            print("对话系统启动（输入'quit'退出）\n")
            
            while True:
                query = input("用户: ")
                if query.lower() == 'quit':
                    break
                
                result = conversation_chain.invoke({"question": query})
                
                print(f"\n助手: {result['answer']}\n")
                
                if result.get('source_documents'):
                    print("参考来源:")
                    for i, doc in enumerate(result['source_documents'][:2]):
                        print(f"  [{i+1}] {doc.page_content[:100]}...")
                    print()
            
            # 对话示例脚本
            demo_questions = [
                "What is machine learning?",
                "Can you give me an example?",
                "How does it differ from traditional programming?",
                "What are some applications?"
            ]
            
            print("\n对话演示:\n")
            for query in demo_questions:
                result = conversation_chain.invoke({"question": query})
                print(f"用户: {query}")
                print(f"助手: {result['answer']}\n")
            
            # 查看对话历史
            print("\n对话历史:")
            print(memory.load_memory_variables({}))
            ---
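            A follow-up such as "Can you give me an example?" retrieves poorly on its own, so a conversational retrieval chain first rewrites it into a standalone question using the chat history and only then queries the vector store. The sketch below shows that condensing step with a stand-in model; the prompt wording, format_history, and standalone_question are illustrative assumptions, not LangChain's actual prompt or API.

```python
def format_history(turns):
    """Render (question, answer) pairs as a plain-text transcript."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in turns)


def standalone_question(question, turns, llm):
    """Rewrite a follow-up into a self-contained question before retrieval."""
    if not turns:
        return question  # first turn: nothing to condense
    prompt = (
        "Given the conversation and a follow-up question, rephrase the follow-up "
        "as a standalone question.\n\n"
        f"{format_history(turns)}\n\nFollow-up: {question}\nStandalone question:"
    )
    return llm(prompt).strip()


if __name__ == "__main__":
    def echo_llm(prompt):
        # Stand-in for a real model so that the example runs offline
        return " Can you give me an example of machine learning? "

    history = [("What is machine learning?", "It is learning patterns from data.")]
    print(standalone_question("Can you give me an example?", history, echo_llm))
```

            The rewritten question, not the raw follow-up, is what gets embedded and searched, which is how the history influences retrieval.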

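10.2 LlamaIndex integration

01.Index construction
    a.Vector index
        a.Description
            LlamaIndex (formerly GPT Index) is a data framework for LLM applications, and Milvus integrates with it as a vector store backend that holds the document embeddings behind a vector index. The MilvusVectorStore class wraps the Milvus operations. The integration covers the full pipeline of document loading, indexing, and querying, and works with a range of LLMs for document retrieval and question answering.
        b.Code example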
            ---
            # 安装依赖
            # pip install llama-index llama-index-vector-stores-milvus pymilvus
            
            from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
            from llama_index.vector_stores.milvus import MilvusVectorStore
            from llama_index.core import Settings
            from llama_index.embeddings.openai import OpenAIEmbedding
            from llama_index.llms.openai import OpenAI
            
            # 1. 配置全局设置
            Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
            Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
            
            # 2. 加载文档
            documents = SimpleDirectoryReader("./data").load_data()
            print(f"加载了 {len(documents)} 个文档")
            
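            # 3. Create the Milvus vector store
            vector_store = MilvusVectorStore(
                uri="http://localhost:19530",  # recent package versions take a uri instead of host/port
                dim=1536,  # dimension of text-embedding-ada-002 vectors
                collection_name="llamaindex_docs",
                overwrite=True
            )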
            
            # 4. 创建存储上下文
            storage_context = StorageContext.from_defaults(
                vector_store=vector_store
            )
            
            # 5. 构建索引
            index = VectorStoreIndex.from_documents(
                documents,
                storage_context=storage_context,
                show_progress=True
            )
            
            print("索引构建完成！")
            
            # 6. 查询索引
            query_engine = index.as_query_engine(
                similarity_top_k=3
            )
            
            response = query_engine.query("What is the main topic of these documents?")
            print(f"\n查询: What is the main topic of these documents?")
            print(f"回答: {response}")
            
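            # 7. Streaming query (the engine must be created with streaming=True)
            streaming_engine = index.as_query_engine(streaming=True, similarity_top_k=3)
            streaming_response = streaming_engine.query("Explain the key concepts.")
            for text in streaming_response.response_gen:
                print(text, end="", flush=True)
            print()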
            
            # 8. Load the existing index
            # Later runs can reuse the collection without rebuilding it
            vector_store_existing = MilvusVectorStore(
                uri="http://localhost:19530",
                collection_name="llamaindex_docs"
            )
            
            # from_vector_store() builds its own storage context from the vector store,
            # so no storage_context argument is passed
            index_loaded = VectorStoreIndex.from_vector_store(vector_store_existing)
            
            query_engine_loaded = index_loaded.as_query_engine()
            response = query_engine_loaded.query("Summarize the content.")
            print(f"\n从已有索引查询: {response}")
            ---
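            The similarity_top_k setting asks the vector store for the k stored embeddings closest to the query embedding. Conceptually that is the exhaustive search sketched below; Milvus answers the same question with an approximate index instead of scanning every vector. The names cosine and top_k and the two-dimensional vectors are illustrative only.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, stored, k=3):
    """stored maps node id -> embedding; return [(node_id, score)], best first."""
    scored = ((cosine(query_vec, vec), node_id) for node_id, vec in stored.items())
    return [(node_id, score) for score, node_id in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    stored = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [-1.0, 0.0]}
    for node_id, score in top_k([1.0, 0.0], stored, k=2):
        print(node_id, round(score, 4))
```

            The exhaustive scan costs one comparison per stored vector, which is the cost that index types such as HNSW and IVF are designed to avoid.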
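    b.Hybrid index
        a.Description
            LlamaIndex can combine several index types. Pairing a vector index with a keyword index gives hybrid retrieval, which can improve accuracy. Retrieval strategies can be customized and the indexes weighted differently. Multimodal retrieval and more advanced structures such as graph and tree indexes are also supported.
        b.Code example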
            ---
            from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
            from llama_index.vector_stores.milvus import MilvusVectorStore
            from llama_index.core import SummaryIndex
            from llama_index.core.tools import QueryEngineTool
            from llama_index.core.query_engine import RouterQueryEngine
            from llama_index.core.selectors import LLMSingleSelector
            
            # 加载文档
            documents = SimpleDirectoryReader("./data").load_data()
            
            # 1. Create the vector index
            vector_store = MilvusVectorStore(
                uri="http://localhost:19530",  # recent package versions take a uri instead of host/port
                collection_name="hybrid_index",
                dim=1536
            )
            
            storage_context = StorageContext.from_defaults(
                vector_store=vector_store
            )
            
            vector_index = VectorStoreIndex.from_documents(
                documents,
                storage_context=storage_context
            )
            
            # 2. 创建摘要索引
            summary_index = SummaryIndex.from_documents(documents)
            
            # 3. 创建查询引擎工具
            vector_tool = QueryEngineTool.from_defaults(
                query_engine=vector_index.as_query_engine(),
                description="用于回答关于文档具体细节的问题"
            )
            
            summary_tool = QueryEngineTool.from_defaults(
                query_engine=summary_index.as_query_engine(),
                description="用于回答需要整体理解文档的问题"
            )
            
            # 4. 创建路由查询引擎
            router_query_engine = RouterQueryEngine(
                selector=LLMSingleSelector.from_defaults(),
                query_engine_tools=[vector_tool, summary_tool]
            )
            
            # 5. 使用路由查询
            response1 = router_query_engine.query(
                "What is the specific definition of machine learning mentioned in the document?"
            )
            print(f"细节问题: {response1}")
            
            response2 = router_query_engine.query(
                "What is the overall theme of these documents?"
            )
            print(f"整体问题: {response2}")
            
            # 6. Custom retriever query engine (vector retrieval with tree_summarize response synthesis)
            from llama_index.core.retrievers import VectorIndexRetriever
            from llama_index.core.query_engine import RetrieverQueryEngine
            
            retriever = VectorIndexRetriever(
                index=vector_index,
                similarity_top_k=5
            )
            
            query_engine = RetrieverQueryEngine.from_args(
                retriever=retriever,
                response_mode="tree_summarize"
            )
            
            response = query_engine.query("Explain the main concepts.")
            print(f"混合检索结果: {response}")
            ---
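            Combining a vector index with a keyword index, with a weight per index, needs a rule for merging two ranked result lists. One common choice is reciprocal rank fusion (RRF), sketched below in plain Python. This is a swapped-in illustration of the weighting idea, not LlamaIndex's internal method; reciprocal_rank_fusion is a hypothetical helper, and k=60 is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse ranked id lists (best first); each hit adds weight / (k + rank) to its id."""
    weights = weights or [1.0] * len(rankings)
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first; ties broken by id for a stable order
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    vector_hits = ["d1", "d2", "d3"]    # from the vector index
    keyword_hits = ["d3", "d1", "d4"]   # from a keyword index
    print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
    print(reciprocal_rank_fusion([vector_hits, keyword_hits], weights=[1.0, 3.0]))
```

            Because RRF works on ranks rather than raw scores, it can merge results whose scores are not comparable, such as vector distances and keyword match scores.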

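02.Query optimization
    a.Advanced queries
        a.Description
            LlamaIndex offers several advanced query modes. Sub-question querying decomposes a complex question into simpler ones, and multi-step reasoning solves a problem incrementally. Hypothetical Document Embeddings (HyDE) are supported, as are configurable response synthesis modes, citation tracking that reports sources, streaming responses, and custom query transformations.
        b.Code example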
            ---
            from llama_index.core import VectorStoreIndex
            from llama_index.vector_stores.milvus import MilvusVectorStore
            from llama_index.core.query_engine import SubQuestionQueryEngine
            from llama_index.core.tools import QueryEngineTool, ToolMetadata
            from llama_index.core.response.notebook_utils import display_response
            
            # Load the index
            # (recent llama-index-vector-stores-milvus releases take a `uri` instead of host/port)
            vector_store = MilvusVectorStore(
                uri="http://localhost:19530",
                collection_name="advanced_query"
            )
            
            index = VectorStoreIndex.from_vector_store(vector_store)
            
            # 1. 子问题查询引擎
            # 将复杂问题分解为多个子问题
            query_engine_tools = [
                QueryEngineTool(
                    query_engine=index.as_query_engine(),
                    metadata=ToolMetadata(
                        name="document_index",
                        description="包含文档的详细信息"
                    )
                )
            ]
            
            sub_question_engine = SubQuestionQueryEngine.from_defaults(
                query_engine_tools=query_engine_tools
            )
            
            response = sub_question_engine.query(
                "Compare and contrast the advantages and disadvantages of different machine learning approaches."
            )
            print(f"子问题查询: {response}")
            
            # 2. 配置响应模式
            # compact: 紧凑模式，合并文本块
            query_engine_compact = index.as_query_engine(
                response_mode="compact",
                similarity_top_k=5
            )
            
            # tree_summarize: 树形摘要，层次化处理
            query_engine_tree = index.as_query_engine(
                response_mode="tree_summarize",
                similarity_top_k=5
            )
            
            # refine: 精炼模式，迭代优化答案
            query_engine_refine = index.as_query_engine(
                response_mode="refine",
                similarity_top_k=5
            )
            
            query = "What are the key principles of effective learning?"
            
            response_compact = query_engine_compact.query(query)
            response_tree = query_engine_tree.query(query)
            response_refine = query_engine_refine.query(query)
            
            print(f"\nCompact模式: {response_compact}")
            print(f"\nTree模式: {response_tree}")
            print(f"\nRefine模式: {response_refine}")
            
            # 3. 流式响应
            streaming_engine = index.as_query_engine(
                streaming=True
            )
            
            streaming_response = streaming_engine.query("Explain neural networks.")
            print("\n流式响应:")
            for text in streaming_response.response_gen:
                print(text, end="", flush=True)
            print()
            
            # 4. 带元数据过滤的查询
            from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter
            
            filters = MetadataFilters(
                filters=[
                    ExactMatchFilter(key="category", value="machine_learning")
                ]
            )
            
            filtered_engine = index.as_query_engine(
                filters=filters,
                similarity_top_k=3
            )
            
            response = filtered_engine.query("What is supervised learning?")
            print(f"\n过滤查询: {response}")
            
            # 5. 查看来源节点
            response_with_sources = index.as_query_engine(
                response_mode="compact"
            ).query("What is deep learning?")
            
            print(f"\n回答: {response_with_sources}")
            print(f"\n来源节点:")
            for i, node in enumerate(response_with_sources.source_nodes):
                print(f"\n[{i+1}] 分数: {node.score:.4f}")
                print(f"    内容: {node.text[:200]}...")
                print(f"    元数据: {node.metadata}")
            ---
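        c.Supplementary sketch
            The response modes above differ mainly in how the retrieved chunks are fed to the LLM. The sketch below is a minimal illustration of the refine and tree_summarize traversal orders, not LlamaIndex's implementation. The `CountingLLM` class is a hypothetical stand-in for a real model that only wraps its prompt and counts calls, and the function names and the `fanout` parameter are assumptions made for this example.

```python
from typing import Callable, List

# Hypothetical stand-in type for an LLM call: prompt text in, answer text out.
LLM = Callable[[str], str]


def refine(chunks: List[str], llm: LLM) -> str:
    """Refine-style synthesis: one call per chunk, each call sees the previous answer."""
    if not chunks:
        raise ValueError("chunks must not be empty")
    answer = llm(chunks[0])
    for chunk in chunks[1:]:
        answer = llm(f"{answer} | {chunk}")
    return answer


def tree_summarize(chunks: List[str], llm: LLM, fanout: int = 2) -> str:
    """Tree-style synthesis: summarize groups of `fanout` items, level by level."""
    if not chunks:
        raise ValueError("chunks must not be empty")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = list(chunks)
    while True:
        level = [
            llm(" | ".join(level[i:i + fanout]))
            for i in range(0, len(level), fanout)
        ]
        if len(level) == 1:
            return level[0]


class CountingLLM:
    """Fake LLM that wraps its prompt in S(...) and counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"S({prompt})"


if __name__ == "__main__":
    chunks = ["a", "b", "c", "d"]
    refine_llm, tree_llm = CountingLLM(), CountingLLM()
    print(refine(chunks, refine_llm), refine_llm.calls)
    print(tree_summarize(chunks, tree_llm), tree_llm.calls)
```

            In this sketch, refine makes one sequential call per chunk, while tree_summarize makes calls that are independent within a level and could therefore run in parallel.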
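    b.Agent applications
        a.Function description
            LlamaIndex supports building agent applications. An agent completes a task by calling tools, with Milvus serving as one of them in the form of a knowledge base. For each question the agent selects a suitable tool, can plan and reason over multiple steps, and can combine several tools. Tools and strategies are customizable.
        b.Code example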
            ---
            from llama_index.core.agent import ReActAgent
            from llama_index.core.tools import QueryEngineTool, ToolMetadata, FunctionTool
            from llama_index.core import VectorStoreIndex
            from llama_index.vector_stores.milvus import MilvusVectorStore
            from llama_index.llms.openai import OpenAI
            
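            # 1. Create the knowledge base tool
            # (recent llama-index-vector-stores-milvus releases take a `uri` instead of host/port)
            vector_store = MilvusVectorStore(
                uri="http://localhost:19530",
                collection_name="agent_knowledge"
            )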
            
            index = VectorStoreIndex.from_vector_store(vector_store)
            
            knowledge_tool = QueryEngineTool(
                query_engine=index.as_query_engine(),
                metadata=ToolMetadata(
                    name="knowledge_base",
                    description="包含公司文档、产品信息、技术文档的知识库"
                )
            )
            
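            # 2. Create custom function tools
            # The expression comes from the LLM, so parse it with ast instead of calling eval()
            import ast
            import operator
            
            _OPS = {
                ast.Add: operator.add, ast.Sub: operator.sub,
                ast.Mult: operator.mul, ast.Div: operator.truediv,
                ast.USub: operator.neg,
            }
            
            def _eval_node(node):
                if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                    return node.value
                if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
                    return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
                if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
                    return _OPS[type(node.op)](_eval_node(node.operand))
                raise ValueError("unsupported expression")
            
            def calculate(expression: str) -> str:
                """Evaluate a basic arithmetic expression (+, -, *, /)."""
                try:
                    result = _eval_node(ast.parse(expression, mode="eval").body)
                    return f"Result: {result}"
                except (SyntaxError, ValueError, ZeroDivisionError):
                    return "Calculation error"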
            
            calc_tool = FunctionTool.from_defaults(fn=calculate)
            
            def search_web(query: str) -> str:
                """搜索网络信息"""
                # 实际应用中调用搜索API
                return f"网络搜索结果: {query}"
            
            web_tool = FunctionTool.from_defaults(fn=search_web)
            
            # 3. 创建ReAct Agent
            llm = OpenAI(model="gpt-4", temperature=0)
            
            agent = ReActAgent.from_tools(
                tools=[knowledge_tool, calc_tool, web_tool],
                llm=llm,
                verbose=True
            )
            
            # 4. 使用Agent
            response1 = agent.chat("What is our company's return policy?")
            print(f"Agent回答: {response1}")
            
            response2 = agent.chat("Calculate 15% discount on $299")
            print(f"Agent回答: {response2}")
            
            response3 = agent.chat(
                "Find information about the latest AI trends and compare with our product features"
            )
            print(f"Agent回答: {response3}")
            
            # 5. 多轮对话
            print("\nAgent对话模式（输入'quit'退出）:")
            
            while True:
                user_input = input("\n用户: ")
                if user_input.lower() == 'quit':
                    break
                
                response = agent.chat(user_input)
                print(f"Agent: {response}")
            
            # 6. 查看Agent推理过程
            response_with_reasoning = agent.chat(
                "What are the key features of our product and how much would it cost with a 20% discount?"
            )
            
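            print(f"\nFinal answer: {response_with_reasoning}")
            # With verbose=True the ReAct steps are printed while the agent runs.
            # response.sources lists the tool calls behind this answer;
            # agent.chat_history only holds the chat messages, not the reasoning steps.
            print("\nTool calls:")
            for tool_output in response_with_reasoning.sources:
                print(f"  - {tool_output.tool_name}: {tool_output.raw_input}")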
            ---
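        c.Supplementary sketch
            The tool selection and multi-step reasoning described above can be pictured as a loop of action and observation. The sketch below is a minimal illustration under stated assumptions, not the ReActAgent implementation: `scripted_policy` is a hypothetical, hard-coded stand-in for the LLM's decision step, and the tool names, the "price,percent" input format, and the product data are all invented for the example.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
Step = Tuple[str, str, str]  # (tool name, tool input, observation)
Policy = Callable[[str, List[Step]], Tuple[str, str]]


def react_loop(question: str, policy: Policy, tools: Dict[str, Tool],
               max_steps: int = 5) -> Tuple[str, List[Step]]:
    """Ask the policy for an action, run the tool, feed the observation back."""
    trace: List[Step] = []
    for _ in range(max_steps):
        action, arg = policy(question, trace)
        if action == "finish":
            return arg, trace
        tool = tools.get(action)
        observation = tool(arg) if tool else f"unknown tool: {action}"
        trace.append((action, arg, observation))
    return "max steps reached", trace


def lookup_price(product: str) -> str:
    """Hypothetical knowledge-base tool backed by a tiny in-memory table."""
    return {"widget": "299"}.get(product, "unknown")


def apply_discount(expr: str) -> str:
    """Hypothetical calculator tool; input is 'price,percent'."""
    price, percent = (float(part) for part in expr.split(","))
    return f"{price * (1 - percent / 100):.2f}"


def scripted_policy(question: str, trace: List[Step]) -> Tuple[str, str]:
    """Stand-in for the LLM: look up the price, discount it, then finish."""
    if not trace:
        return "knowledge_base", "widget"
    if len(trace) == 1:
        return "calculator", f"{trace[0][2]},20"
    return "finish", f"Discounted price: {trace[-1][2]}"


TOOLS = {"knowledge_base": lookup_price, "calculator": apply_discount}

if __name__ == "__main__":
    answer, steps = react_loop("Price of widget with 20% off?", scripted_policy, TOOLS)
    print(answer)
    for step in steps:
        print(step)
```

            The `max_steps` bound is a design choice worth keeping in real agents too, since a model-driven policy is not guaranteed to emit a finish action.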

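10.3 Haystack integration

01.Building a pipeline
    a.Document processing
        a.Function description
            Haystack is an end-to-end NLP framework for building search and question-answering systems. Milvus integrates with it as a document store backend: the MilvusDocumentStore class wraps the Milvus operations and covers the full flow of indexing, retrieval, and question answering. Components are combined in a Pipeline, and the store can be paired with different Reader and Retriever nodes.
        b.Code example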
            ---
            # Install dependencies (this example uses the Haystack 1.x API: haystack.nodes, Pipeline.add_node)
            # pip install farm-haystack[milvus] pymilvus
            
            from haystack.document_stores import MilvusDocumentStore
            from haystack.nodes import PreProcessor, EmbeddingRetriever
            from haystack.utils import convert_files_to_docs
            
            # 1. 创建Milvus文档存储
            document_store = MilvusDocumentStore(
                host="localhost",
                port=19530,
                collection_name="haystack_docs",
                embedding_dim=384,  # all-MiniLM-L6-v2 produces 384-dimensional embeddings
                similarity="cosine",
                recreate_index=True
            )
            
            # 2. 加载文档
            docs = convert_files_to_docs(
                dir_path="./data",
                clean_func=None,
                split_paragraphs=True
            )
            
            print(f"加载了 {len(docs)} 个文档")
            
            # 3. 预处理文档
            preprocessor = PreProcessor(
                clean_empty_lines=True,
                clean_whitespace=True,
                clean_header_footer=True,
                split_by="word",
                split_length=200,
                split_overlap=20,
                split_respect_sentence_boundary=True
            )
            
            processed_docs = preprocessor.process(docs)
            print(f"预处理后: {len(processed_docs)} 个文档片段")
            
            # 4. 写入文档存储
            document_store.write_documents(processed_docs)
            print("文档已写入Milvus")
            
            # 5. 创建嵌入检索器
            retriever = EmbeddingRetriever(
                document_store=document_store,
                embedding_model="sentence-transformers/all-MiniLM-L6-v2",
                model_format="sentence_transformers"
            )
            
            # 6. 更新文档嵌入
            document_store.update_embeddings(retriever)
            print("文档嵌入已更新")
            
            # 7. 检索文档
            query = "What is machine learning?"
            retrieved_docs = retriever.retrieve(
                query=query,
                top_k=3
            )
            
            print(f"\n查询: {query}")
            print(f"检索到 {len(retrieved_docs)} 个文档:\n")
            
            for i, doc in enumerate(retrieved_docs):
                print(f"[{i+1}] 分数: {doc.score:.4f}")
                print(f"    内容: {doc.content[:200]}...")
                print(f"    元数据: {doc.meta}\n")
            ---
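        c.Supplementary sketch
            The `split_by="word"`, `split_length=200`, `split_overlap=20` settings above describe a sliding window over words. The sketch below shows that windowing in plain Python as an illustration only. The function name `split_by_words` is an assumption, and unlike PreProcessor with `split_respect_sentence_boundary=True`, this version cuts purely on word counts and ignores sentence boundaries.

```python
from typing import List


def split_by_words(text: str, split_length: int = 200, split_overlap: int = 20) -> List[str]:
    """Split text into windows of `split_length` words that overlap by `split_overlap` words."""
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be in [0, split_length)")

    words = text.split()
    step = split_length - split_overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        # Stop once a window has reached the end, to avoid a trailing fragment
        # that contains nothing but overlap.
        if start + split_length >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(450))
    for chunk in split_by_words(sample):
        tokens = chunk.split()
        print(len(tokens), tokens[0], tokens[-1])
```

            With 450 words, the windows start at words 0, 180, and 360, so each consecutive pair shares 20 words.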
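    b.Pipeline assembly
        a.Function description
            Haystack assembles NLP applications as a Pipeline: a graph of nodes through which data flows. Node types include retrievers, readers, and generators; custom nodes and custom connections are allowed, as are parallel and conditional branches. A Pipeline can be saved to and loaded from YAML.
        b.Code example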
            ---
            from haystack import Pipeline
            from haystack.document_stores import MilvusDocumentStore
            from haystack.nodes import EmbeddingRetriever, FARMReader, PromptNode
            from haystack.nodes import AnswerParser, PromptTemplate
            
            # 1. 创建文档存储
            document_store = MilvusDocumentStore(
                host="localhost",
                port=19530,
                collection_name="haystack_pipeline",
                embedding_dim=384  # all-MiniLM-L6-v2 produces 384-dimensional embeddings
            )
            
            # 2. 创建检索器
            retriever = EmbeddingRetriever(
                document_store=document_store,
                embedding_model="sentence-transformers/all-MiniLM-L6-v2"
            )
            
            # 3. 创建阅读器
            reader = FARMReader(
                model_name_or_path="deepset/roberta-base-squad2",
                use_gpu=True
            )
            
            # 4. 构建检索式问答Pipeline
            retrieval_qa_pipeline = Pipeline()
            retrieval_qa_pipeline.add_node(
                component=retriever,
                name="Retriever",
                inputs=["Query"]
            )
            retrieval_qa_pipeline.add_node(
                component=reader,
                name="Reader",
                inputs=["Retriever"]
            )
            
            # 5. 运行Pipeline
            query = "What are the main types of machine learning?"
            
            result = retrieval_qa_pipeline.run(
                query=query,
                params={
                    "Retriever": {"top_k": 5},
                    "Reader": {"top_k": 3}
                }
            )
            
            print(f"问题: {query}\n")
            print("答案:")
            for i, answer in enumerate(result["answers"]):
                print(f"\n[{i+1}] 答案: {answer.answer}")
                print(f"    分数: {answer.score:.4f}")
                print(f"    上下文: {answer.context[:200]}...")
            
            # 6. 构建生成式问答Pipeline（使用LLM）
            prompt_template = PromptTemplate(
                prompt="""根据以下上下文回答问题。
                
                上下文: {join(documents)}
                
                问题: {query}
                
                答案:""",
                output_parser=AnswerParser()
            )
            
            prompt_node = PromptNode(
                model_name_or_path="gpt-3.5-turbo",
                api_key="your-api-key",
                default_prompt_template=prompt_template
            )
            
            generative_qa_pipeline = Pipeline()
            generative_qa_pipeline.add_node(
                component=retriever,
                name="Retriever",
                inputs=["Query"]
            )
            generative_qa_pipeline.add_node(
                component=prompt_node,
                name="PromptNode",
                inputs=["Retriever"]
            )
            
            # 7. 运行生成式Pipeline
            result_gen = generative_qa_pipeline.run(
                query=query,
                params={"Retriever": {"top_k": 3}}
            )
            
            print(f"\n生成式答案: {result_gen['answers'][0].answer}")
            
            # 8. 保存和加载Pipeline
            retrieval_qa_pipeline.save_to_yaml("qa_pipeline.yaml")
            
            # 加载Pipeline
            loaded_pipeline = Pipeline.load_from_yaml("qa_pipeline.yaml")
            
            # 9. 批量查询
            queries = [
                "What is supervised learning?",
                "Explain neural networks.",
                "What is the difference between AI and ML?"
            ]
            
            for q in queries:
                result = retrieval_qa_pipeline.run(
                    query=q,
                    params={"Retriever": {"top_k": 3}, "Reader": {"top_k": 1}}
                )
                print(f"\n问题: {q}")
                print(f"答案: {result['answers'][0].answer if result['answers'] else '未找到答案'}")
            ---
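        c.Supplementary sketch
            The `add_node(component, name, inputs)` pattern above amounts to building a directed graph and running its nodes in dependency order. The sketch below mimics only that idea and is not Haystack's Pipeline class: `MiniPipeline`, the callable-based components, and the in-memory `DOCS` list are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

DOCS = ["Milvus is a vector database.", "Haystack builds NLP pipelines."]


class MiniPipeline:
    """Toy pipeline: nodes are callables that receive the query and upstream outputs."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Tuple[Callable, List[str]]] = {}
        self.order: List[str] = []

    def add_node(self, component: Callable, name: str, inputs: List[str]) -> None:
        # Inputs must already exist, so insertion order is a valid execution order.
        for source in inputs:
            if source != "Query" and source not in self.nodes:
                raise ValueError(f"unknown input node: {source}")
        self.nodes[name] = (component, list(inputs))
        self.order.append(name)

    def run(self, query: str):
        if not self.order:
            raise ValueError("pipeline has no nodes")
        outputs: Dict[str, object] = {}
        for name in self.order:
            component, inputs = self.nodes[name]
            upstream = [outputs[source] for source in inputs if source != "Query"]
            outputs[name] = component(query, *upstream)
        return outputs[self.order[-1]]


def retriever(query: str) -> List[str]:
    """Keyword-overlap retrieval over DOCS (a stand-in for embedding retrieval)."""
    terms = set(query.lower().split())
    return [doc for doc in DOCS if terms & set(doc.lower().rstrip(".").split())]


def reader(query: str, docs: List[str]) -> str:
    """Return the first retrieved document as the 'answer'."""
    return docs[0] if docs else "no answer"


pipeline = MiniPipeline()
pipeline.add_node(retriever, "Retriever", ["Query"])
pipeline.add_node(reader, "Reader", ["Retriever"])

if __name__ == "__main__":
    print(pipeline.run("what is milvus"))
```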

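02.Advanced applications
    a.Multimodal retrieval
        a.Function description
            Haystack can process multimodal documents such as text, tables, and images, with Milvus storing the resulting embeddings for cross-modal retrieval. Content can be extracted from PDF, Word, and similar files, with OCR and image understanding where needed, to support document understanding, question answering, and enterprise document search.
        b.Code example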
            ---
            from haystack.document_stores import MilvusDocumentStore
            from haystack.nodes import (
                PDFToTextConverter,
                PreProcessor,
                EmbeddingRetriever,
                TableTextRetriever
            )
            from haystack import Pipeline
            
            # 1. 创建文档存储
            document_store = MilvusDocumentStore(
                host="localhost",
                port=19530,
                collection_name="multimodal_docs",
                embedding_dim=384  # must match the text retriever's model (all-MiniLM-L6-v2: 384)
            )
            
            # 2. 创建PDF转换器
            pdf_converter = PDFToTextConverter(
                remove_numeric_tables=False,
                valid_languages=["en", "zh"]
            )
            
            # 3. 转换PDF文档
            pdf_docs = pdf_converter.convert(
                file_path="document.pdf",
                meta={"source": "document.pdf"}
            )
            
            # 4. 预处理
            preprocessor = PreProcessor(
                split_by="word",
                split_length=200,
                split_overlap=20
            )
            
            processed_docs = preprocessor.process(pdf_docs)
            
            # 5. 写入文档存储
            document_store.write_documents(processed_docs)
            
            # 6. 创建检索器
            retriever = EmbeddingRetriever(
                document_store=document_store,
                embedding_model="sentence-transformers/all-MiniLM-L6-v2"
            )
            
            document_store.update_embeddings(retriever)
            
            # 7. 表格检索
            table_retriever = TableTextRetriever(
                document_store=document_store,
                embedding_model="deepset/all-mpnet-base-v2-table"
            )
            
            # 8. 构建多模态检索Pipeline
            multimodal_pipeline = Pipeline()
            multimodal_pipeline.add_node(
                component=retriever,
                name="TextRetriever",
                inputs=["Query"]
            )
            multimodal_pipeline.add_node(
                component=table_retriever,
                name="TableRetriever",
                inputs=["Query"]
            )
            
            # 9. 查询
            query = "What are the sales figures for Q3?"
            
            result = multimodal_pipeline.run(
                query=query,
                params={
                    "TextRetriever": {"top_k": 3},
                    "TableRetriever": {"top_k": 2}
                }
            )
            
            print(f"查询: {query}\n")
            print("文本结果:")
            for doc in result.get("documents", []):
                if doc.content_type == "text":
                    print(f"  - {doc.content[:200]}...")
            
            print("\n表格结果:")
            for doc in result.get("documents", []):
                if doc.content_type == "table":
                    print(f"  - {doc.content}")
            ---
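    b.Semantic search
        a.Function description
            Milvus and Haystack together can back a semantic search system that accepts natural-language queries and returns results ranked by semantic relevance rather than exact keyword match, which helps with synonyms and multilingual queries. Results can be filtered, sorted, and personalized, and the search can be exposed through an API for use in a website or application.
        b.Code example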
            ---
            from haystack.document_stores import MilvusDocumentStore
            from haystack.nodes import EmbeddingRetriever, BM25Retriever
            from haystack import Pipeline
            from haystack.nodes import JoinDocuments
            from flask import Flask, request, jsonify
            
            # 1. 创建文档存储
            document_store = MilvusDocumentStore(
                host="localhost",
                port=19530,
                collection_name="semantic_search",
                embedding_dim=384,  # all-MiniLM-L6-v2 produces 384-dimensional embeddings
                similarity="cosine"
            )
            
            # 2. 创建混合检索器
            # 语义检索
            embedding_retriever = EmbeddingRetriever(
                document_store=document_store,
                embedding_model="sentence-transformers/all-MiniLM-L6-v2"
            )
            
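            # Keyword retrieval
            # Note: BM25Retriever needs a document store with keyword search support
            # (for example Elasticsearch or OpenSearch); verify that the installed
            # MilvusDocumentStore qualifies before relying on this node.
            bm25_retriever = BM25Retriever(document_store=document_store)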
            
            # 3. 构建混合检索Pipeline
            join_documents = JoinDocuments(
                join_mode="concatenate"
            )
            
            hybrid_pipeline = Pipeline()
            hybrid_pipeline.add_node(
                component=embedding_retriever,
                name="EmbeddingRetriever",
                inputs=["Query"]
            )
            hybrid_pipeline.add_node(
                component=bm25_retriever,
                name="BM25Retriever",
                inputs=["Query"]
            )
            hybrid_pipeline.add_node(
                component=join_documents,
                name="JoinDocuments",
                inputs=["EmbeddingRetriever", "BM25Retriever"]
            )
            
            # 4. 创建搜索API
            app = Flask(__name__)
            
            @app.route("/search", methods=["POST"])
            def search():
                data = request.get_json(silent=True) or {}  # tolerate a missing or invalid JSON body
                query = data.get("query", "")
                top_k = data.get("top_k", 5)
                filters = data.get("filters", {})
                
                result = hybrid_pipeline.run(
                    query=query,
                    params={
                        "EmbeddingRetriever": {
                            "top_k": top_k,
                            "filters": filters
                        },
                        "BM25Retriever": {
                            "top_k": top_k,
                            "filters": filters
                        }
                    }
                )
                
                documents = result.get("documents", [])
                
                response = {
                    "query": query,
                    "total": len(documents),
                    "results": [
                        {
                            "id": doc.id,
                            "content": doc.content,
                            "score": doc.score,
                            "meta": doc.meta
                        }
                        for doc in documents[:top_k]
                    ]
                }
                
                return jsonify(response)
            
            @app.route("/index", methods=["POST"])
            def index_documents():
                data = request.get_json(silent=True) or {}  # tolerate a missing or invalid JSON body
                documents = data.get("documents", [])
                
                document_store.write_documents(documents)
                document_store.update_embeddings(embedding_retriever)
                
                return jsonify({
                    "status": "success",
                    "indexed": len(documents)
                })
            
            # 5. 启动API服务
            # app.run(host="0.0.0.0", port=8000)
            
            # 6. 测试搜索
            test_queries = [
                "machine learning algorithms",
                "deep neural networks",
                "natural language processing"
            ]
            
            for query in test_queries:
                result = hybrid_pipeline.run(
                    query=query,
                    params={
                        "EmbeddingRetriever": {"top_k": 3},
                        "BM25Retriever": {"top_k": 3}
                    }
                )
                
                print(f"\n查询: {query}")
                print(f"结果数: {len(result['documents'])}")
                
                for i, doc in enumerate(result["documents"][:3]):
                    print(f"\n[{i+1}] 分数: {doc.score:.4f}")
                    print(f"    内容: {doc.content[:150]}...")
            
            # 7. 带过滤的搜索
            filtered_result = hybrid_pipeline.run(
                query="machine learning",
                params={
                    "EmbeddingRetriever": {
                        "top_k": 5,
                        "filters": {"category": ["AI", "ML"]}
                    },
                    "BM25Retriever": {
                        "top_k": 5,
                        "filters": {"category": ["AI", "ML"]}
                    }
                }
            )
            
            print(f"\n过滤搜索结果: {len(filtered_result['documents'])} 个文档")
            ---
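        c.Supplementary sketch
            The hybrid pipeline above joins the two result lists with `join_mode="concatenate"`. A commonly used alternative for merging dense and keyword rankings is reciprocal rank fusion (RRF), which scores a document by summing 1 / (k + rank) over every list it appears in. The sketch below is a plain-Python illustration of that formula rather than Haystack's JoinDocuments code. The function name, the document ids, and the constant k=60 are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse ranked id lists: score(doc) = sum over lists of 1 / (k + rank), rank starting at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    dense_ranking = ["d1", "d2", "d3"]   # hypothetical embedding-retriever order
    bm25_ranking = ["d3", "d1", "d4"]    # hypothetical keyword-retriever order
    for doc_id, score in reciprocal_rank_fusion([dense_ranking, bm25_ranking]):
        print(f"{doc_id}: {score:.6f}")
```

            RRF uses ranks only, so it avoids having to normalize cosine similarities against BM25 scores, which live on different scales.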

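11 Operations and monitoring

11.1 Monitoring metrics

01.System metrics
    a.Performance metrics
        a.Function description
            Milvus exposes performance metrics in Prometheus format, which Grafana can visualize. Core metrics include QPS, latency, and throughput, alongside CPU, memory, disk, and network usage, query performance, and index build progress. Alert thresholds on these metrics help detect problems early.
        b.Code example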
            ---
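            # Milvus performance metrics monitoring
            
            # 1. Prometheus metrics export
            # Illustrative milvus.yaml fragment (key names vary by Milvus version;
            # check the configuration reference for your release):
            # metrics:
            #   enabled: true
            #   port: 9091
            #   path: /metrics
            
            # 2. Query the metrics endpoint
            # curl http://localhost:9091/metrics
            
            # 3. Main performance metrics
            # (the names below are illustrative; confirm the real ones in the /metrics output)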
            performance_metrics = {
                "查询性能": {
                    "milvus_query_qps": "每秒查询数",
                    "milvus_query_latency_ms": "查询延迟（毫秒）",
                    "milvus_query_success_rate": "查询成功率",
                    "milvus_query_timeout_count": "查询超时次数"
                },
                "写入性能": {
                    "milvus_insert_qps": "每秒插入数",
                    "milvus_insert_latency_ms": "插入延迟（毫秒）",
                    "milvus_insert_success_rate": "插入成功率",
                    "milvus_flush_duration_ms": "刷盘耗时"
                },
                "索引性能": {
                    "milvus_index_build_duration_ms": "索引构建耗时",
                    "milvus_index_build_progress": "索引构建进度",
                    "milvus_index_size_bytes": "索引大小（字节）"
                },
                "系统资源": {
                    "milvus_cpu_usage_percent": "CPU使用率",
                    "milvus_memory_usage_bytes": "内存使用量",
                    "milvus_disk_usage_bytes": "磁盘使用量",
                    "milvus_network_io_bytes": "网络IO"
                }
            }
            
            # 4. Prometheus配置
            prometheus_config = """
            global:
              scrape_interval: 15s
              evaluation_interval: 15s
            
            scrape_configs:
              - job_name: 'milvus'
                static_configs:
                  - targets: ['localhost:9091']
                    labels:
                      instance: 'milvus-standalone'
                      cluster: 'production'
            """
            
            # 5. 使用Python查询指标
            import requests
            
            def get_milvus_metrics():
                response = requests.get("http://localhost:9091/metrics", timeout=5)
                response.raise_for_status()
                metrics = {}
                
                for line in response.text.splitlines():
                    # Comment lines start with '#', so the prefix check alone skips them
                    if not line.startswith('milvus_'):
                        continue
                    parts = line.split()
                    if len(parts) < 2:
                        continue
                    metric_name = parts[0].split('{')[0]
                    try:
                        # Series that differ only by labels share a name; the last sample wins
                        metrics[metric_name] = float(parts[-1])
                    except ValueError:
                        continue
                
                return metrics
            
            # 获取当前指标
            metrics = get_milvus_metrics()
            
            print("Milvus性能指标:")
            print(f"  QPS: {metrics.get('milvus_query_qps', 0):.2f}")
            print(f"  平均延迟: {metrics.get('milvus_query_latency_ms', 0):.2f}ms")
            print(f"  CPU使用率: {metrics.get('milvus_cpu_usage_percent', 0):.2f}%")
            print(f"  内存使用: {metrics.get('milvus_memory_usage_bytes', 0) / 1024**3:.2f}GB")
            
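            # 6. PromQL query examples (metric names are illustrative)
            promql_queries = {
                "Average QPS (5 min)": "rate(milvus_query_total[5m])",
                "P99 latency": "histogram_quantile(0.99, sum by (le) (rate(milvus_query_latency_ms_bucket[5m])))",
                "Error rate": "rate(milvus_query_errors_total[5m]) / rate(milvus_query_total[5m])",
                # rate() is meant for counters; memory usage is a gauge, so use deriv()
                "Memory growth rate": "deriv(milvus_memory_usage_bytes[5m])"
            }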
            
            print("\nPromQL查询示例:")
            for name, query in promql_queries.items():
                print(f"  {name}: {query}")
            ---
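        c.Supplementary sketch
            The P99 query above relies on PromQL's `histogram_quantile`, which estimates a quantile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. The sketch below reproduces that estimate in plain Python to show where the number comes from. It is an illustration under assumptions, not Prometheus code: the bucket data is invented, and the first bucket is assumed to start at 0.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    `buckets` must be sorted by upper bound; the last bound may be math.inf.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not buckets:
        raise ValueError("buckets must not be empty")

    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            # The +Inf bucket has no finite width, so report the last finite bound.
            if math.isinf(upper_bound) or count == lower_count:
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


if __name__ == "__main__":
    # Hypothetical latency histogram in milliseconds: le=10, le=50, le=100, le=+Inf
    latency_buckets = [(10.0, 50.0), (50.0, 90.0), (100.0, 100.0), (math.inf, 100.0)]
    print(histogram_quantile(0.99, latency_buckets))
    print(histogram_quantile(0.50, latency_buckets))
```

            Because the estimate is interpolated, its accuracy depends on bucket boundaries; a P99 alert threshold is best placed close to a bucket edge.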
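    b.Business metrics
        a.Function description
            Beyond system metrics, business-level metrics are worth tracking: the number of collections and their data volume, vector dimension distribution and data growth trends, hot and slow queries, user behavior and usage patterns, and data quality and accuracy. Custom business metrics can be exported as well.
        b.Code example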
            ---
            from pymilvus import connections, utility, Collection, DataType
            import time
            from datetime import datetime
            
            connections.connect(host="localhost", port="19530")
            
            # 1. Collection级别指标
            def get_collection_metrics(collection_name):
                collection = Collection(collection_name)
                collection.load()
                
                metrics = {
                    "name": collection_name,
                    "entity_count": collection.num_entities,
                    "schema": {
                        "fields": len(collection.schema.fields),
                        "description": collection.schema.description
                    },
                    "indexes": []
                }
                
                # Collect index info; collection.indexes lists every index on the collection
                for index in collection.indexes:
                    index_params = index.params
                    metrics["indexes"].append({
                        "field": index.field_name,
                        "type": index_params.get("index_type"),
                        "params": index_params.get("params")
                    })
                
                return metrics
            
            # 2. 数据增长监控
            def monitor_data_growth(collection_name, interval=60):
                """监控数据增长趋势"""
                collection = Collection(collection_name)
                # Start from the current count so the first report is not a jump from zero
                previous_count = collection.num_entities
                
                while True:
                    current_count = collection.num_entities
                    growth = current_count - previous_count
                    growth_rate = (growth / previous_count * 100) if previous_count > 0 else 0
                    
                    print(f"[{datetime.now()}] 数据量: {current_count}, "
                          f"增长: +{growth}, 增长率: {growth_rate:.2f}%")
                    
                    previous_count = current_count
                    time.sleep(interval)
            
            # 3. 查询性能统计
            class QueryMonitor:
                def __init__(self):
                    self.query_count = 0
                    self.total_latency = 0
                    self.slow_queries = []
                    self.error_count = 0
                
                def record_query(self, query, latency, success=True):
                    self.query_count += 1
                    
                    if success:
                        self.total_latency += latency
                        
                        # 记录慢查询（>100ms）
                        if latency > 100:
                            self.slow_queries.append({
                                "query": query,
                                "latency": latency,
                                "timestamp": datetime.now()
                            })
                    else:
                        self.error_count += 1
                
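                def get_stats(self):
                    # total_latency only accumulates successful queries, so average over those
                    success_count = self.query_count - self.error_count
                    avg_latency = self.total_latency / success_count if success_count > 0 else 0
                    error_rate = self.error_count / self.query_count if self.query_count > 0 else 0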
                    
                    return {
                        "total_queries": self.query_count,
                        "avg_latency_ms": avg_latency,
                        "slow_queries": len(self.slow_queries),
                        "error_rate": error_rate * 100
                    }
            
            # 4. Use the monitor
            monitor = QueryMonitor()
            collection = Collection("test_collection")
            collection.load()  # a collection must be loaded before it can be searched
            
            # Simulated queries
            import numpy as np
            
            for i in range(100):
                query_vector = np.random.random((1, 128)).tolist()
                
                # perf_counter is monotonic, which suits interval timing better than time.time
                start = time.perf_counter()
                try:
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param={"metric_type": "L2", "params": {"nprobe": 16}},
                        limit=10
                    )
                    latency = (time.perf_counter() - start) * 1000
                    monitor.record_query(f"query_{i}", latency, success=True)
                except Exception:
                    monitor.record_query(f"query_{i}", 0, success=False)
            
            # 5. 输出统计
            stats = monitor.get_stats()
            print("\n查询性能统计:")
            print(f"  总查询数: {stats['total_queries']}")
            print(f"  平均延迟: {stats['avg_latency_ms']:.2f}ms")
            print(f"  慢查询数: {stats['slow_queries']}")
            print(f"  错误率: {stats['error_rate']:.2f}%")
            
            # 6. 导出指标到Prometheus
            from prometheus_client import start_http_server, Gauge, Counter
            
            # 定义指标
            query_latency = Gauge('milvus_custom_query_latency_ms', 'Query latency in milliseconds')
            query_count = Counter('milvus_custom_query_total', 'Total number of queries')
            slow_query_count = Counter('milvus_custom_slow_query_total', 'Total number of slow queries')
            
            # 启动HTTP服务器
            # start_http_server(8000)
            
            # 更新指标
            # query_latency.set(stats['avg_latency_ms'])
            # query_count.inc(stats['total_queries'])
            # slow_query_count.inc(stats['slow_queries'])
            ---
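The statistics above report only an average, which can hide tail latency. The following is a minimal sketch of computing p50/p95/p99 from recorded latencies, assuming the samples are available as a plain list of milliseconds; the function name `latency_percentiles` is hypothetical and not part of the monitor class shown above.

```python
import statistics


def latency_percentiles(latencies_ms):
    """Return p50/p95/p99 for a list of latency samples in milliseconds."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Synthetic samples: 1 ms, 2 ms, ..., 100 ms
samples = [float(v) for v in range(1, 101)]
summary = latency_percentiles(samples)
print(summary)
```

Tracking a high percentile next to the average makes it easier to see whether a small share of slow queries is responsible for a latency regression.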

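02.Alert configuration
    a.Alert rules
        a.Description
            Configure alert rules so that system problems surface early. Alerting is built on Prometheus alerting rules plus Alertmanager: a rule fires once its expression stays beyond a threshold for the configured duration, and Alertmanager routes the notification to email, DingTalk, Slack, or other channels. Give each alert a severity and priority, use grouping and inhibition to cut noise, and review the rules periodically to confirm they are still effective.
        b.Code example
            ---
            # Prometheus alerting rule configuration

            # alert_rules.yml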
            alert_rules = """
            # NOTE: the metric names below are illustrative; replace them with the
            # names your Milvus version actually exposes on its metrics endpoint.
            groups:
              - name: milvus_alerts
                interval: 30s
                rules:
                  # High QPS
                  - alert: HighQueryRate
                    expr: sum(rate(milvus_query_total[5m])) > 10000
                    for: 5m
                    labels:
                      severity: warning
                    annotations:
                      summary: "Milvus query QPS is too high"
                      description: "Current QPS: {{ $value }}, above the threshold of 10000"

                  # High latency (aggregate the buckets by le before taking the quantile)
                  - alert: HighQueryLatency
                    expr: histogram_quantile(0.99, sum by (le) (rate(milvus_query_latency_ms_bucket[5m]))) > 100
                    for: 5m
                    labels:
                      severity: warning
                    annotations:
                      summary: "Milvus query latency is too high"
                      description: "P99 latency: {{ $value }}ms, above the threshold of 100ms"

                  # High error rate
                  - alert: HighErrorRate
                    expr: sum(rate(milvus_query_errors_total[5m])) / sum(rate(milvus_query_total[5m])) > 0.05
                    for: 5m
                    labels:
                      severity: critical
                    annotations:
                      summary: "Milvus error rate is too high"
                      description: "Error rate: {{ $value | humanizePercentage }}, above the threshold of 5%"

                  # High memory usage
                  - alert: HighMemoryUsage
                    expr: milvus_memory_usage_bytes / milvus_memory_limit_bytes > 0.9
                    for: 5m
                    labels:
                      severity: warning
                    annotations:
                      summary: "Milvus memory usage is too high"
                      description: "Memory usage: {{ $value | humanizePercentage }}, above the threshold of 90%"

                  # High disk usage
                  - alert: HighDiskUsage
                    expr: milvus_disk_usage_bytes / milvus_disk_limit_bytes > 0.85
                    for: 10m
                    labels:
                      severity: warning
                    annotations:
                      summary: "Milvus disk usage is too high"
                      description: "Disk usage: {{ $value | humanizePercentage }}, above the threshold of 85%"

                  # Service down
                  - alert: MilvusDown
                    expr: up{job="milvus"} == 0
                    for: 1m
                    labels:
                      severity: critical
                    annotations:
                      summary: "Milvus service is unavailable"
                      description: "Milvus instance {{ $labels.instance }} is unreachable"

                  # Slow index build
                  - alert: SlowIndexBuilding
                    expr: milvus_index_build_duration_ms > 300000
                    for: 10m
                    labels:
                      severity: warning
                    annotations:
                      summary: "Index build is slow"
                      description: "Index build time: {{ $value }}ms, more than 5 minutes"
            """
            
            # Alertmanager configuration
            # (email addresses, credentials, and webhook URLs below are placeholders)
            alertmanager_config = """
            global:
              resolve_timeout: 5m
              smtp_smarthost: 'smtp.example.com:587'
              smtp_from: 'alertmanager@example.com'
              smtp_auth_username: 'alertmanager'
              smtp_auth_password: 'password'

            route:
              group_by: ['alertname', 'cluster']
              group_wait: 10s
              group_interval: 10s
              repeat_interval: 12h
              receiver: 'default'
              routes:
                - match:
                    severity: critical
                  receiver: 'critical'
                  continue: true
                - match:
                    severity: warning
                  receiver: 'warning'

            receivers:
              - name: 'default'
                email_configs:
                  - to: 'ops@example.com'

              - name: 'critical'
                email_configs:
                  - to: 'oncall@example.com'
                # Slack incoming webhooks expect Slack's own payload format,
                # so use slack_configs rather than the generic webhook_configs
                slack_configs:
                  - api_url: 'https://hooks.slack.com/services/xxx'
                    channel: '#milvus-alerts'

              - name: 'warning'
                email_configs:
                  - to: 'ops@example.com'

            inhibit_rules:
              - source_match:
                  severity: 'critical'
                target_match:
                  severity: 'warning'
                equal: ['alertname', 'instance']
            """

            print("Example alert rules:")
            print(alert_rules)
            print("\nExample Alertmanager configuration:")
            print(alertmanager_config)
            ---
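The `for:` clause in the rules above means an alert fires only after its condition has held continuously for that long. The following is a simplified sketch of that state machine, assuming one sample per evaluation; the `ThresholdRule` class is hypothetical and does not reproduce Prometheus internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    """Simplified model of an alert rule with a `for` duration."""
    name: str
    threshold: float
    for_seconds: float
    pending_since: Optional[float] = None

    def evaluate(self, value: float, now: float) -> str:
        """Return 'inactive', 'pending', or 'firing' for one sample."""
        if value <= self.threshold:
            # Condition no longer holds: reset the pending timer
            self.pending_since = None
            return "inactive"
        if self.pending_since is None:
            self.pending_since = now
        if now - self.pending_since >= self.for_seconds:
            return "firing"
        return "pending"


rule = ThresholdRule(name="HighQueryLatency", threshold=100, for_seconds=300)
# (timestamp in seconds, observed P99 latency in ms)
samples = [(0, 150), (150, 160), (300, 170), (330, 80), (360, 150)]
states = [rule.evaluate(value, now) for now, value in samples]
print(states)
```

A single sample back under the threshold resets the timer, which is why a longer `for` duration filters out short spikes at the cost of slower detection.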
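    b.Alert notification
        a.Description
            Deliver alerts through multiple channels, such as email, SMS, phone calls, and instant messaging. Configure the recipients and the on-call schedule, and escalate alerts that are not handled. Support acknowledging and resolving alerts, and record the alert history together with the outcome. Use alert statistics to tune the alerting policy and reduce false positives.
        b.Code example
            ---
            # Custom alert notification implementation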
            
            import requests
            import json
            from datetime import datetime
            
            class AlertNotifier:
                def __init__(self):
                    self.alert_history = []
                
                def send_email(self, to, subject, body):
                    """Send an email alert"""
                    # Stub: a real implementation would send through SMTP
                    print(f"[Email] To: {to}")
                    print(f"  Subject: {subject}")
                    print(f"  Body: {body}")

                def send_dingtalk(self, webhook_url, message):
                    """Send a DingTalk alert"""
                    data = {
                        "msgtype": "markdown",
                        "markdown": {
                            "title": "Milvus Alert",
                            "text": message
                        }
                    }

                    try:
                        # json= serializes the payload and sets the Content-Type header
                        response = requests.post(webhook_url, json=data, timeout=10)
                        response.raise_for_status()
                        print(f"[DingTalk] Sent: {response.status_code}")
                    except requests.RequestException as e:
                        print(f"[DingTalk] Send failed: {e}")

                def send_slack(self, webhook_url, message):
                    """Send a Slack alert"""
                    data = {
                        "text": message,
                        "username": "Milvus Alert",
                        "icon_emoji": ":warning:"
                    }

                    try:
                        response = requests.post(webhook_url, json=data, timeout=10)
                        response.raise_for_status()
                        print(f"[Slack] Sent: {response.status_code}")
                    except requests.RequestException as e:
                        print(f"[Slack] Send failed: {e}")
                
                def process_alert(self, alert):
                    """Process one alert and notify according to its severity"""
                    alert_info = {
                        "name": alert["labels"]["alertname"],
                        "severity": alert["labels"]["severity"],
                        "summary": alert["annotations"]["summary"],
                        "description": alert["annotations"]["description"],
                        "timestamp": datetime.now()
                    }

                    self.alert_history.append(alert_info)

                    # Choose the channels by severity
                    # (recipient addresses and the access token are placeholders)
                    if alert_info["severity"] == "critical":
                        # Critical: notify on several channels
                        self.send_email(
                            to="oncall@example.com",
                            subject=f"[CRITICAL] {alert_info['summary']}",
                            body=alert_info["description"]
                        )
                        self.send_dingtalk(
                            webhook_url="https://oapi.dingtalk.com/robot/send?access_token=xxx",
                            message=f"## [CRITICAL ALERT]\n\n**{alert_info['summary']}**\n\n{alert_info['description']}"
                        )
                    elif alert_info["severity"] == "warning":
                        # Warning: email only
                        self.send_email(
                            to="ops@example.com",
                            subject=f"[WARNING] {alert_info['summary']}",
                            body=alert_info["description"]
                        )

                    return alert_info
            
            # Use the alert notifier
            notifier = AlertNotifier()

            # Simulated alert
            sample_alert = {
                "labels": {
                    "alertname": "HighQueryLatency",
                    "severity": "warning",
                    "instance": "milvus-01"
                },
                "annotations": {
                    "summary": "Milvus query latency is too high",
                    "description": "P99 latency: 150ms, above the threshold of 100ms"
                }
            }

            alert_info = notifier.process_alert(sample_alert)
            print(f"\nAlert processed: {alert_info['name']}")

            # Alert statistics
            def get_alert_stats(notifier):
                stats = {
                    "total": len(notifier.alert_history),
                    "by_severity": {},
                    "by_name": {}
                }

                for alert in notifier.alert_history:
                    # Count by severity
                    severity = alert["severity"]
                    stats["by_severity"][severity] = stats["by_severity"].get(severity, 0) + 1

                    # Count by alert name
                    name = alert["name"]
                    stats["by_name"][name] = stats["by_name"].get(name, 0) + 1

                return stats

            stats = get_alert_stats(notifier)
            print("\nAlert statistics:")
            print(f"  Total: {stats['total']}")
            print(f"  By severity: {stats['by_severity']}")
            print(f"  By name: {stats['by_name']}")
            ---
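One way to reduce notification noise is to suppress repeats of the same alert within a fixed interval, similar in spirit to Alertmanager's `repeat_interval`. The following is a minimal sketch; the `AlertThrottle` class and its key choice (alert name plus instance) are assumptions for illustration and are not part of the notifier above.

```python
class AlertThrottle:
    """Suppress repeat notifications for the same alert within an interval."""

    def __init__(self, repeat_interval_s=3600.0):
        self.repeat_interval_s = repeat_interval_s
        self._last_sent = {}  # (alertname, instance) -> time of last notification

    def should_notify(self, alert, now):
        """Return True if this alert should be sent at time `now` (seconds)."""
        labels = alert["labels"]
        key = (labels.get("alertname"), labels.get("instance"))
        last = self._last_sent.get(key)
        if last is not None and now - last < self.repeat_interval_s:
            return False
        self._last_sent[key] = now
        return True


throttle = AlertThrottle(repeat_interval_s=3600)
alert_a = {"labels": {"alertname": "HighQueryLatency", "instance": "milvus-01"}}
alert_b = {"labels": {"alertname": "HighQueryLatency", "instance": "milvus-02"}}

decisions = [
    throttle.should_notify(alert_a, now=0),     # first occurrence
    throttle.should_notify(alert_a, now=600),   # repeat inside the interval
    throttle.should_notify(alert_b, now=600),   # different instance
    throttle.should_notify(alert_a, now=3600),  # interval has elapsed
]
print(decisions)
```

Passing the current time in as an argument keeps the sketch deterministic; a real notifier would typically read a monotonic clock instead.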

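11.2 Log Management

01.Log configuration
    a.Log levels
        a.Description
            Milvus supports several log levels: debug, info, warn, error, and fatal. Use debug in development and info or warn in production. The level is set in the configuration file or through environment variables, and it can be changed at runtime without a restart. Different components can run at different levels. Choose the level to balance detail against logging overhead.
        b.Code example
            ---
            # Milvus log configuration (milvus.yaml)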
            
            log_config = """
            log:
              level: info  # debug, info, warn, error, fatal
              file:
                rootPath: /var/log/milvus
                maxSize: 300  # MB
                maxAge: 10    # days
                maxBackups: 20
              format: json  # text or json
              stdout: true
            """
            
            # Set through environment variables
            # export LOG_LEVEL=debug
            # export LOG_FORMAT=json
            # export LOG_FILE_MAXSIZE=500

            # Docker Compose configuration
            docker_compose_log = """
            services:
              milvus:
                environment:
                  - LOG_LEVEL=info
                  - LOG_FORMAT=json
                  - LOG_FILE_MAXSIZE=300
                  - LOG_FILE_MAXAGE=10
                  - LOG_FILE_MAXBACKUPS=20
                volumes:
                  - /var/log/milvus:/var/log/milvus
            """
            
            # Kubernetes ConfigMap (illustrative key layout; adapt it to how your deployment supplies milvus.yaml)
            k8s_log_config = """
            apiVersion: v1
            kind: ConfigMap
            metadata:
              name: milvus-log-config
              namespace: milvus
            data:
              log.level: "info"
              log.format: "json"
              log.file.maxSize: "300"
              log.file.maxAge: "10"
              log.file.maxBackups: "20"
            """
            
            print("Example log configuration:")
            print(log_config)
            print("\nDocker Compose log configuration:")
            print(docker_compose_log)
            print("\nKubernetes log configuration:")
            print(k8s_log_config)

            # What each log level means
            log_levels = {
                "debug": "Detailed debugging output covering every operation",
                "info": "General information: important operations and state changes",
                "warn": "Warnings: possible problems that do not stop the service",
                "error": "Errors: an operation failed but the service keeps running",
                "fatal": "Fatal errors: the service cannot continue"
            }

            print("\nLog levels:")
            for level, desc in log_levels.items():
                print(f"  {level}: {desc}")

            # Recommended settings per environment
            env_configs = {
                "development": {
                    "level": "debug",
                    "format": "text",
                    "stdout": True
                },
                "testing": {
                    "level": "info",
                    "format": "json",
                    "stdout": True
                },
                "production": {
                    "level": "warn",
                    "format": "json",
                    "stdout": False
                }
            }

            print("\nRecommended settings per environment:")
            for env, config in env_configs.items():
                print(f"  {env}: {config}")
            ---
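A configured log level acts as a minimum severity: records below it are dropped. The following is a minimal sketch of that comparison using the five levels listed above; the helper `should_emit` is hypothetical and does not reflect Milvus internals.

```python
# Levels ordered from least to most severe
LEVELS = ["debug", "info", "warn", "error", "fatal"]


def should_emit(record_level, configured_level):
    """Return True if a record at record_level passes the configured threshold."""
    return LEVELS.index(record_level.lower()) >= LEVELS.index(configured_level.lower())


# With the level set to warn, only warn and above are written
emitted = [lvl for lvl in LEVELS if should_emit(lvl, "warn")]
print(emitted)
```

This is why a production setting of warn reduces log volume: debug and info records are filtered out before they are written.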
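    b.Log rotation
        a.Description
            Configure log rotation so that log files do not grow without bound. Set the maximum size of a single log file, the number of days to retain files, and the maximum number of backup files. Rotation can be triggered by time or by size. Compress rotated files, purge expired logs on a schedule, and archive or back up the logs that must be kept.
        b.Code example
            ---
            # Log rotation configuration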
            
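            # 1. Built-in Milvus log rotation
            milvus_log_rotation = """
            log:
              file:
                rootPath: /var/log/milvus
                maxSize: 300      # at most 300 MB per file
                maxAge: 10        # keep files for 10 days
                maxBackups: 20    # keep at most 20 backup files
            """

            # 2. logrotate (Linux)
            logrotate_config = """
            # /etc/logrotate.d/milvus
            # Comments sit on their own lines because some logrotate versions
            # do not accept a trailing comment after a directive.

            /var/log/milvus/*.log {
                # Rotate daily and keep 7 rotated files
                daily
                rotate 7
                # Compress rotated logs, except the most recent one
                compress
                delaycompress
                # Ignore missing files and skip empty ones
                missingok
                notifempty
                # Copy the log and truncate it in place, so rotation does not
                # depend on the process reopening its log file on a signal
                copytruncate
            }
            """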
            
            # 3. Docker log rotation
            docker_log_config = """
            # docker-compose.yml
            services:
              milvus:
                logging:
                  driver: "json-file"
                  options:
                    max-size: "100m"    # at most 100 MB per file
                    max-file: "10"      # keep at most 10 files
                    compress: "true"    # compress rotated logs
            """

            # 4. Kubernetes log shipping
            k8s_log_rotation = """
            # Collect logs with fluentd or filebeat
            apiVersion: v1
            kind: ConfigMap
            metadata:
              name: fluentd-config
            data:
              fluent.conf: |
                <source>
                  @type tail
                  path /var/log/milvus/*.log
                  pos_file /var/log/fluentd/milvus.log.pos
                  tag milvus.*
                  <parse>
                    @type json
                  </parse>
                </source>
                
                <match milvus.**>
                  @type elasticsearch
                  host elasticsearch.logging.svc.cluster.local
                  port 9200
                  logstash_format true
                  logstash_prefix milvus
                  <buffer>
                    @type file
                    path /var/log/fluentd/buffer
                    flush_interval 10s
                  </buffer>
                </match>
            """
            
            print("Log rotation configuration:")
            print("\n1. Milvus built-in:")
            print(milvus_log_rotation)
            print("\n2. logrotate:")
            print(logrotate_config)
            print("\n3. Docker:")
            print(docker_log_config)
            print("\n4. Kubernetes:")
            print(k8s_log_rotation)

            # 5. Python script that removes old logs
            import os
            import time

            def cleanup_old_logs(log_dir, days=7):
                """Delete .log files older than the given number of days"""
                cutoff_time = time.time() - (days * 86400)
                cleaned_count = 0
                cleaned_size = 0

                for filename in os.listdir(log_dir):
                    filepath = os.path.join(log_dir, filename)

                    if os.path.isfile(filepath) and filename.endswith('.log'):
                        file_mtime = os.path.getmtime(filepath)

                        if file_mtime < cutoff_time:
                            file_size = os.path.getsize(filepath)
                            os.remove(filepath)
                            cleaned_count += 1
                            cleaned_size += file_size
                            print(f"Deleted: {filename}")

                print(f"\nCleanup finished: removed {cleaned_count} files, freed {cleaned_size/1024/1024:.2f}MB")

            # cleanup_old_logs("/var/log/milvus", days=7)
            ---
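The `maxAge` and `maxBackups` settings combine two retention limits: a rotated file is removed if it is too old or if more than the allowed number of newer files exist. The following is a minimal sketch of that selection as a pure function, assuming rotated files are described by (name, modification time) pairs; `select_expired` is a hypothetical helper, not Milvus code.

```python
def select_expired(files, now, max_age_days, max_backups):
    """Return the names of rotated files to delete.

    files: list of (name, mtime_epoch_seconds) tuples.
    A file expires if it is older than max_age_days or if it is not
    among the max_backups most recent files.
    """
    newest_first = sorted(files, key=lambda f: f[1], reverse=True)
    cutoff = now - max_age_days * 86400
    expired = []
    for index, (name, mtime) in enumerate(newest_first):
        if index >= max_backups or mtime < cutoff:
            expired.append(name)
    return expired


DAY = 86400
now = 100 * DAY
rotated = [
    ("milvus-a.log", now - 1 * DAY),
    ("milvus-b.log", now - 2 * DAY),
    ("milvus-c.log", now - 3 * DAY),
    ("milvus-d.log", now - 20 * DAY),
]
to_delete = select_expired(rotated, now, max_age_days=10, max_backups=2)
print(to_delete)
```

Separating the decision from the deletion makes the policy easy to test before any file is removed.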

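02.Log analysis
    a.Log collection
        a.Description
            Collect Milvus logs centrally so they are easy to query and analyze. Use an ELK or EFK stack with a shipper such as Filebeat, Fluentd, or Logstash. Central collection provides aggregation and indexing, full-text search and filtering, visualization, and log-based alerting and monitoring.
        b.Code example
            ---
            # Log collection options

            # 1. Ship logs to Elasticsearch with Filebeat
            filebeat_config = """
            # filebeat.yml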
            
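            filebeat.inputs:
              - type: log
                enabled: true
                paths:
                  - /var/log/milvus/*.log
                fields:
                  service: milvus
                  environment: production
                # Store the custom fields at the document root so that
                # queries can match `service` directly
                fields_under_root: true
                json.keys_under_root: true
                json.add_error_key: true

            processors:
              - add_host_metadata: ~
              - add_cloud_metadata: ~
              - add_docker_metadata: ~
              - add_kubernetes_metadata: ~

            output.elasticsearch:
              hosts: ["elasticsearch:9200"]
              index: "milvus-logs-%{+yyyy.MM.dd}"
              username: "elastic"
              password: "changeme"

            # A custom index name needs a matching template name and pattern
            setup.template.name: "milvus-logs"
            setup.template.pattern: "milvus-logs-*"

            setup.kibana:
              host: "kibana:5601"

            # The custom index setting is ignored while ILM is enabled,
            # so ILM is disabled here to keep the daily index names above
            setup.ilm.enabled: false
            """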
            
            # 2. Collect logs with Fluentd
            fluentd_config = """
            # fluent.conf
            
            <source>
              @type tail
              path /var/log/milvus/*.log
              pos_file /var/log/fluentd/milvus.log.pos
              tag milvus.log
              <parse>
                @type json
                time_key time
                time_format %Y-%m-%dT%H:%M:%S.%NZ
              </parse>
            </source>
            
            <filter milvus.log>
              @type record_transformer
              <record>
                hostname "#{Socket.gethostname}"
                service "milvus"
                environment "production"
              </record>
            </filter>
            
            <match milvus.log>
              @type elasticsearch
              host elasticsearch
              port 9200
              logstash_format true
              logstash_prefix milvus
              <buffer>
                @type file
                path /var/log/fluentd/buffer
                flush_interval 10s
                retry_max_times 3
              </buffer>
            </match>
            """
            
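            # 3. Deploy ELK with Docker Compose
            elk_docker_compose = """
            version: '3'

            services:
              elasticsearch:
                image: docker.elastic.co/elasticsearch/elasticsearch:8.5.0
                environment:
                  - discovery.type=single-node
                  # Elasticsearch 8.x enables security by default;
                  # it is disabled here for local testing only
                  - xpack.security.enabled=false
                  - "ES_JAVA_OPTS=-Xms512m -Xmx512m"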
                volumes:
                  - es_data:/usr/share/elasticsearch/data
                ports:
                  - "9200:9200"
              
              kibana:
                image: docker.elastic.co/kibana/kibana:8.5.0
                environment:
                  - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
                ports:
                  - "5601:5601"
                depends_on:
                  - elasticsearch
              
              filebeat:
                image: docker.elastic.co/beats/filebeat:8.5.0
                user: root
                volumes:
                  - ./filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
                  - /var/log/milvus:/var/log/milvus:ro
                  - filebeat_data:/usr/share/filebeat/data
                depends_on:
                  - elasticsearch
            
            volumes:
              es_data:
              filebeat_data:
            """
            
            print("Log collection configuration:")
            print("\n1. Filebeat:")
            print(filebeat_config)
            print("\n2. Fluentd:")
            print(fluentd_config)
            print("\n3. ELK Docker Compose:")
            print(elk_docker_compose)

            # 4. Query the logs in Elasticsearch from Python
            from elasticsearch import Elasticsearch

            def query_milvus_logs(es_host="http://localhost:9200", hours=1):
                """Query the Milvus logs of the last N hours"""
                # Recent client versions require the scheme in the host URL
                es = Elasticsearch(es_host)

                # Build the query
                query = {
                    "query": {
                        "bool": {
                            "must": [
                                {"match": {"service": "milvus"}},
                                {"range": {
                                    "@timestamp": {
                                        "gte": f"now-{hours}h",
                                        "lte": "now"
                                    }
                                }}
                            ]
                        }
                    },
                    "sort": [{"@timestamp": {"order": "desc"}}],
                    "size": 100
                }

                # Run the query
                result = es.search(index="milvus-logs-*", body=query)

                print(f"Found {result['hits']['total']['value']} log entries:\n")

                for hit in result['hits']['hits']:
                    log = hit['_source']
                    print(f"[{log.get('@timestamp')}] {log.get('level', 'INFO')}: {log.get('message', '')}")

                return result

            # query_milvus_logs(hours=1)
            
            # 5. Query error logs
            def query_error_logs(es_host="http://localhost:9200", hours=24):
                """Query error logs and count the most frequent messages"""
                es = Elasticsearch(es_host)

                query = {
                    "query": {
                        "bool": {
                            "must": [
                                {"match": {"service": "milvus"}},
                                {"terms": {"level": ["error", "fatal"]}},
                                {"range": {
                                    "@timestamp": {
                                        "gte": f"now-{hours}h"
                                    }
                                }}
                            ]
                        }
                    },
                    "aggs": {
                        "error_types": {
                            "terms": {
                                "field": "message.keyword",
                                "size": 10
                            }
                        }
                    }
                }

                result = es.search(index="milvus-logs-*", body=query)

                print("Error log statistics:")
                for bucket in result['aggregations']['error_types']['buckets']:
                    print(f"  {bucket['key']}: {bucket['doc_count']} occurrences")

            # query_error_logs(hours=24)
            ---
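The two query functions above build nearly the same request body. The following is a minimal sketch of factoring that body into one helper that can be checked without a running cluster; the function name `build_log_query` is hypothetical, and the field names (`service`, `level`, `@timestamp`) are the ones assumed by the examples above.

```python
def build_log_query(service, hours, levels=None, size=100):
    """Build an Elasticsearch query body for the recent logs of one service."""
    must = [
        {"match": {"service": service}},
        {"range": {"@timestamp": {"gte": f"now-{hours}h", "lte": "now"}}},
    ]
    if levels:
        # Restrict the result to the given log levels
        must.append({"terms": {"level": list(levels)}})
    return {
        "query": {"bool": {"must": must}},
        "sort": [{"@timestamp": {"order": "desc"}}],
        "size": size,
    }


body = build_log_query("milvus", hours=24, levels=["error", "fatal"])
print(body)
```

The returned dictionary can be passed to the client's search call in place of the inline query literals.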
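    b.Log analysis
        a.Description
            Analyze Milvus logs to find problems and optimization opportunities. Count error types and their frequency, locate slow queries and performance bottlenecks, and identify abnormal patterns and trends. Produce reports and visualizations, trigger alerts and notifications from log data, support custom analysis rules, and expose a log query API.
        b.Code example
            ---
            # Log analysis tool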
            
            import json
            import re
            from collections import Counter
            from datetime import datetime

            class LogAnalyzer:
                def __init__(self, log_file):
                    self.log_file = log_file
                    self.logs = []
                    self.load_logs()

                def load_logs(self):
                    """Load a log file that holds one JSON object per line"""
                    with open(self.log_file, 'r') as f:
                        for line in f:
                            try:
                                self.logs.append(json.loads(line))
                            except json.JSONDecodeError:
                                # Skip lines that are not valid JSON
                                continue

                def count_by_level(self):
                    """Count log entries by level"""
                    levels = [log.get('level', 'UNKNOWN') for log in self.logs]
                    return Counter(levels)

                def find_errors(self):
                    """Find error and fatal entries (case-insensitive)"""
                    errors = [
                        log for log in self.logs
                        if str(log.get('level', '')).lower() in ('error', 'fatal')
                    ]
                    return errors

                def find_slow_queries(self, threshold_ms=100):
                    """Find slow queries, slowest first"""
                    slow_queries = []

                    for log in self.logs:
                        if 'query' in log.get('message', '').lower():
                            latency = log.get('latency_ms', 0)
                            if latency > threshold_ms:
                                slow_queries.append({
                                    'time': log.get('time'),
                                    'latency': latency,
                                    'message': log.get('message')
                                })

                    return sorted(slow_queries, key=lambda x: x['latency'], reverse=True)

                def analyze_patterns(self):
                    """Find the most frequent log messages"""
                    messages = [log.get('message', '') for log in self.logs]
                    message_counts = Counter(messages)

                    # The ten most frequent messages
                    top_messages = message_counts.most_common(10)

                    return top_messages

                def generate_report(self):
                    """Generate the analysis report"""
                    report = {
                        'total_logs': len(self.logs),
                        'by_level': dict(self.count_by_level()),
                        'error_count': len(self.find_errors()),
                        'slow_query_count': len(self.find_slow_queries()),
                        'top_messages': self.analyze_patterns()
                    }

                    return report
            
            # Use the log analyzer
            # analyzer = LogAnalyzer('/var/log/milvus/milvus.log')
            # report = analyzer.generate_report()

            # print("Log analysis report:")
            # print(f"  Total entries: {report['total_logs']}")
            # print(f"  By level: {report['by_level']}")
            # print(f"  Errors: {report['error_count']}")
            # print(f"  Slow queries: {report['slow_query_count']}")

            # Example Kibana queries
            kibana_queries = {
                "Error logs": {
                    "query": 'level:"error" OR level:"fatal"',
                    "time_range": "Last 24 hours"
                },
                "Slow queries": {
                    "query": 'message:"query" AND latency_ms:>100',
                    "time_range": "Last 1 hour"
                },
                "High QPS": {
                    "query": 'message:"query"',
                    "aggregation": "count by 1 minute",
                    "threshold": "> 1000"
                },
                "Memory warnings": {
                    "query": 'message:"memory" AND level:"warn"',
                    "time_range": "Last 6 hours"
                }
            }

            print("\nExample Kibana queries:")
            for name, query in kibana_queries.items():
                print(f"\n{name}:")
                for key, value in query.items():
                    print(f"  {key}: {value}")
            ---
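Identifying abnormal patterns often starts with counting errors per time bucket and flagging the buckets that exceed a threshold. The following is a minimal sketch, assuming each log entry is a dictionary with an ISO 8601 `time` field and a `level` field; both field names and the `error_spikes` helper are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime


def error_spikes(logs, threshold):
    """Return {minute: count} for minutes with at least `threshold` errors."""
    per_minute = Counter()
    for log in logs:
        if str(log.get("level", "")).lower() not in ("error", "fatal"):
            continue
        ts = datetime.fromisoformat(log["time"])
        # Bucket by minute
        per_minute[ts.strftime("%Y-%m-%d %H:%M")] += 1
    return {minute: n for minute, n in per_minute.items() if n >= threshold}


sample_logs = [
    {"time": "2024-01-15T10:00:05", "level": "error", "message": "search failed"},
    {"time": "2024-01-15T10:00:20", "level": "ERROR", "message": "search failed"},
    {"time": "2024-01-15T10:00:41", "level": "fatal", "message": "node down"},
    {"time": "2024-01-15T10:01:02", "level": "error", "message": "search failed"},
    {"time": "2024-01-15T10:01:30", "level": "info", "message": "search ok"},
]
spikes = error_spikes(sample_logs, threshold=3)
print(spikes)
```

A fixed threshold is the simplest rule; comparing each bucket against a rolling baseline is a common refinement when traffic varies over the day.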

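11.3 Backup and Recovery

01.Backup strategy
    a.Full backup
        a.Description
            Take full backups on a regular schedule to protect the data. A backup covers the vector data, the metadata, and the configuration files. Use the Milvus Backup tool or back up manually, and write the backup to local disk or object storage. Define a retention policy, verify the integrity of each backup, record the backup history and status, and automate the whole process.
        b.Code example
            ---
            # Milvus full backup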
            
            # 1. Use the Milvus Backup tool
            backup_commands = """
            # Install Milvus Backup
            # (the release asset name may differ by version and platform; check the releases page)
            wget https://github.com/zilliztech/milvus-backup/releases/download/v0.3.0/milvus-backup
            chmod +x milvus-backup

            # Configure backup.yaml
            # (key names may differ between versions; compare with the backup.yaml shipped with your release)
            cat > backup.yaml <<EOF
            milvus:
              address: localhost
              port: 19530
              username: ""
              password: ""

            minio:
              address: localhost
              port: 9000
              accessKeyID: minioadmin
              secretAccessKey: minioadmin
              useSSL: false
              bucketName: milvus-bucket

            backup:
              backupPath: /backup/milvus
              maxBackupNum: 7
            EOF

            # Create a backup
            ./milvus-backup create -n backup_20240115

            # List backups
            ./milvus-backup list

            # Show the details of a backup
            ./milvus-backup get -n backup_20240115

            # Delete a backup
            ./milvus-backup delete -n backup_20240115
            """
            
            # 2. Manual backup script
            backup_script = """
            #!/bin/bash
            # Manual Milvus backup script
            # Stop on the first failed command so a partial backup is not uploaded
            set -euo pipefail

            BACKUP_DIR="/backup/milvus/$(date +%Y%m%d_%H%M%S)"
            mkdir -p "$BACKUP_DIR"

            echo "Starting Milvus backup..."

            # Back up the MinIO data (vector data)
            echo "Backing up MinIO data..."
            mc mirror milvus-minio/milvus-bucket "$BACKUP_DIR/minio-data"

            # Back up the etcd data (metadata)
            echo "Backing up etcd data..."
            kubectl exec -n milvus etcd-0 -- etcdctl snapshot save /tmp/snapshot.db
            kubectl cp milvus/etcd-0:/tmp/snapshot.db "$BACKUP_DIR/etcd-snapshot.db"

            # Back up the configuration (secrets.yaml holds sensitive data; store it securely)
            echo "Backing up configuration..."
            kubectl get configmap -n milvus -o yaml > "$BACKUP_DIR/configmaps.yaml"
            kubectl get secret -n milvus -o yaml > "$BACKUP_DIR/secrets.yaml"

            # Compress the backup
            echo "Compressing the backup..."
            tar -czf "$BACKUP_DIR.tar.gz" -C "$(dirname "$BACKUP_DIR")" "$(basename "$BACKUP_DIR")"
            rm -rf "$BACKUP_DIR"

            # Upload to S3
            echo "Uploading to S3..."
            aws s3 cp "$BACKUP_DIR.tar.gz" s3://milvus-backups/

            # Clean up local backups (keep the last 7 days)
            find /backup/milvus -name "*.tar.gz" -mtime +7 -delete

            echo "Backup finished: $BACKUP_DIR.tar.gz"
            """
            
            # 3. Python备份脚本
            import subprocess
            import os
            from datetime import datetime
            
            def backup_milvus(backup_dir="/backup/milvus"):
                """执行Milvus备份"""
                timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
                backup_path = os.path.join(backup_dir, f"backup_{timestamp}")
                os.makedirs(backup_path, exist_ok=True)
                
                print(f"开始备份到: {backup_path}")
                
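                # Back up MinIO (vector data). check=True raises on a non-zero
                # exit code, so a failed step is never archived as a "backup".
                print("Backing up MinIO data...")
                subprocess.run([
                    "mc", "mirror",
                    "milvus-minio/milvus-bucket",
                    f"{backup_path}/minio-data"
                ], check=True)
                
                # Back up etcd (metadata)
                print("Backing up etcd data...")
                subprocess.run([
                    "kubectl", "exec", "-n", "milvus", "etcd-0", "--",
                    "etcdctl", "snapshot", "save", "/tmp/snapshot.db"
                ], check=True)
                subprocess.run([
                    "kubectl", "cp",
                    "milvus/etcd-0:/tmp/snapshot.db",
                    f"{backup_path}/etcd-snapshot.db"
                ], check=True)
                
                # Compress the backup directory
                print("Compressing backup...")
                subprocess.run([
                    "tar", "-czf", f"{backup_path}.tar.gz",
                    "-C", backup_dir,
                    f"backup_{timestamp}"
                ], check=True)
                
                # Remove the uncompressed working directory
                subprocess.run(["rm", "-rf", backup_path], check=True)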
                
                print(f"备份完成: {backup_path}.tar.gz")
                return f"{backup_path}.tar.gz"
            
            # backup_milvus()
            
            # 4. 定时备份（crontab）
            crontab_config = """
            # 每天凌晨2点执行备份
            0 2 * * * /opt/scripts/backup-milvus.sh >> /var/log/milvus-backup.log 2>&1
            
            # 每周日凌晨3点执行全量备份
            0 3 * * 0 /opt/scripts/backup-milvus-full.sh >> /var/log/milvus-backup.log 2>&1
            """
            
            print("备份命令:")
            print(backup_commands)
            print("\n备份脚本:")
            print(backup_script)
            print("\n定时备份配置:")
            print(crontab_config)
            ---
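The configuration above caps the number of backups (`maxBackupNum: 7`), while the shell script prunes by age. A count-based retention rule can be sketched as follows. This is a minimal sketch, assuming archives are named `backup_<YYYYmmdd_HHMMSS>.tar.gz` as in the Python backup script; the function name `select_backups_to_prune` is hypothetical.

```python
from datetime import datetime

PREFIX = "backup_"
SUFFIX = ".tar.gz"


def select_backups_to_prune(names, keep=7):
    """Return the archive names to delete, oldest first, keeping the `keep` newest."""
    def stamp(name):
        # "backup_20240115_020000.tar.gz" -> datetime(2024, 1, 15, 2, 0, 0)
        core = name[len(PREFIX):-len(SUFFIX)]
        return datetime.strptime(core, "%Y%m%d_%H%M%S")

    newest_first = sorted(names, key=stamp, reverse=True)
    # Everything after the first `keep` entries is outside the retention window
    return sorted(newest_first[keep:], key=stamp)
```

The function only decides what to delete; the caller removes the files, which keeps the rule easy to test without touching disk.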
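    b.Incremental backup
        a.Description
            An incremental backup copies only the data that changed since the previous backup, which saves storage space and keeps routine backups fast. Changes are identified by timestamp or version number, so the approach suits collections that are updated frequently. It is used together with full backups: record the baseline that each increment builds on, because a restore needs the last full backup plus every increment taken after it.
        b.Code example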
            ---
            # Milvus增量备份实现
            
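            from pymilvus import connections, Collection, utility
            from datetime import datetime
            import json
            import os
            
            class IncrementalBackup:
                def __init__(self, backup_dir="/backup/milvus/incremental"):
                    self.backup_dir = backup_dir
                    self.metadata_file = f"{backup_dir}/metadata.json"
                    self.load_metadata()
                
                def load_metadata(self):
                    """Load backup metadata, starting fresh if none exists yet."""
                    try:
                        with open(self.metadata_file, 'r') as f:
                            self.metadata = json.load(f)
                    except (FileNotFoundError, json.JSONDecodeError):
                        self.metadata = {
                            "last_backup_time": None,
                            "collections": {}
                        }
                
                def save_metadata(self):
                    """Save backup metadata."""
                    os.makedirs(self.backup_dir, exist_ok=True)
                    with open(self.metadata_file, 'w') as f:
                        json.dump(self.metadata, f, indent=2)
                
                def backup_collection(self, collection_name):
                    """Incrementally back up one collection."""
                    collection = Collection(collection_name)
                    collection.load()  # queries only run against a loaded collection
                    
                    # Time of the previous backup, if any
                    last_backup = self.metadata["collections"].get(collection_name, {}).get("last_backup_time")
                    
                    # Capture the cutoff before querying, so rows written while the
                    # backup runs are picked up by the next run instead of skipped
                    cutoff = datetime.now().timestamp()
                    
                    if last_backup:
                        # Assumes the schema has a numeric `timestamp` field (epoch seconds)
                        expr = f"timestamp > {last_backup}"
                    else:
                        # First run: full export
                        expr = ""
                    
                    # query() caps the result size, so page through with an iterator
                    results = []
                    iterator = collection.query_iterator(
                        batch_size=1000, expr=expr, output_fields=["*"]
                    )
                    while True:
                        batch = iterator.next()
                        if not batch:
                            break
                        results.extend(batch)
                    iterator.close()
                    
                    if not results:
                        print(f"{collection_name}: no new data")
                        return
                    
                    # Write the increment
                    os.makedirs(self.backup_dir, exist_ok=True)
                    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
                    backup_file = f"{self.backup_dir}/{collection_name}_{timestamp}.json"
                    
                    with open(backup_file, 'w') as f:
                        # default=float covers NumPy float scalars, which json rejects;
                        # binary vectors would need their own encoding
                        json.dump(results, f, default=float)
                    
                    # Update metadata
                    self.metadata["collections"][collection_name] = {
                        "last_backup_time": cutoff,
                        "last_backup_file": backup_file,
                        "record_count": len(results)
                    }
                    self.save_metadata()
                    
                    print(f"{collection_name}: backed up {len(results)} records to {backup_file}")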
                
                def backup_all(self):
                    """增量备份所有Collection"""
                    collections = utility.list_collections()
                    
                    for coll_name in collections:
                        self.backup_collection(coll_name)
                    
                    self.metadata["last_backup_time"] = datetime.now().isoformat()
                    self.save_metadata()
            
            # 使用增量备份
            # connections.connect(host="localhost", port="19530")
            # backup = IncrementalBackup()
            # backup.backup_all()
            
            # 增量备份脚本
            incremental_backup_script = """
            #!/bin/bash
            # 增量备份脚本
            
            BACKUP_DIR="/backup/milvus/incremental"
            TIMESTAMP=$(date +%Y%m%d_%H%M%S)
            
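            # Read the time of the previous backup (0 means no backup yet)
            mkdir -p "$BACKUP_DIR/$TIMESTAMP"
            LAST_BACKUP=$(cat "$BACKUP_DIR/last_backup_time.txt" 2>/dev/null || echo "0")
            CURRENT_TIME=$(date +%s)
            
            # --newer-than takes an age (for example 7d10h), not an epoch timestamp,
            # so pass the number of seconds elapsed since the previous backup
            ELAPSED=$((CURRENT_TIME - LAST_BACKUP))
            mc mirror --newer-than "${ELAPSED}s" milvus-minio/milvus-bucket "$BACKUP_DIR/$TIMESTAMP/"
            
            # Record the time of this backup
            echo "$CURRENT_TIME" > "$BACKUP_DIR/last_backup_time.txt"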
            
            # 压缩备份
            tar -czf $BACKUP_DIR/incremental_$TIMESTAMP.tar.gz -C $BACKUP_DIR $TIMESTAMP
            rm -rf $BACKUP_DIR/$TIMESTAMP
            
            echo "增量备份完成: incremental_$TIMESTAMP.tar.gz"
            """
            
            print("增量备份脚本:")
            print(incremental_backup_script)
            ---
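Because a restore needs the last full backup plus every later increment, it helps to compute that chain explicitly before restoring. The sketch below assumes a hypothetical catalog of backups, each a dict with `file`, `kind` (`"full"` or `"incremental"`) and `time` (epoch seconds); none of these names come from Milvus or the milvus-backup tool.

```python
def build_restore_chain(backups, target_time):
    """Return the files to apply, in order, to restore the state at target_time."""
    # Only backups taken at or before the target point are usable
    eligible = sorted(
        (b for b in backups if b["time"] <= target_time),
        key=lambda b: b["time"],
    )
    fulls = [b for b in eligible if b["kind"] == "full"]
    if not fulls:
        raise ValueError("no full backup at or before target_time")

    base = fulls[-1]  # most recent full backup is the baseline
    increments = [
        b for b in eligible
        if b["kind"] == "incremental" and b["time"] > base["time"]
    ]
    return [base["file"]] + [b["file"] for b in increments]
```

Increments older than the chosen baseline are skipped, since the full backup already contains their data.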

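02.Restore procedure
    a.Data restore
        a.Description
            Restoring brings Milvus data back from a backup, either from a full backup alone or from a full backup plus increments. For a manual restore, stop the Milvus service first, then restore the vector data, the metadata, and the configuration (the milvus-backup tool is the exception: it restores through a running Milvus instance). Afterwards, verify data integrity and test that the service is available. Record the procedure and its outcome, and keep a restore runbook that is rehearsed regularly.
        b.Code example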
            ---
            # Milvus数据恢复
            
            # 1. 使用Milvus Backup恢复
            restore_commands = """
            # 列出可用备份
            ./milvus-backup list
            
            # 恢复指定备份
            ./milvus-backup restore -n backup_20240115
            
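            # Restore only the listed collections
            ./milvus-backup restore -n backup_20240115 -c collection_name
            
            # Restore under a new name by appending a suffix (collection_name_recover),
            # which avoids clashing with a collection that still exists
            ./milvus-backup restore -n backup_20240115 -c collection_name -s _recover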
            """
            
            # 2. 手动恢复脚本
            restore_script = """
            #!/bin/bash
            # Milvus手动恢复脚本
            
            BACKUP_FILE=$1
            
            if [ -z "$BACKUP_FILE" ]; then
                echo "用法: $0 <backup_file.tar.gz>"
                exit 1
            fi
            
            echo "开始恢复Milvus数据..."
            
            # 停止Milvus服务
            echo "停止Milvus服务..."
            kubectl scale deployment milvus-standalone --replicas=0 -n milvus
            sleep 10
            
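            # Extract the backup. The archive holds one top-level directory,
            # so strip it to place the files directly under RESTORE_DIR.
            echo "Extracting backup archive..."
            RESTORE_DIR="/tmp/milvus_restore"
            mkdir -p $RESTORE_DIR
            tar -xzf $BACKUP_FILE -C $RESTORE_DIR --strip-components=1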
            
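            # Restore etcd data
            echo "Restoring etcd data..."
            kubectl cp $RESTORE_DIR/etcd-snapshot.db milvus/etcd-0:/tmp/snapshot.db
            kubectl exec -n milvus etcd-0 -- etcdctl snapshot restore /tmp/snapshot.db \\
                --data-dir=/var/lib/etcd-restore
            # snapshot restore only writes a new data directory; etcd must be
            # restarted with its data dir pointing at /var/lib/etcd-restore
            # before the restored metadata takes effect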
            
            # Restore MinIO data (--overwrite replaces objects that already exist in the bucket)
            echo "Restoring MinIO data..."
            mc mirror --overwrite $RESTORE_DIR/minio-data milvus-minio/milvus-bucket
            
            # Restore configuration
            echo "Restoring configuration..."
            kubectl apply -f $RESTORE_DIR/configmaps.yaml
            kubectl apply -f $RESTORE_DIR/secrets.yaml
            
            # 启动Milvus服务
            echo "启动Milvus服务..."
            kubectl scale deployment milvus-standalone --replicas=1 -n milvus
            
            # 等待服务就绪
            echo "等待服务就绪..."
            kubectl wait --for=condition=ready pod -l app=milvus -n milvus --timeout=300s
            
            # 清理临时文件
            rm -rf $RESTORE_DIR
            
            echo "恢复完成！"
            """
            
            # 3. Python恢复脚本
            import subprocess
            import os
            import time
            
            def restore_milvus(backup_file):
                """恢复Milvus数据"""
                print(f"开始恢复: {backup_file}")
                
                # Stop the service; check=True aborts the restore if this fails
                print("Stopping Milvus...")
                subprocess.run([
                    "kubectl", "scale", "deployment", "milvus-standalone",
                    "--replicas=0", "-n", "milvus"
                ], check=True)
                time.sleep(10)
                
                # Extract the backup archive
                print("Extracting backup...")
                restore_dir = "/tmp/milvus_restore"
                os.makedirs(restore_dir, exist_ok=True)
                subprocess.run([
                    "tar", "-xzf", backup_file,
                    "-C", restore_dir
                ], check=True)
                
                # 恢复数据
                print("恢复数据...")
                # ... 恢复逻辑 ...
                
                # 启动服务
                print("启动服务...")
                subprocess.run([
                    "kubectl", "scale", "deployment", "milvus-standalone",
                    "--replicas=1", "-n", "milvus"
                ])
                
                # 等待就绪
                print("等待服务就绪...")
                subprocess.run([
                    "kubectl", "wait", "--for=condition=ready",
                    "pod", "-l", "app=milvus",
                    "-n", "milvus", "--timeout=300s"
                ])
                
                print("恢复完成！")
            
            # restore_milvus("/backup/milvus/backup_20240115.tar.gz")
            
            # 4. 验证恢复
            from pymilvus import connections, utility, Collection
            
            def verify_restore():
                """验证恢复后的数据"""
                connections.connect(host="localhost", port="19530")
                
                print("验证恢复结果:\n")
                
                # 检查Collections
                collections = utility.list_collections()
                print(f"Collections数量: {len(collections)}")
                
                for coll_name in collections:
                    collection = Collection(coll_name)
                    count = collection.num_entities
                    print(f"  {coll_name}: {count} entities")
                
                # 测试查询
                if collections:
                    collection = Collection(collections[0])
                    collection.load()
                    
                    import numpy as np
                    query_vector = [[np.random.random() for _ in range(128)]]
                    
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param={"metric_type": "L2", "params": {"nprobe": 16}},
                        limit=10
                    )
                    
                    print(f"\n测试查询成功: 返回{len(results[0])}个结果")
                
                connections.disconnect("default")
            
            # verify_restore()
            
            print("恢复命令:")
            print(restore_commands)
            print("\n恢复脚本:")
            print(restore_script)
            ---
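`verify_restore()` above prints entity counts but has nothing to compare them against. One way to turn it into a real integrity check is to save the counts at backup time and diff them after the restore. The sketch below assumes such a saved manifest exists (a plain dict of collection name to count); the helper name `compare_entity_counts` is hypothetical.

```python
def compare_entity_counts(expected, actual):
    """Return {collection: (expected_count, actual_count)} for every mismatch.

    A collection missing after the restore is reported with an actual count of None.
    """
    problems = {}
    for name, want in expected.items():
        got = actual.get(name)
        if got != want:
            problems[name] = (want, got)
    return problems
```

In practice `actual` would be built from `Collection(name).num_entities` for each restored collection, and an empty result means the counts agree. Matching counts are a necessary check, not proof that every vector is intact.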
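    b.Disaster recovery
        a.Description
            A disaster recovery plan prepares for extreme failures such as the loss of a whole cluster or region. Define the RTO (recovery time objective) and RPO (recovery point objective), prepare a standby environment and resources, and document the recovery steps. Rehearse the procedure regularly, set up an incident response team, keep backups in more than one region, and monitor the progress and status of any recovery in flight.
        b.Code example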
            ---
            # 灾难恢复计划
            
            disaster_recovery_plan = """
            # Milvus灾难恢复计划
            
            ## 1. 恢复目标
            - RTO (恢复时间目标): 2小时
            - RPO (恢复点目标): 24小时
            
            ## 2. 恢复流程
            
            ### 2.1 评估阶段（15分钟）
            - [ ] 确认灾难类型和影响范围
            - [ ] 评估数据丢失程度
            - [ ] 确定恢复策略
            - [ ] 通知相关人员
            
            ### 2.2 准备阶段（30分钟）
            - [ ] 准备备用环境
            - [ ] 下载最新备份
            - [ ] 验证备份完整性
            - [ ] 准备恢复工具
            
            ### 2.3 恢复阶段（60分钟）
            - [ ] 部署Milvus集群
            - [ ] 恢复etcd数据
            - [ ] 恢复MinIO数据
            - [ ] 恢复配置文件
            - [ ] 启动服务
            
            ### 2.4 验证阶段（15分钟）
            - [ ] 验证数据完整性
            - [ ] 测试查询功能
            - [ ] 测试写入功能
            - [ ] 性能测试
            
            ## 3. 联系人
            - 技术负责人: xxx (电话: xxx)
            - 运维负责人: xxx (电话: xxx)
            - 业务负责人: xxx (电话: xxx)
            
            ## 4. 备用资源
            - 备用集群: xxx
            - 备份存储: s3://milvus-backups/
            - 监控地址: https://monitoring.example.com
            """
            
            # 灾难恢复脚本
            dr_script = """
            #!/bin/bash
            # 灾难恢复自动化脚本
            
            set -e
            
            echo "=========================================="
            echo "Milvus灾难恢复脚本"
            echo "=========================================="
            
            # 1. 评估阶段
            echo "1. 评估灾难影响..."
            BACKUP_LOCATION="s3://milvus-backups/"
            LATEST_BACKUP=$(aws s3 ls $BACKUP_LOCATION | sort | tail -n 1 | awk '{print $4}')
            
            echo "最新备份: $LATEST_BACKUP"
            
            # 2. 准备阶段
            echo "2. 准备恢复环境..."
            
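            # Create the Kubernetes namespace (idempotent, so a rerun under set -e does not abort here)
            kubectl create namespace milvus-dr --dry-run=client -o yaml | kubectl apply -f -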
            
            # 部署依赖服务
            helm install etcd bitnami/etcd -n milvus-dr
            helm install minio bitnami/minio -n milvus-dr
            helm install pulsar apache/pulsar -n milvus-dr
            
            # 3. 恢复阶段
            echo "3. 恢复数据..."
            
            # 下载备份
            aws s3 cp $BACKUP_LOCATION$LATEST_BACKUP /tmp/backup.tar.gz
            
            # 解压备份
            tar -xzf /tmp/backup.tar.gz -C /tmp/
            
            # 恢复数据
            # ... 恢复逻辑 ...
            
            # 部署Milvus
            helm install milvus-dr milvus/milvus -n milvus-dr
            
            # 4. 验证阶段
            echo "4. 验证恢复结果..."
            
            # 等待服务就绪
            kubectl wait --for=condition=ready pod -l app=milvus -n milvus-dr --timeout=300s
            
            # 运行验证脚本
            python3 verify_restore.py
            
            echo "=========================================="
            echo "灾难恢复完成！"
            echo "=========================================="
            """
            
            print("灾难恢复计划:")
            print(disaster_recovery_plan)
            print("\n灾难恢复脚本:")
            print(dr_script)
            ---
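The plan above sets an RTO of 2 hours and an RPO of 24 hours and budgets 15 + 30 + 60 + 15 minutes across its four phases. A drill can check both targets mechanically. This is a minimal sketch; the function name and the shape of `phase_minutes` are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def check_recovery_objectives(last_backup, incident, phase_minutes,
                              rto=timedelta(hours=2), rpo=timedelta(hours=24)):
    """Compare a drill or incident against the RTO and RPO targets."""
    # RPO: how much data is lost, i.e. time between the last backup and the incident
    data_loss_window = incident - last_backup
    # RTO: how long the recovery takes, summed over all phases
    recovery_time = timedelta(minutes=sum(phase_minutes.values()))
    return {
        "rpo_met": data_loss_window <= rpo,
        "rto_met": recovery_time <= rto,
        "data_loss_window": data_loss_window,
        "recovery_time": recovery_time,
    }
```

With the planned phase durations the recovery time is exactly 2 hours, so the plan has no slack: any phase that overruns breaks the RTO.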

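11.4 Troubleshooting

01.Common failures
    a.Connection failure
        a.Description
            Connection failure is one of the most common problems. Typical causes are network problems, a service that is not running, a wrong port, or a firewall blocking traffic. To diagnose it, check the Milvus service status and network connectivity, verify the connection parameters, review firewall and security group rules, and check DNS resolution. Test the port with telnet or curl, and read the Milvus logs for the detailed error message.
        b.Code example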
            ---
            # 连接失败故障排查
            
            from pymilvus import connections
            import socket
            import subprocess
            
            def diagnose_connection(host="localhost", port="19530"):
                """诊断连接问题"""
                print(f"诊断Milvus连接: {host}:{port}\n")
                
                # 1. 检查网络连通性
                print("1. 检查网络连通性...")
                try:
                    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                    sock.settimeout(5)
                    result = sock.connect_ex((host, int(port)))
                    sock.close()
                    
                    if result == 0:
                        print("   ✓ 端口可达")
                    else:
                        print(f"   ✗ 端口不可达 (错误码: {result})")
                        return
                except Exception as e:
                    print(f"   ✗ 网络错误: {e}")
                    return
                
                # 2. 检查DNS解析
                print("\n2. 检查DNS解析...")
                try:
                    ip = socket.gethostbyname(host)
                    print(f"   ✓ DNS解析成功: {host} -> {ip}")
                except Exception as e:
                    print(f"   ✗ DNS解析失败: {e}")
                
                # 3. 测试Milvus连接
                print("\n3. 测试Milvus连接...")
                try:
                    connections.connect(
                        alias="test",
                        host=host,
                        port=port,
                        timeout=10
                    )
                    print("   ✓ Milvus连接成功")
                    connections.disconnect("test")
                except Exception as e:
                    print(f"   ✗ Milvus连接失败: {e}")
                    
                    # 4. 检查服务状态
                    print("\n4. 检查服务状态...")
                    try:
                        result = subprocess.run(
                            ["kubectl", "get", "pods", "-n", "milvus"],
                            capture_output=True,
                            text=True
                        )
                        print(result.stdout)
                    except (OSError, subprocess.SubprocessError):
                        # kubectl is missing or could not be run
                        print("   Unable to check Kubernetes status")
                
                # 5. 检查防火墙
                print("\n5. 防火墙检查建议:")
                print("   - 检查iptables规则: sudo iptables -L")
                print("   - 检查firewalld: sudo firewall-cmd --list-all")
                print("   - 检查云安全组配置")
                
                # 6. 检查日志
                print("\n6. 查看日志:")
                print(f"   kubectl logs -n milvus <pod-name>")
                print(f"   或: docker logs milvus-standalone")
            
            # diagnose_connection("localhost", "19530")
            
            # 常见连接错误及解决方案
            connection_errors = {
                "connection refused": {
                    "原因": "服务未启动或端口未监听",
                    "解决方案": [
                        "检查Milvus服务状态",
                        "验证端口配置",
                        "查看服务日志"
                    ]
                },
                "timeout": {
                    "原因": "网络不通或服务响应慢",
                    "解决方案": [
                        "检查网络连通性",
                        "增加超时时间",
                        "检查服务负载"
                    ]
                },
                "authentication failed": {
                    "原因": "用户名或密码错误",
                    "解决方案": [
                        "验证认证信息",
                        "检查用户权限",
                        "重置密码"
                    ]
                },
                "DNS resolution failed": {
                    "原因": "域名无法解析",
                    "解决方案": [
                        "检查DNS配置",
                        "使用IP地址连接",
                        "检查hosts文件"
                    ]
                }
            }
            
            print("\n常见连接错误及解决方案:")
            for error, info in connection_errors.items():
                print(f"\n{error}:")
                print(f"  原因: {info['原因']}")
                print(f"  解决方案:")
                for solution in info['解决方案']:
                    print(f"    - {solution}")
            ---
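Transient causes in the table above ("connection refused" while a pod restarts, short timeouts) are often best handled by retrying with exponential backoff rather than failing on the first attempt. The sketch below is library-agnostic: `connect` is any zero-argument callable, and the name `connect_with_retry` and its parameters are hypothetical.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                # 0.5s, 1s, 2s, ... capped at max_delay
                sleep(min(base_delay * 2 ** attempt, max_delay))
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

With pymilvus this could be called as `connect_with_retry(lambda: connections.connect(host="localhost", port="19530"))`. Retrying does not help with authentication failures, which should be surfaced immediately.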
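    b.Query timeout
        a.Description
            Query timeouts are usually caused by performance problems: too much data, an untuned index, insufficient resources, or excessive concurrency. Check the search parameters, tune the index type and its parameters, add Query Node resources, and adjust the timeout. Analyzing slow-query logs, rate limiting queries, and simplifying the data model also help.
        b.Code example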
            ---
            # 查询超时故障排查
            
            from pymilvus import connections, Collection, DataType
            import time
            import numpy as np
            
            def diagnose_query_timeout(collection_name):
                """诊断查询超时问题"""
                connections.connect(host="localhost", port="19530")
                collection = Collection(collection_name)
                collection.load()
                
                print(f"诊断Collection: {collection_name}\n")
                
                # 1. 检查Collection信息
                print("1. Collection信息:")
                print(f"   数据量: {collection.num_entities}")
                print(f"   字段数: {len(collection.schema.fields)}")
                
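                # 2. Check indexes
                print("\n2. Index info:")
                # Collection.index() does not take a field name, so map each
                # field to its index through collection.indexes instead
                indexes_by_field = {idx.field_name: idx for idx in collection.indexes}
                for field in collection.schema.fields:
                    if field.dtype in [DataType.FLOAT_VECTOR, DataType.BINARY_VECTOR]:
                        index = indexes_by_field.get(field.name)
                        if index is None:
                            print(f"   {field.name}: no index")
                            continue
                        print(f"   {field.name}:")
                        print(f"     type: {index.params.get('index_type')}")
                        print(f"     params: {index.params.get('params')}")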
                
                # 3. 测试查询性能
                print("\n3. 查询性能测试:")
                
                test_cases = [
                    {"nprobe": 8, "limit": 10},
                    {"nprobe": 16, "limit": 10},
                    {"nprobe": 32, "limit": 10},
                    {"nprobe": 16, "limit": 100}
                ]
                
                query_vector = [[np.random.random() for _ in range(128)]]
                
                for params in test_cases:
                    start = time.time()
                    try:
                        results = collection.search(
                            data=query_vector,
                            anns_field="embedding",
                            # Pass only nprobe as a search parameter; limit is a separate argument
                            param={"metric_type": "L2", "params": {"nprobe": params["nprobe"]}},
                            limit=params["limit"],
                            timeout=30
                        )
                        latency = (time.time() - start) * 1000
                        print(f"   nprobe={params['nprobe']}, limit={params['limit']}: {latency:.2f}ms")
                    except Exception as e:
                        print(f"   nprobe={params['nprobe']}, limit={params['limit']}: 超时或失败 ({e})")
                
                # 4. 资源使用情况
                print("\n4. 资源使用建议:")
                print("   - 检查Query Node CPU/内存使用")
                print("   - 检查是否需要增加Query Node数量")
                print("   - 检查索引是否已加载到内存")
                
                # 5. 优化建议
                print("\n5. 优化建议:")
                
                if collection.num_entities > 10000000:
                    print("   - 数据量较大，考虑分片或分区")
                
                print("   - 优化索引参数（降低nprobe）")
                print("   - 增加Query Node资源")
                print("   - 使用更高效的索引类型（如HNSW）")
                print("   - 实现查询缓存")
                
                connections.disconnect("default")
            
            # diagnose_query_timeout("test_collection")
            
            # 查询超时优化方案
            optimization_strategies = {
                "索引优化": {
                    "FLAT -> IVF_FLAT": "适合中等规模数据",
                    "IVF_FLAT -> IVF_PQ": "牺牲精度换取速度",
                    "IVF -> HNSW": "更好的查询性能"
                },
                "参数调优": {
                    "降低nprobe": "减少搜索的聚类中心数量",
                    "降低limit": "减少返回结果数量",
                    "增加timeout": "给予更多查询时间"
                },
                "资源扩展": {
                    "增加Query Node": "提升并发查询能力",
                    "增加内存": "缓存更多索引数据",
                    "使用SSD": "加快数据加载速度"
                },
                "架构优化": {
                    "数据分区": "按业务逻辑分区数据",
                    "查询缓存": "缓存热门查询结果",
                    "异步查询": "使用异步API"
                }
            }
            
            print("\n查询超时优化方案:")
            for category, strategies in optimization_strategies.items():
                print(f"\n{category}:")
                for strategy, desc in strategies.items():
                    print(f"  {strategy}: {desc}")
            ---
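Query rate limiting, mentioned above as a remedy for excessive concurrency, is commonly implemented with a token bucket placed in front of the search call. This is a minimal single-threaded sketch; the class name `TokenBucket` and the injectable `clock` are illustrative choices, not part of pymilvus.

```python
import time


class TokenBucket:
    """Allow up to `rate` requests per second with bursts of up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self):
        """Return True and consume a token if a request may proceed now."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A caller would check `bucket.allow()` before each `collection.search(...)` and reject or queue the request when it returns False. For multi-threaded use, guard `allow()` with a lock.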

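02.Performance problems
    a.Performance analysis
        a.Description
            A drop in system performance calls for a full analysis. Monitor metrics such as QPS, latency, and resource usage; analyze slow queries and hot data; check index efficiency and data distribution; and assess whether the hardware is sufficient. Once the bottleneck is identified, draw up an optimization plan and run performance tests to confirm the effect.
        b.Code example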
            ---
            # 性能分析工具
            
            from pymilvus import connections, Collection, DataType, utility
            import time
            import numpy as np
            from collections import defaultdict
            
            class PerformanceAnalyzer:
                def __init__(self, host="localhost", port="19530"):
                    connections.connect(host=host, port=port)
                    self.metrics = defaultdict(list)
                
                def analyze_collection(self, collection_name):
                    """分析Collection性能"""
                    collection = Collection(collection_name)
                    collection.load()
                    
                    print(f"性能分析: {collection_name}\n")
                    
                    # 1. 基本信息
                    print("1. 基本信息:")
                    print(f"   数据量: {collection.num_entities:,}")
                    print(f"   字段数: {len(collection.schema.fields)}")
                    
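                    # 2. Index analysis
                    print("\n2. Index analysis:")
                    # Collection.index() does not take a field name, so map each
                    # field to its index through collection.indexes instead
                    indexes_by_field = {idx.field_name: idx for idx in collection.indexes}
                    for field in collection.schema.fields:
                        if field.dtype in [DataType.FLOAT_VECTOR, DataType.BINARY_VECTOR]:
                            index = indexes_by_field.get(field.name)
                            if index is None:
                                print(f"   {field.name}: no index")
                                continue
                            print(f"   {field.name}:")
                            print(f"     type: {index.params.get('index_type')}")
                            print(f"     params: {index.params.get('params')}")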
                    
                    # 3. 查询性能测试
                    print("\n3. 查询性能测试:")
                    
                    query_vector = [[np.random.random() for _ in range(128)]]
                    
                    # 测试不同参数组合
                    test_params = [
                        {"nprobe": 8, "limit": 10},
                        {"nprobe": 16, "limit": 10},
                        {"nprobe": 32, "limit": 10},
                    ]
                    
                    for params in test_params:
                        latencies = []
                        
                        # 多次测试取平均
                        for _ in range(10):
                            start = time.time()
                            collection.search(
                                data=query_vector,
                                anns_field="embedding",
                                # Pass only nprobe as a search parameter; limit is a separate argument
                                param={"metric_type": "L2", "params": {"nprobe": params["nprobe"]}},
                                limit=params["limit"]
                            )
                            latency = (time.time() - start) * 1000
                            latencies.append(latency)
                        
                        avg_latency = sum(latencies) / len(latencies)
                        p99_latency = sorted(latencies)[int(len(latencies) * 0.99)]
                        
                        print(f"   nprobe={params['nprobe']}:")
                        print(f"     平均延迟: {avg_latency:.2f}ms")
                        print(f"     P99延迟: {p99_latency:.2f}ms")
                        
                        self.metrics[f"nprobe_{params['nprobe']}"] = {
                            "avg": avg_latency,
                            "p99": p99_latency
                        }
                    
                    # 4. 并发性能测试
                    print("\n4. 并发性能测试:")
                    self.test_concurrent_queries(collection, threads=10)
                    
                    # 5. 性能评分
                    print("\n5. 性能评分:")
                    score = self.calculate_performance_score()
                    print(f"   总分: {score}/100")
                    
                    # 6. 优化建议
                    print("\n6. 优化建议:")
                    self.generate_recommendations(collection)
                
                def test_concurrent_queries(self, collection, threads=10):
                    """测试并发查询性能"""
                    import threading
                    
                    query_vector = [[np.random.random() for _ in range(128)]]
                    results = []
                    
                    def query_worker():
                        start = time.time()
                        collection.search(
                            data=query_vector,
                            anns_field="embedding",
                            param={"metric_type": "L2", "params": {"nprobe": 16}},
                            limit=10
                        )
                        latency = (time.time() - start) * 1000
                        results.append(latency)
                    
                    # 启动并发查询
                    thread_list = []
                    start = time.time()
                    
                    for _ in range(threads):
                        t = threading.Thread(target=query_worker)
                        t.start()
                        thread_list.append(t)
                    
                    for t in thread_list:
                        t.join()
                    
                    total_time = (time.time() - start) * 1000
                    avg_latency = sum(results) / len(results)
                    
                    print(f"   并发数: {threads}")
                    print(f"   总耗时: {total_time:.2f}ms")
                    print(f"   平均延迟: {avg_latency:.2f}ms")
                    print(f"   QPS: {threads / (total_time / 1000):.2f}")
                
                def calculate_performance_score(self):
                    """计算性能评分"""
                    score = 100
                    
                    # 根据延迟扣分
                    avg_latency = self.metrics.get("nprobe_16", {}).get("avg", 0)
                    if avg_latency > 100:
                        score -= 20
                    elif avg_latency > 50:
                        score -= 10
                    
                    # 根据P99延迟扣分
                    p99_latency = self.metrics.get("nprobe_16", {}).get("p99", 0)
                    if p99_latency > 200:
                        score -= 20
                    elif p99_latency > 100:
                        score -= 10
                    
                    return max(score, 0)
                
                def generate_recommendations(self, collection):
                    """Generate optimization recommendations"""
                    recommendations = []
                    
                    # Check data volume
                    if collection.num_entities > 10000000:
                        recommendations.append("Large data volume; consider using partitions")
                    
                    # Check latency
                    avg_latency = self.metrics.get("nprobe_16", {}).get("avg", 0)
                    if avg_latency > 100:
                        recommendations.append("High query latency; consider tuning the index or adding resources")
                    
                    # Check indexes. Collection.index() takes an index name rather than a field name,
                    # so iterate over collection.indexes and read the field from each index
                    for index in collection.indexes:
                        index_type = index.params.get('index_type')
                        
                        if index_type == 'FLAT' and collection.num_entities > 100000:
                            recommendations.append(f"Field {index.field_name} uses a FLAT index; consider switching to IVF or HNSW")
                    
                    if not recommendations:
                        recommendations.append("Performance is good; no optimization needed")
                    
                    for i, rec in enumerate(recommendations, 1):
                        print(f"   {i}. {rec}")
            
            # 使用性能分析器
            # analyzer = PerformanceAnalyzer()
            # analyzer.analyze_collection("test_collection")
            ---
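            The scoring above reads `avg` and `p99` values from `self.metrics`. The following is a minimal sketch of how such a summary could be computed from raw latency samples, assuming the nearest-rank percentile definition. The function name `latency_summary` is hypothetical and is not part of the analyzer above.

```python
import math
from typing import Dict, List


def latency_summary(latencies_ms: List[float]) -> Dict[str, float]:
    """Summarize latency samples as average, P50 and P99 (nearest-rank percentiles)."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")

    ordered = sorted(latencies_ms)

    def percentile(p: int) -> float:
        # Nearest-rank: the smallest value with at least p% of samples at or below it
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    return {
        "avg": sum(ordered) / len(ordered),
        "p50": percentile(50),
        "p99": percentile(99),
    }


summary = latency_summary([10.0, 20.0, 30.0, 40.0])
print(summary)
```

            With only a handful of samples, P99 is simply the slowest query, so collect enough samples before reading much into it.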
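    b.Performance optimization
        a.Description
            Apply optimizations based on the analysis results: choose an index type and build parameters that fit the data volume, tune search parameters such as nprobe, add hardware resources, partition data and balance load, simplify the data model, cache frequent queries, and adjust system configuration. Re-measure after each change to verify that it actually helped.
        b.Code example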
            ---
            # Performance optimization helpers
            
            import time  # used by optimize_query_params to measure latency
            
            from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType
            
            class PerformanceOptimizer:
                def __init__(self, host="localhost", port="19530"):
                    connections.connect(host=host, port=port)
                
                def optimize_index(self, collection_name, field_name):
                    """优化索引"""
                    collection = Collection(collection_name)
                    
                    print(f"优化索引: {collection_name}.{field_name}\n")
                    
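                    # 1. Drop the old index (the collection must be released first)
                    print("1. Dropping the old index...")
                    collection.release()
                    # drop_index() takes an index name, not a field name, so look the index up by field
                    for index in collection.indexes:
                        if index.field_name == field_name:
                            collection.drop_index(index_name=index.index_name)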
                    
                    # 2. 创建优化后的索引
                    print("2. 创建优化索引...")
                    
                    # 根据数据量选择索引类型
                    num_entities = collection.num_entities
                    
                    if num_entities < 100000:
                        # 小数据量使用FLAT
                        index_params = {
                            "index_type": "FLAT",
                            "metric_type": "L2"
                        }
                    elif num_entities < 1000000:
                        # 中等数据量使用IVF_FLAT
                        index_params = {
                            "index_type": "IVF_FLAT",
                            "metric_type": "L2",
                            "params": {"nlist": 1024}
                        }
                    else:
                        # 大数据量使用HNSW
                        index_params = {
                            "index_type": "HNSW",
                            "metric_type": "L2",
                            "params": {
                                "M": 16,
                                "efConstruction": 256
                            }
                        }
                    
                    collection.create_index(
                        field_name=field_name,
                        index_params=index_params
                    )
                    
                    print(f"   索引类型: {index_params['index_type']}")
                    print(f"   索引参数: {index_params.get('params', {})}")
                    
                    # 3. 加载索引
                    print("\n3. 加载索引...")
                    collection.load()
                    
                    print("索引优化完成！")
                
                def optimize_query_params(self, collection_name):
                    """优化查询参数"""
                    collection = Collection(collection_name)
                    collection.load()
                    
                    print(f"优化查询参数: {collection_name}\n")
                    
                    # 测试不同参数组合
                    import numpy as np
                    query_vector = [[np.random.random() for _ in range(128)]]
                    
                    best_params = None
                    best_score = 0
                    
                    for nprobe in [8, 16, 32, 64]:
                        latencies = []
                        
                        for _ in range(5):
                            start = time.time()
                            results = collection.search(
                                data=query_vector,
                                anns_field="embedding",
                                param={"metric_type": "L2", "params": {"nprobe": nprobe}},
                                limit=10
                            )
                            latency = (time.time() - start) * 1000
                            latencies.append(latency)
                        
                        avg_latency = sum(latencies) / len(latencies)
                        
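                        # Score by latency only (lower latency scores higher). Recall is not measured here,
                        # so this tends to favor small nprobe values; check recall before adopting the result
                        score = 1000 / avg_latency
                        
                        print(f"nprobe={nprobe}: avg latency={avg_latency:.2f}ms, score={score:.2f}")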
                        
                        if score > best_score:
                            best_score = score
                            best_params = {"nprobe": nprobe}
                    
                    print(f"\n推荐参数: {best_params}")
                    return best_params
                
                def implement_partitioning(self, collection_name, partition_field):
                    """实现数据分区"""
                    print(f"实现数据分区: {collection_name}\n")
                    
                    collection = Collection(collection_name)
                    
                    # 创建分区
                    partitions = ["partition_2023", "partition_2024", "partition_2025"]
                    
                    for partition_name in partitions:
                        if not collection.has_partition(partition_name):
                            collection.create_partition(partition_name)
                            print(f"创建分区: {partition_name}")
                    
                    print("\n分区创建完成！")
                    print("使用方法:")
                    print("  # 插入到指定分区")
                    print("  collection.insert(data, partition_name='partition_2024')")
                    print("  # 查询指定分区")
                    print("  collection.search(data, partition_names=['partition_2024'])")
            
            # 使用优化器
            # optimizer = PerformanceOptimizer()
            # optimizer.optimize_index("test_collection", "embedding")
            # optimizer.optimize_query_params("test_collection")
            # optimizer.implement_partitioning("test_collection", "year")
            
            print("性能优化工具使用示例已生成")
            ---
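            The parameter search above ranks nprobe values by latency alone. The following is a minimal sketch of how recall could be brought into the choice, assuming the exact top-k IDs are available from a brute-force (FLAT) search. The helper names `recall_at_k` and `pick_nprobe` are hypothetical, and the measurements are made-up illustration values, not benchmark results.

```python
from typing import Dict, List, Optional, Sequence


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    """Fraction of the exact top-k neighbours that the approximate search returned."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def pick_nprobe(measurements: List[Dict[str, float]], min_recall: float = 0.95) -> Optional[Dict[str, float]]:
    """Return the lowest-latency measurement that meets the recall target, or None."""
    eligible = [m for m in measurements if m["recall"] >= min_recall]
    if not eligible:
        return None
    return min(eligible, key=lambda m: m["latency_ms"])


# Exact top-4 IDs for one query, e.g. from a FLAT search (illustrative values)
exact = [1, 2, 3, 4]

# Made-up results of the same query at different nprobe settings
measurements = [
    {"nprobe": 8, "latency_ms": 3.0, "recall": recall_at_k([1, 2, 9, 8], exact)},
    {"nprobe": 16, "latency_ms": 5.0, "recall": recall_at_k([1, 2, 3, 9], exact)},
    {"nprobe": 32, "latency_ms": 9.0, "recall": recall_at_k([1, 2, 3, 4], exact)},
    {"nprobe": 64, "latency_ms": 17.0, "recall": recall_at_k([1, 2, 3, 4], exact)},
]

best = pick_nprobe(measurements, min_recall=0.95)
print(f"Recommended: {best}")
```

            In practice recall would be averaged over many queries rather than taken from a single one.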

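12 Best practices

12.1 Data modeling

01.Schema design
    a.Field planning
        a.Description
            A well-designed schema is the foundation for using Milvus efficiently. Plan field types and counts up front and avoid redundant fields. Set each vector field's dimension to match the output of the embedding model. Use scalar fields for filtering and metadata. Primary key values must be unique. Design the schema around the expected query patterns, leave room for later extension, and store only the fields that queries actually need.
        b.Code example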
            ---
            from pymilvus import FieldSchema, CollectionSchema, DataType, Collection
            
            # 1. 基础Schema设计
            def create_basic_schema():
                """创建基础Schema"""
                fields = [
                    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
                    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
                    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535),
                    FieldSchema(name="timestamp", dtype=DataType.INT64),
                    FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=100)
                ]
                
                schema = CollectionSchema(
                    fields=fields,
                    description="基础文档检索Schema"
                )
                
                return schema
            
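            # 2. Multi-vector schema design
            # Note: more than one vector field per collection needs a Milvus release
            # with multi-vector support (2.4 or later)
            def create_multimodal_schema():
                """Create a multimodal schema"""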
                fields = [
                    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
                    # 文本嵌入
                    FieldSchema(name="text_embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
                    # 图像嵌入
                    FieldSchema(name="image_embedding", dtype=DataType.FLOAT_VECTOR, dim=512),
                    # 元数据
                    FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=500),
                    FieldSchema(name="url", dtype=DataType.VARCHAR, max_length=1000),
                    FieldSchema(name="tags", dtype=DataType.VARCHAR, max_length=500),
                    FieldSchema(name="created_at", dtype=DataType.INT64)
                ]
                
                schema = CollectionSchema(
                    fields=fields,
                    description="多模态检索Schema"
                )
                
                return schema
            
            # 3. 电商推荐Schema
            def create_ecommerce_schema():
                """创建电商推荐Schema"""
                fields = [
                    FieldSchema(name="product_id", dtype=DataType.INT64, is_primary=True),
                    FieldSchema(name="product_embedding", dtype=DataType.FLOAT_VECTOR, dim=256),
                    FieldSchema(name="product_name", dtype=DataType.VARCHAR, max_length=200),
                    FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=100),
                    FieldSchema(name="price", dtype=DataType.FLOAT),
                    FieldSchema(name="rating", dtype=DataType.FLOAT),
                    FieldSchema(name="stock", dtype=DataType.INT64),
                    FieldSchema(name="brand", dtype=DataType.VARCHAR, max_length=100),
                    FieldSchema(name="is_active", dtype=DataType.BOOL)
                ]
                
                schema = CollectionSchema(
                    fields=fields,
                    description="电商商品推荐Schema"
                )
                
                return schema
            
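            # 4. Schema design best practices
            schema_best_practices = {
                "Field count": "Keep it to about 20 or fewer; too many fields hurt performance",
                "Vector dimension": "Match the embedding model; common values are 768/512/256/128",
                "VARCHAR length": "Set max_length to what the data needs; do not oversize it",
                "Primary key": "Use auto_id or a business ID, and ensure uniqueness",
                "Indexed fields": "Build scalar indexes on fields that are often used in filters",
                "Data types": "Choose the smallest type that fits to save storage"
            }
            
            print("Schema design best practices:")
            for key, value in schema_best_practices.items():
                print(f"  {key}: {value}")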
            
            # 5. Schema验证
            def validate_schema(schema):
                """验证Schema设计"""
                issues = []
                
                # 检查主键
                primary_fields = [f for f in schema.fields if f.is_primary]
                if len(primary_fields) == 0:
                    issues.append("缺少主键字段")
                elif len(primary_fields) > 1:
                    issues.append("存在多个主键字段")
                
                # 检查向量字段
                vector_fields = [f for f in schema.fields if f.dtype in [DataType.FLOAT_VECTOR, DataType.BINARY_VECTOR]]
                if len(vector_fields) == 0:
                    issues.append("缺少向量字段")
                
                # 检查字段数量
                if len(schema.fields) > 20:
                    issues.append(f"字段数量过多({len(schema.fields)})，建议少于20个")
                
                # 检查VARCHAR长度
                for field in schema.fields:
                    if field.dtype == DataType.VARCHAR:
                        if field.params.get("max_length", 0) > 65535:
                            issues.append(f"字段{field.name}的max_length过大")
                
                if issues:
                    print("Schema验证失败:")
                    for issue in issues:
                        print(f"  - {issue}")
                    return False
                else:
                    print("Schema验证通过")
                    return True
            
            # 测试Schema
            schema = create_basic_schema()
            validate_schema(schema)
            ---
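            Vector dimension drives storage cost directly, because a FLOAT_VECTOR stores one 32-bit float per dimension. The following is a rough sizing sketch under that assumption; it counts raw vector data only, so index structures and scalar fields come on top. The helper names are hypothetical.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Size of the raw vector data alone (float32 = 4 bytes per dimension)."""
    return num_vectors * dim * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (1 GiB = 1024**3 bytes)."""
    return num_bytes / 1024 ** 3


# Compare the common dimensions for one million vectors
for dim in (128, 256, 512, 768):
    size = raw_vector_bytes(1_000_000, dim)
    print(f"dim={dim}: {to_gib(size):.2f} GiB per million float32 vectors")
```

            Halving the dimension halves the raw footprint, which is one reason to pick the smallest embedding model that still meets the accuracy target.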
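    b.Partitioning strategy
        a.Description
            Partitions improve query performance when used deliberately. Partition by a dimension such as time, category, or region; each partition can be managed and searched independently. Keep the partition count below 4,096 (the per-collection limit depends on the Milvus version, so check yours) and avoid many small partitions. Partitions can be created and dropped at runtime. Naming partitions in a search restricts the scan to them, and dropping old partitions gives a simple form of data lifecycle management.
        b.Code example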
            ---
            from pymilvus import Collection, connections
            from datetime import datetime
            
            connections.connect(host="localhost", port="19530")
            
            # 1. 按时间分区
            def create_time_based_partitions(collection_name):
                """按时间创建分区"""
                collection = Collection(collection_name)
                
                # 按年份分区
                years = ["2023", "2024", "2025"]
                for year in years:
                    partition_name = f"year_{year}"
                    if not collection.has_partition(partition_name):
                        collection.create_partition(partition_name)
                        print(f"创建分区: {partition_name}")
                
                # 按月份分区（更细粒度）
                months = ["202401", "202402", "202403"]
                for month in months:
                    partition_name = f"month_{month}"
                    if not collection.has_partition(partition_name):
                        collection.create_partition(partition_name)
                        print(f"创建分区: {partition_name}")
            
            # 2. 按类别分区
            def create_category_partitions(collection_name, categories):
                """按类别创建分区"""
                collection = Collection(collection_name)
                
                for category in categories:
                    partition_name = f"cat_{category}"
                    if not collection.has_partition(partition_name):
                        collection.create_partition(partition_name)
                        print(f"创建分区: {partition_name}")
            
            # 使用示例
            # create_category_partitions("products", ["electronics", "clothing", "books"])
            
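            # 3. Insert data into partitions
            def insert_with_partition(collection, data, partition_key_field, partition_mapping):
                """Insert each row into the partition mapped from its partition-key value"""
                # Group rows by target partition; `data` maps field name -> list of column values
                partition_data = {}
                
                for i, value in enumerate(data[partition_key_field]):
                    partition_name = partition_mapping.get(value, "_default")
                    
                    if partition_name not in partition_data:
                        partition_data[partition_name] = {field: [] for field in data.keys()}
                    
                    for field, values in data.items():
                        partition_data[partition_name][field].append(values[i])
                
                # insert() expects columns as a list in schema field order, not a dict of columns
                field_names = [f.name for f in collection.schema.fields if f.name in data]
                
                # Insert into each partition
                for partition_name, pdata in partition_data.items():
                    columns = [pdata[name] for name in field_names]
                    collection.insert(columns, partition_name=partition_name)
                    print(f"Inserted {len(pdata[partition_key_field])} rows into partition: {partition_name}")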
            
            # 4. 分区查询
            def search_with_partitions(collection, query_vector, partition_names=None):
                """在指定分区中查询"""
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param={"metric_type": "L2", "params": {"nprobe": 16}},
                    limit=10,
                    partition_names=partition_names  # 指定分区
                )
                
                return results
            
            # 查询示例
            # import numpy as np
            # query_vec = [np.random.random() for _ in range(128)]
            # results = search_with_partitions(collection, query_vec, partition_names=["year_2024"])
            
            # 5. 分区管理
            class PartitionManager:
                def __init__(self, collection):
                    self.collection = collection
                
                def list_partitions(self):
                    """列出所有分区"""
                    partitions = self.collection.partitions
                    print(f"分区数量: {len(partitions)}")
                    
                    for partition in partitions:
                        print(f"  {partition.name}: {partition.num_entities} entities")
                
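                def drop_old_partitions(self, keep_count=12):
                    """Drop old partitions, keeping the most recent N"""
                    # Sorting by name matches age only when names sort chronologically (e.g. month_202401)
                    partitions = sorted(
                        [p for p in self.collection.partitions if p.name != "_default"],
                        key=lambda p: p.name
                    )
                    
                    if len(partitions) > keep_count:
                        to_drop = partitions[:-keep_count]
                        for partition in to_drop:
                            # A loaded partition must be released before it can be dropped
                            partition.release()
                            self.collection.drop_partition(partition.name)
                            print(f"Dropped partition: {partition.name}")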
                
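                def merge_partitions(self, source_partitions, target_partition):
                    """Merge several partitions into one"""
                    schema = self.collection.schema
                    pk_field = schema.primary_field
                    
                    # An empty expr is rejected unless a limit is given, so filter on the primary key instead
                    if pk_field.dtype.name == "INT64":
                        match_all = f"{pk_field.name} >= 0"  # assumes non-negative integer keys
                    else:
                        match_all = f'{pk_field.name} != ""'
                    
                    # Read all rows from the source partitions (the collection must be loaded).
                    # For large partitions, read in batches rather than with a single query.
                    all_data = []
                    for partition_name in source_partitions:
                        data = self.collection.query(
                            expr=match_all,
                            partition_names=[partition_name],
                            output_fields=["*"]
                        )
                        all_data.extend(data)
                    
                    # Create the target partition if needed
                    if not self.collection.has_partition(target_partition):
                        self.collection.create_partition(target_partition)
                    
                    # Convert rows to columns in schema order, skipping an auto-generated primary key
                    columns = [
                        [item[field.name] for item in all_data]
                        for field in schema.fields
                        if not (field.is_primary and schema.auto_id)
                    ]
                    
                    if all_data:
                        self.collection.insert(columns, partition_name=target_partition)
                        self.collection.flush()
                    
                    # Drop the source partitions (release first; a loaded partition cannot be dropped)
                    for partition_name in source_partitions:
                        self.collection.partition(partition_name).release()
                        self.collection.drop_partition(partition_name)
                    
                    print(f"Merged {len(source_partitions)} partitions into: {target_partition}")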
            
            # 使用分区管理器
            # collection = Collection("test_collection")
            # manager = PartitionManager(collection)
            # manager.list_partitions()
            # manager.drop_old_partitions(keep_count=12)
            
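            # 6. Partitioning strategy recommendations
            partition_strategies = {
                "Time-based partitioning": {
                    "Use cases": "Logs, events, time-series data",
                    "Pros": "Easy archiving and cleanup",
                    "Cons": "Can create hot partitions",
                    "Advice": "Partition by month or quarter; avoid finer granularity"
                },
                "Category-based partitioning": {
                    "Use cases": "E-commerce, content categories",
                    "Pros": "Queries can target the exact partition",
                    "Cons": "Needs adjusting when categories change",
                    "Advice": "Use stable top-level categories"
                },
                "Hash partitioning": {
                    "Use cases": "Evenly distributed data",
                    "Pros": "Load balancing",
                    "Cons": "Cannot query along business dimensions",
                    "Advice": "Combine with other strategies"
                }
            }
            
            print("\nPartitioning strategy recommendations:")
            for strategy, info in partition_strategies.items():
                print(f"\n{strategy}:")
                for key, value in info.items():
                    print(f"  {key}: {value}")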
            ---
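            For time-based partitioning, the partition name has to be derived from each record's timestamp, and retention needs a rule for which partitions to drop. The following is a minimal sketch, assuming Unix timestamps in seconds (UTC) and the `month_YYYYMM` naming used above. Both helper names are hypothetical, and the sketch only computes names; it does not call Milvus.

```python
from datetime import datetime, timezone
from typing import List


def month_partition_name(ts_seconds: int) -> str:
    """Map a Unix timestamp (seconds, UTC) to a monthly partition name."""
    dt = datetime.fromtimestamp(ts_seconds, tz=timezone.utc)
    return f"month_{dt:%Y%m}"


def partitions_to_drop(names: List[str], keep_count: int) -> List[str]:
    """Return the monthly partitions that fall outside the most recent keep_count."""
    # month_YYYYMM names sort chronologically, so a plain string sort is enough
    monthly = sorted(n for n in names if n.startswith("month_"))
    if keep_count <= 0:
        return monthly
    return monthly[:-keep_count] if len(monthly) > keep_count else []


existing = ["month_202403", "_default", "month_202401", "month_202402"]
print(month_partition_name(1706745600))
print(partitions_to_drop(existing, keep_count=2))
```

            Fixed-width names such as `month_202401` are what make the sort-based retention rule safe; names like `month_2024_1` would not sort in date order.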

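02.Data quality
    a.Data cleaning
        a.Description
            High-quality data is a prerequisite for accurate retrieval. Remove duplicates and outliers, convert vectors to a consistent format, check that every vector has the expected dimension, handle missing and empty values, and filter out low-quality records. Run these checks as a repeatable validation step and record data quality metrics for each batch.
        b.Code example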
            ---
            import numpy as np
            from pymilvus import Collection, connections, DataType  # DataType is used by validate_data
            
            class DataCleaner:
                def __init__(self):
                    self.stats = {
                        "total": 0,
                        "duplicates": 0,
                        "invalid_vectors": 0,
                        "missing_fields": 0,
                        "cleaned": 0
                    }
                
                def clean_vectors(self, vectors, dim=768):
                    """清洗向量数据"""
                    cleaned = []
                    
                    for vec in vectors:
                        # 检查维度
                        if len(vec) != dim:
                            self.stats["invalid_vectors"] += 1
                            continue
                        
                        # 检查NaN和Inf
                        if np.isnan(vec).any() or np.isinf(vec).any():
                            self.stats["invalid_vectors"] += 1
                            continue
                        
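                        # Convert to float32, the element type of FLOAT_VECTOR
                        vec = np.array(vec, dtype=np.float32)
                        
                        # L2-normalize. This suits cosine/IP metrics; under the L2 metric it changes distances
                        norm = np.linalg.norm(vec)
                        if norm > 0:
                            vec = vec / norm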
                        
                        cleaned.append(vec.tolist())
                        self.stats["cleaned"] += 1
                    
                    return cleaned
                
                def remove_duplicates(self, data, id_field="id"):
                    """去除重复数据"""
                    seen_ids = set()
                    cleaned_data = {field: [] for field in data.keys()}
                    
                    for i in range(len(data[id_field])):
                        item_id = data[id_field][i]
                        
                        if item_id in seen_ids:
                            self.stats["duplicates"] += 1
                            continue
                        
                        seen_ids.add(item_id)
                        
                        for field, values in data.items():
                            cleaned_data[field].append(values[i])
                    
                    return cleaned_data
                
                def validate_data(self, data, schema):
                    """验证数据完整性"""
                    self.stats["total"] = len(data[list(data.keys())[0]])
                    
                    # 检查必填字段
                    for field in schema.fields:
                        if field.name not in data:
                            print(f"缺少字段: {field.name}")
                            return False
                        
                        # 检查数据长度一致性
                        if len(data[field.name]) != self.stats["total"]:
                            print(f"字段{field.name}数据长度不一致")
                            return False
                        
                        # 检查空值
                        if field.dtype == DataType.VARCHAR:
                            empty_count = sum(1 for v in data[field.name] if not v)
                            if empty_count > 0:
                                print(f"字段{field.name}有{empty_count}个空值")
                                self.stats["missing_fields"] += empty_count
                    
                    return True
                
                def get_stats(self):
                    """获取清洗统计"""
                    return self.stats
            
            # 使用数据清洗器
            cleaner = DataCleaner()
            
            # 示例数据
            raw_data = {
                "id": [1, 2, 2, 3, 4],  # 包含重复
                "embedding": [
                    [0.1] * 768,
                    [0.2] * 768,
                    [0.2] * 768,
                    [float('nan')] * 768,  # 包含NaN
                    [0.4] * 768
                ],
                "text": ["doc1", "doc2", "doc2", "", "doc4"]
            }
            
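            # Record the input size before cleaning
            cleaner.stats["total"] = len(raw_data["id"])
            
            # Clean vectors one row at a time: clean_vectors() drops invalid vectors,
            # so the same rows must be dropped from every other field to keep columns aligned
            kept_rows = []
            cleaned_vectors = []
            for i, vec in enumerate(raw_data["embedding"]):
                result = cleaner.clean_vectors([vec])
                if result:
                    kept_rows.append(i)
                    cleaned_vectors.append(result[0])
            
            aligned_data = {field: [values[i] for i in kept_rows] for field, values in raw_data.items()}
            aligned_data["embedding"] = cleaned_vectors
            
            # Remove duplicates
            cleaned_data = cleaner.remove_duplicates(aligned_data)
            
            # Print statistics
            stats = cleaner.get_stats()
            print("Data cleaning statistics:")
            print(f"  Total: {stats['total']}")
            print(f"  Duplicates: {stats['duplicates']}")
            print(f"  Invalid vectors: {stats['invalid_vectors']}")
            print(f"  Missing fields: {stats['missing_fields']}")
            print(f"  Vectors cleaned: {stats['cleaned']}")
            print(f"  Rows remaining: {len(cleaned_data['id'])}")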
            ---
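            The cleaner above removes rows with a repeated ID, but two rows with different IDs can still carry almost the same vector. The following is a minimal sketch of near-duplicate filtering, assuming cosine similarity and an arbitrary threshold of 0.99. The function names are hypothetical, and the pairwise comparison is quadratic, so it is only suitable for small batches.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_near_duplicates(vectors: List[Sequence[float]], threshold: float = 0.99) -> List[int]:
    """Return the indices to keep; a vector is dropped if it is too similar to a kept one."""
    kept: List[int] = []
    for i, vec in enumerate(vectors):
        if all(cosine_similarity(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return kept


vectors = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]]
keep = drop_near_duplicates(vectors)
print(f"Rows to keep: {keep}")
```

            The returned indices can be used to filter every field of the batch, which keeps the columns aligned.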
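    b.Data validation
        a.Description
            Validate data before insertion to protect data quality. Check field formats and types, vector dimensions and value ranges, primary key uniqueness, and the validity of scalar fields. Automate these checks, log the results and any anomalies, and produce a data quality report.
        b.Code example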
            ---
            from pymilvus import Collection, DataType
            import numpy as np
            
            class DataValidator:
                def __init__(self, schema):
                    self.schema = schema
                    self.errors = []
                
                def validate_batch(self, data):
                    """验证批量数据"""
                    self.errors = []
                    
                    # 1. 验证字段完整性
                    if not self._validate_fields(data):
                        return False
                    
                    # 2. 验证数据类型
                    if not self._validate_types(data):
                        return False
                    
                    # 3. 验证向量数据
                    if not self._validate_vectors(data):
                        return False
                    
                    # 4. 验证主键唯一性
                    if not self._validate_primary_key(data):
                        return False
                    
                    # 5. 验证VARCHAR长度
                    if not self._validate_varchar_length(data):
                        return False
                    
                    return len(self.errors) == 0
                
                def _validate_fields(self, data):
                    """验证字段完整性"""
                    for field in self.schema.fields:
                        if field.name not in data:
                            self.errors.append(f"缺少字段: {field.name}")
                            return False
                    
                    # 检查数据长度一致性
                    lengths = [len(values) for values in data.values()]
                    if len(set(lengths)) > 1:
                        self.errors.append(f"字段数据长度不一致: {lengths}")
                        return False
                    
                    return True
                
                def _validate_types(self, data):
                    """验证数据类型"""
                    for field in self.schema.fields:
                        values = data[field.name]
                        
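                        if field.dtype == DataType.INT64:
                            # bool is a subclass of int in Python, so exclude it explicitly
                            if not all(isinstance(v, (int, np.integer)) and not isinstance(v, bool) for v in values):
                                self.errors.append(f"Field {field.name} has the wrong type, expected INT64")
                                return False
                        
                        elif field.dtype == DataType.FLOAT:
                            if not all(isinstance(v, (float, np.floating, int)) and not isinstance(v, bool) for v in values):
                                self.errors.append(f"Field {field.name} has the wrong type, expected FLOAT")
                                return False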
                        
                        elif field.dtype == DataType.VARCHAR:
                            if not all(isinstance(v, str) for v in values):
                                self.errors.append(f"字段{field.name}类型错误，期望VARCHAR")
                                return False
                        
                        elif field.dtype == DataType.BOOL:
                            if not all(isinstance(v, bool) for v in values):
                                self.errors.append(f"字段{field.name}类型错误，期望BOOL")
                                return False
                    
                    return True
                
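                def _validate_vectors(self, data):
                    """Validate vector data"""
                    for field in self.schema.fields:
                        if field.dtype in [DataType.FLOAT_VECTOR, DataType.BINARY_VECTOR]:
                            vectors = data[field.name]
                            is_binary = field.dtype == DataType.BINARY_VECTOR
                            # A binary vector is passed as bytes, 8 dimensions per byte
                            expected_len = field.params["dim"] // 8 if is_binary else field.params["dim"]
                            
                            for i, vec in enumerate(vectors):
                                # Check the length against the declared dimension
                                if len(vec) != expected_len:
                                    self.errors.append(
                                        f"Field {field.name}, vector {i}: wrong length, "
                                        f"expected {expected_len}, got {len(vec)}"
                                    )
                                    return False
                                
                                # NaN and Inf only apply to float vectors
                                if is_binary:
                                    continue
                                
                                vec_array = np.array(vec)
                                if np.isnan(vec_array).any():
                                    self.errors.append(f"Field {field.name}, vector {i} contains NaN")
                                    return False
                                
                                if np.isinf(vec_array).any():
                                    self.errors.append(f"Field {field.name}, vector {i} contains Inf")
                                    return False
                    
                    return True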
                
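                def _validate_primary_key(self, data):
                    """Validate primary key uniqueness within the batch"""
                    for field in self.schema.fields:
                        if field.is_primary:
                            ids = data[field.name]
                            
                            # Single pass with sets; calling list.count() per element would be quadratic
                            seen = set()
                            duplicates = set()
                            for pk in ids:
                                if pk in seen:
                                    duplicates.add(pk)
                                seen.add(pk)
                            
                            if duplicates:
                                self.errors.append(f"Primary key {field.name} has duplicate values: {duplicates}")
                                return False
                    
                    return True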
                
                def _validate_varchar_length(self, data):
                    """验证VARCHAR长度"""
                    for field in self.schema.fields:
                        if field.dtype == DataType.VARCHAR:
                            max_length = field.params.get("max_length", 65535)
                            values = data[field.name]
                            
                            for i, value in enumerate(values):
                                if len(value) > max_length:
                                    self.errors.append(
                                        f"字段{field.name}第{i}个值超长: "
                                        f"{len(value)} > {max_length}"
                                    )
                                    return False
                    
                    return True
                
                def get_errors(self):
                    """获取验证错误"""
                    return self.errors
            
            # Use the data validator
            from pymilvus import CollectionSchema, DataType, FieldSchema
            
            # Build the schema
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128),
                FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=1000)
            ]
            schema = CollectionSchema(fields=fields)
            
            # Validate a batch of column-oriented data
            validator = DataValidator(schema)
            
            test_data = {
                "id": [1, 2, 3],
                "embedding": [
                    [0.1] * 128,
                    [0.2] * 128,
                    [0.3] * 128
                ],
                "text": ["doc1", "doc2", "doc3"]
            }
            
            if validator.validate_batch(test_data):
                print("Validation passed")
            else:
                print("Validation failed:")
                for error in validator.get_errors():
                    print(f"  - {error}")
            ---

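12.2 Index Selection

01.Index types
    a.FLAT index
        a.Description
            FLAT is the simplest index type: it applies no compression or approximation and compares the query against every stored vector. It suits small datasets (under roughly 100,000 vectors) and scenarios where accuracy matters most. Recall is 100%, so it also serves as the baseline when evaluating other indexes. Query time grows linearly with the number of vectors. No training is needed, so the index builds quickly, and memory use equals the size of the raw vectors.
        b.Code example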
            ---
            from pymilvus import Collection, connections
            
            connections.connect(host="localhost", port="19530")
            collection = Collection("test_collection")
            
            # Create a FLAT index
            index_params = {
                "index_type": "FLAT",
                "metric_type": "L2"
            }
            
            collection.create_index(
                field_name="embedding",
                index_params=index_params
            )
            
            print("FLAT index characteristics:")
            print("  Use case: small datasets (<100K vectors)")
            print("  Recall: 100%")
            print("  Query speed: slow (linear scan)")
            print("  Memory: high (same as the raw data)")
            print("  Build time: fast (no training)")
            ---
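            The linear scan that FLAT performs can be sketched in plain NumPy. This is an illustrative stand-in, not Milvus code: `flat_search` and the random `base` array are hypothetical names introduced here. Because every vector is compared, the true nearest neighbour is always found, which is why recall is 100% and cost grows with the dataset size.

```python
import numpy as np

def flat_search(base, query, top_k=3):
    """Exact nearest-neighbour search: compare the query with every base vector (L2)."""
    # Squared L2 distance from the query to each row of `base`.
    dists = np.sum((base - query) ** 2, axis=1)
    # Sort all distances and keep the top_k smallest.
    order = np.argsort(dists, kind="stable")[:top_k]
    return order.tolist(), dists[order].tolist()

# Hypothetical data: 1000 random 16-dimensional vectors.
rng = np.random.default_rng(0)
base = rng.random((1000, 16))

# Searching with a vector taken from the dataset returns that vector first.
ids, dists = flat_search(base, base[42], top_k=3)
print(ids[0], dists[0])
```

            Every query touches all 1000 rows here; doubling the dataset doubles the work, which is the linear growth described above.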
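    b.IVF index
        a.Description
            IVF (Inverted File) indexes speed up search by clustering. The vector space is partitioned into nlist clusters, and a query scans only the nprobe clusters whose centroids are closest to it. IVF suits medium to large datasets (roughly 100,000 to 10 million vectors) and needs a training step to determine the cluster centroids. The variants IVF_FLAT, IVF_SQ8, and IVF_PQ trade accuracy against memory and speed in different proportions. IVF is among the most commonly used index families.
        b.Code example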
            ---
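            # IVF_FLAT index
            # Assumes `collection` from the FLAT example. A vector field holds one index,
            # so call collection.drop_index() first if the field is already indexed.
            ivf_flat_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
            
            collection.create_index(
                field_name="embedding",
                index_params=ivf_flat_params
            )
            
            # Search-time parameters
            search_params = {"metric_type": "L2", "params": {"nprobe": 16}}
            
            # IVF_SQ8 index (scalar quantization: each float32 becomes one byte)
            ivf_sq8_params = {
                "index_type": "IVF_SQ8",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
            
            # IVF_PQ index (product quantization; the dimension must be divisible by m)
            ivf_pq_params = {
                "index_type": "IVF_PQ",
                "metric_type": "L2",
                "params": {
                    "nlist": 1024,
                    "m": 8,
                    "nbits": 8
                }
            }
            
            print("IVF index comparison:")
            print("  IVF_FLAT: high accuracy, large memory footprint")
            print("  IVF_SQ8: roughly 70-75% less memory, slightly lower accuracy")
            print("  IVF_PQ: smallest memory footprint, accuracy drops further")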
            ---
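            To make the roles of nlist and nprobe concrete, here is a minimal NumPy sketch of the IVF idea: cluster the vectors, then scan only the closest clusters. It is a simplified illustration under stated assumptions (a few plain k-means iterations, random data), not how Milvus implements IVF internally; `build_ivf` and `ivf_search` are hypothetical helper names.

```python
import numpy as np

def build_ivf(base, nlist, n_iter=10, seed=0):
    """Cluster `base` into nlist cells with a few k-means iterations."""
    rng = np.random.default_rng(seed)
    centroids = base[rng.choice(len(base), size=nlist, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each vector to its nearest centroid.
        d = ((base[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        # Move each centroid to the mean of its members (empty cells stay put).
        for c in range(nlist):
            members = base[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Final assignment so the inverted lists match the final centroids.
    d = ((base[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = d.argmin(axis=1)
    inverted_lists = [np.flatnonzero(assign == c) for c in range(nlist)]
    return centroids, inverted_lists

def ivf_search(base, centroids, inverted_lists, query, nprobe, top_k):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    cell_d = ((centroids - query) ** 2).sum(axis=1)
    probe = np.argsort(cell_d)[:nprobe]
    cand = np.concatenate([inverted_lists[c] for c in probe])
    d = ((base[cand] - query) ** 2).sum(axis=1)
    return cand[np.argsort(d)[:top_k]].tolist()

rng = np.random.default_rng(1)
base = rng.random((2000, 8))
centroids, lists = build_ivf(base, nlist=16)
query = rng.random(8)

exact = np.argsort(((base - query) ** 2).sum(axis=1))[:5].tolist()
full = ivf_search(base, centroids, lists, query, nprobe=16, top_k=5)
fast = ivf_search(base, centroids, lists, query, nprobe=2, top_k=5)
print("exact:", exact)
print("nprobe=16:", full)
print("nprobe=2:", fast)
```

            With nprobe equal to nlist every cell is scanned, so the result matches the exact search; smaller nprobe values scan fewer vectors and may miss some true neighbours.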

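02.Parameter tuning
    a.nlist parameter
        a.Description
            nlist is the number of clusters in an IVF index, and it affects both build time and query performance. With nprobe held fixed, a larger nlist produces finer clusters, so each query scans fewer vectors and runs faster, but the index takes longer to build and recall can drop. A common rule of thumb is sqrt(N) to 4*sqrt(N), where N is the number of vectors. Typical values are 128, 256, 512, 1024, and 2048. Tune it to the data size and query requirements: too large a value adds centroid memory and training cost, while too small a value leaves each cluster large and slows queries.
        b.Code example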
            ---
            import math
            
            def recommend_nlist(num_vectors):
                """Suggest nlist values from the sqrt(N) rule of thumb."""
                sqrt_n = int(math.sqrt(num_vectors))
                
                recommendations = {
                    "conservative": sqrt_n,
                    "recommended": 2 * sqrt_n,
                    "aggressive": 4 * sqrt_n
                }
                
                # Clamp to a practical range; 65536 is the upper bound Milvus accepts for nlist
                for key in recommendations:
                    recommendations[key] = min(max(recommendations[key], 128), 65536)
                
                return recommendations
            
            test_sizes = [10000, 100000, 1000000, 10000000]
            
            print("Suggested nlist values:")
            for size in test_sizes:
                recs = recommend_nlist(size)
                print(f"\nVectors: {size:,}")
                for level, value in recs.items():
                    print(f"  {level}: {value}")
            ---
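    b.nprobe parameter
        a.Description
            nprobe is the number of clusters scanned per query, and it trades accuracy against speed: a larger nprobe raises recall but slows the query. A common starting range is 1%-10% of nlist, with typical values of 8, 16, 32, and 64. Because nprobe is a search-time parameter, it can be adjusted per request to match business requirements. Determine the best value experimentally on your own data.
        b.Code example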
            ---
            def recommend_nprobe(nlist, accuracy_requirement="medium"):
                """Suggest nprobe as a share of nlist."""
                recommendations = {
                    "low": max(int(nlist * 0.01), 8),
                    "medium": max(int(nlist * 0.05), 16),
                    "high": max(int(nlist * 0.10), 32)
                }
                
                # Probing more clusters than exist is meaningless, so cap at nlist
                return min(recommendations.get(accuracy_requirement, 16), nlist)
            
            nlist_values = [128, 512, 1024, 2048]
            
            print("Suggested nprobe values:")
            for nlist in nlist_values:
                print(f"\nnlist={nlist}:")
                for level in ["low", "medium", "high"]:
                    nprobe = recommend_nprobe(nlist, level)
                    print(f"  {level}: {nprobe}")
            
            import time
            import numpy as np
            
            def benchmark_nprobe(collection, nprobe_values):
                """Measure search latency (ms) for each nprobe value."""
                # Plain Python floats; assumes a 128-dimensional "embedding" field
                query_vector = [np.random.random(128).tolist()]
                
                results = {}
                for nprobe in nprobe_values:
                    latencies = []
                    
                    for _ in range(10):
                        # perf_counter is monotonic and better suited to timing than time.time
                        start = time.perf_counter()
                        collection.search(
                            data=query_vector,
                            anns_field="embedding",
                            param={"metric_type": "L2", "params": {"nprobe": nprobe}},
                            limit=10
                        )
                        latency = (time.perf_counter() - start) * 1000
                        latencies.append(latency)
                    
                    results[nprobe] = {
                        "avg": sum(latencies) / len(latencies),
                        # With only 10 samples this is effectively the maximum
                        "p99": sorted(latencies)[int(len(latencies) * 0.99)]
                    }
                
                return results
            ---
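            Latency is only half of the tuning experiment; the other half is recall. A minimal sketch of measuring it, assuming you already have the ids returned by an approximate search and the ids from an exact (FLAT) search for the same query. `recall_at_k` and `pick_nprobe` are hypothetical helpers, and the numbers in `measured` are placeholders for illustration, not benchmark results.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate search also returned."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def pick_nprobe(recall_by_nprobe, target_recall):
    """Smallest nprobe whose measured recall meets the target, or None if none does."""
    for nprobe in sorted(recall_by_nprobe):
        if recall_by_nprobe[nprobe] >= target_recall:
            return nprobe
    return None

# Three of the four exact neighbours were found.
print(recall_at_k([1, 2, 3, 9], [1, 2, 3, 4]))

# Placeholder values only: replace with recall measured on your own data.
measured = {8: 0.82, 16: 0.91, 32: 0.97, 64: 0.99}
print(pick_nprobe(measured, target_recall=0.95))
```

            Choosing the smallest nprobe that meets the recall target keeps latency as low as the accuracy requirement allows.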

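12.3 Query Optimization

01.Query strategies
    a.Batch queries
        a.Description
            Batch queries can raise throughput substantially. Sending several query vectors in one request cuts network round trips, and Milvus processes the vectors of a batch in parallel. This fits offline batch workloads well. Throughput gains of 10-100x are possible, though the actual figure depends on workload and hardware. Batch size has to be balanced against latency; 10-1000 vectors per request is a reasonable starting range. Issuing batches asynchronously can improve performance further.
        b.Code example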
            ---
            from pymilvus import Collection, connections
            import numpy as np
            import time
            
            connections.connect(host="localhost", port="19530")
            collection = Collection("test_collection")
            collection.load()
            
            # 1. Single-query baseline
            def single_query_benchmark(collection, num_queries=100):
                """Send one query vector per request and report QPS."""
                start = time.perf_counter()
                
                for _ in range(num_queries):
                    query_vector = [np.random.random(128).tolist()]
                    collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param={"metric_type": "L2", "params": {"nprobe": 16}},
                        limit=10
                    )
                
                elapsed = time.perf_counter() - start
                qps = num_queries / elapsed
                
                print("Single queries:")
                print(f"  Total time: {elapsed:.2f}s")
                print(f"  QPS: {qps:.2f}")
                
                return qps
            
            # 2. Batch-query benchmark
            def batch_query_benchmark(collection, num_queries=100, batch_size=10):
                """Send batch_size query vectors per request and report QPS."""
                start = time.perf_counter()
                
                for i in range(0, num_queries, batch_size):
                    # The last batch may be smaller than batch_size
                    current = min(batch_size, num_queries - i)
                    batch_vectors = np.random.random((current, 128)).tolist()
                    
                    collection.search(
                        data=batch_vectors,
                        anns_field="embedding",
                        param={"metric_type": "L2", "params": {"nprobe": 16}},
                        limit=10
                    )
                
                elapsed = time.perf_counter() - start
                qps = num_queries / elapsed
                
                print(f"\nBatch queries (batch_size={batch_size}):")
                print(f"  Total time: {elapsed:.2f}s")
                print(f"  QPS: {qps:.2f}")
                
                return qps
            
            # 3. Comparison
            print("Query performance comparison:\n")
            single_qps = single_query_benchmark(collection, 100)
            
            for batch_size in [10, 50, 100]:
                batch_qps = batch_query_benchmark(collection, 100, batch_size)
                speedup = batch_qps / single_qps
                print(f"  Speedup: {speedup:.2f}x")
            ---
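            When the query vectors arrive as one long list, they have to be split into requests of the chosen batch size. A minimal sketch of that chunking follows; `iter_batches` and `search_in_batches` are hypothetical helpers, and `search_fn` stands in for a function that wraps `collection.search` and returns one result per query vector.

```python
def iter_batches(vectors, batch_size):
    """Yield consecutive slices of `vectors`, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(vectors), batch_size):
        yield vectors[start:start + batch_size]

def search_in_batches(search_fn, vectors, batch_size):
    """Call search_fn once per batch and concatenate the per-vector results."""
    results = []
    for batch in iter_batches(vectors, batch_size):
        results.extend(search_fn(batch))
    return results

# Stand-in for a real search: records each request size and echoes one result per vector.
request_sizes = []

def fake_search(batch):
    request_sizes.append(len(batch))
    return [f"result-for-{v}" for v in batch]

out = search_in_batches(fake_search, list(range(25)), batch_size=10)
print(request_sizes, len(out))
```

            Results come back in the same order as the input vectors, so callers can match each result to its query by position.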
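    b.Filter optimization
        a.Description
            Well-chosen filters make queries more efficient. Filtering before the vector search narrows the search scope, and scalar indexes speed up the filtering itself. Prefer equality and range filters, and avoid complex expressions. When combining several conditions, pay attention to their order. Where the data divides along a fixed key, searching a partition instead of filtering can perform better.
        b.Code example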
            ---
            # 1. Basic filtering
            def search_with_filter(collection, query_vector, filter_expr):
                """Vector search restricted by a scalar filter expression."""
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param={"metric_type": "L2", "params": {"nprobe": 16}},
                    limit=10,
                    expr=filter_expr
                )
                
                return results
            
            # Equality filter
            results = search_with_filter(
                collection,
                np.random.random(128).tolist(),
                'category == "electronics"'
            )
            
            # Range filter
            results = search_with_filter(
                collection,
                np.random.random(128).tolist(),
                'price >= 100 and price <= 500'
            )
            
            # 2. Scalar indexes
            # STL_SORT applies to numeric fields; a VARCHAR field such as category uses Trie
            collection.create_index(
                field_name="category",
                index_params={"index_type": "Trie"}
            )
            
            collection.create_index(
                field_name="price",
                index_params={"index_type": "STL_SORT"}
            )
            
            # 3. Partitions instead of filters
            # A partition search only finds entities that were inserted into that partition
            categories = ["electronics", "clothing", "books"]
            for cat in categories:
                if not collection.has_partition(f"cat_{cat}"):
                    collection.create_partition(f"cat_{cat}")
            
            results = collection.search(
                data=[np.random.random(128).tolist()],
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"nprobe": 16}},
                limit=10,
                partition_names=["cat_electronics"]
            )
            
            print("Filter optimization tips:")
            print("  1. Use scalar indexes to speed up filtering")
            print("  2. Mind the order of filter conditions")
            print("  3. Use partitions instead of filters")
            print("  4. Avoid complex expressions")
            ---
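            Filter expressions are plain strings, so building them by hand invites quoting mistakes. One possible approach is a few small helpers that compose equality and range clauses. The helper names (`eq`, `between`, `all_of`) are hypothetical and not part of pymilvus; the output is meant to match the expression style used above.

```python
import json

def eq(field, value):
    """Equality clause; strings are quoted and escaped, numbers pass through."""
    if isinstance(value, str):
        return f"{field} == {json.dumps(value, ensure_ascii=False)}"
    return f"{field} == {value}"

def between(field, low, high):
    """Inclusive range clause."""
    return f"{field} >= {low} and {field} <= {high}"

def all_of(*clauses):
    """Join non-empty clauses with 'and', parenthesizing each one."""
    return " and ".join(f"({c})" for c in clauses if c)

expr = all_of(eq("category", "electronics"), between("price", 100, 500))
print(expr)
```

            The resulting string can be passed as the `expr` argument of the search calls shown earlier.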

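02.Caching strategies
    a.Result cache
        a.Description
            Caching the results of popular queries shortens response time, and it pays off when the same queries repeat often. Use Redis or an in-process memory cache, and set a sensible expiry time (TTL). Plan for cache warm-up and for refreshing entries when the underlying data changes. Monitor the hit rate and balance it against cache size. A multi-level cache can improve performance further.
        b.Code example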
            ---
            import redis
            import json
            import hashlib
            
            class QueryCache:
                def __init__(self, redis_host="localhost", redis_port=6379, ttl=3600):
                    self.redis_client = redis.Redis(host=redis_host, port=redis_port, decode_responses=True)
                    self.ttl = ttl
                    self.stats = {"hits": 0, "misses": 0}
                
                def _generate_key(self, query_vector, params):
                    """Build a cache key from the query vector and search parameters."""
                    data = {
                        # Cast to plain floats so NumPy scalars serialize
                        "vector": [float(x) for x in query_vector],
                        "params": params
                    }
                    data_str = json.dumps(data, sort_keys=True)
                    key = hashlib.md5(data_str.encode()).hexdigest()
                    return f"milvus:query:{key}"
                
                def get(self, query_vector, params):
                    """Return the cached result, or None on a miss."""
                    key = self._generate_key(query_vector, params)
                    cached = self.redis_client.get(key)
                    
                    if cached is not None:
                        self.stats["hits"] += 1
                        return json.loads(cached)
                    else:
                        self.stats["misses"] += 1
                        return None
                
                def set(self, query_vector, params, results):
                    """Cache the hits of a single-vector search and return them as plain dicts."""
                    key = self._generate_key(query_vector, params)
                    
                    # Keep only JSON-serializable fields; the entity object itself is not serializable
                    results_data = [
                        {
                            "id": r.id,
                            "distance": r.distance
                        }
                        for r in results[0]
                    ]
                    
                    self.redis_client.setex(
                        key,
                        self.ttl,
                        json.dumps(results_data)
                    )
                    
                    return results_data
                
                def search_with_cache(self, collection, query_vector, params):
                    """Search through the cache; hits and misses return the same list of dicts."""
                    cached_results = self.get(query_vector, params)
                    
                    # An empty list is still a valid cached result
                    if cached_results is not None:
                        return cached_results
                    
                    results = collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=params,
                        limit=10
                    )
                    
                    return self.set(query_vector, params, results)
                
                def get_stats(self):
                    """Return hit/miss counts and the hit rate."""
                    total = self.stats["hits"] + self.stats["misses"]
                    hit_rate = self.stats["hits"] / total if total > 0 else 0
                    
                    return {
                        "hits": self.stats["hits"],
                        "misses": self.stats["misses"],
                        "hit_rate": hit_rate
                    }
            
            cache = QueryCache(ttl=3600)
            ---
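            The same caching idea can be tried without a Redis server. The sketch below is an in-process variant under stated assumptions: `make_cache_key` rounds vector components so that tiny floating-point differences map to the same key, and `TTLCache` takes an injectable clock so expiry can be exercised deterministically. Both names are hypothetical, and SHA-256 is used here in place of MD5 purely as an illustration.

```python
import hashlib
import json
import time

def make_cache_key(query_vector, params, precision=6):
    """Deterministic key from a rounded vector plus the search parameters."""
    rounded = [round(float(x), precision) for x in query_vector]
    payload = json.dumps({"vector": rounded, "params": params}, sort_keys=True)
    return "milvus:query:" + hashlib.sha256(payload.encode()).hexdigest()

class TTLCache:
    """Minimal in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

# A fake clock makes the expiry behaviour easy to observe.
now = [0.0]
cache = TTLCache(ttl_seconds=10, clock=lambda: now[0])
key = make_cache_key([0.1, 0.2], {"nprobe": 16})
cache.set(key, [{"id": 1, "distance": 0.5}])

fresh = cache.get(key)      # still within the TTL
now[0] = 10.0
expired = cache.get(key)    # TTL reached, entry is gone
print(fresh, expired)
```

            An in-process cache like this could serve as the first level of a multi-level setup, with Redis as the shared second level.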
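    b.Vector cache
        a.Description
            Caching frequently used vectors avoids loading them repeatedly. Keep hot vectors in memory, manage the cache with an LRU policy, and preload commonly used data as a warm-up step. Monitor cache usage, and balance cache size against performance.
        b.Code example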
            ---
            from collections import OrderedDict
            import numpy as np
            
            class VectorCache:
                def __init__(self, max_size=10000):
                    self.cache = OrderedDict()
                    self.max_size = max_size
                    self.stats = {"hits": 0, "misses": 0}
                
                def get(self, vector_id):
                    """Return a cached vector and mark it as most recently used."""
                    if vector_id in self.cache:
                        self.cache.move_to_end(vector_id)
                        self.stats["hits"] += 1
                        return self.cache[vector_id]
                    else:
                        self.stats["misses"] += 1
                        return None
                
                def put(self, vector_id, vector):
                    """Insert or update a vector, evicting the least recently used entry when full."""
                    if vector_id in self.cache:
                        self.cache.move_to_end(vector_id)
                    elif len(self.cache) >= self.max_size:
                        self.cache.popitem(last=False)
                    
                    # Assign in both cases so an existing id gets the new value
                    self.cache[vector_id] = vector
                
                def batch_put(self, vectors_dict):
                    """Insert several vectors at once."""
                    for vid, vec in vectors_dict.items():
                        self.put(vid, vec)
                
                def preload(self, collection, vector_ids):
                    """Warm the cache by fetching the given ids from Milvus."""
                    # list(...) guarantees the [1, 2, 3] form that the expression syntax expects
                    results = collection.query(
                        expr=f"id in {list(vector_ids)}",
                        output_fields=["id", "embedding"]
                    )
                    
                    for result in results:
                        self.put(result["id"], result["embedding"])
                    
                    print(f"Preloaded {len(results)} vectors into the cache")
                
                def get_stats(self):
                    """Return size, hit/miss counts, and the hit rate."""
                    total = self.stats["hits"] + self.stats["misses"]
                    hit_rate = self.stats["hits"] / total if total > 0 else 0
                    
                    return {
                        "size": len(self.cache),
                        "max_size": self.max_size,
                        "hits": self.stats["hits"],
                        "misses": self.stats["misses"],
                        "hit_rate": hit_rate
                    }
            
            vector_cache = VectorCache(max_size=10000)
            ---

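12.4 Production Deployment

01.Deployment architecture
    a.Standalone deployment
        a.Description
            Standalone deployment suits development, testing, and small applications. All components run on a single server and can be brought up quickly with Docker Compose. Plan for at least 8 CPU cores and 16 GB of memory, which supports datasets of a few million vectors. Deployment is simple and maintenance cost is low, but there is no high availability or horizontal scaling. It fits POCs and small projects.
        b.Code example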
            ---
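            # Standalone deployment with Docker Compose
            
            print("Standalone deployment steps:")
            print("1. Create a docker-compose.yml file")
            print("2. Configure the etcd, minio, and milvus services")
            print("3. Start: docker-compose up -d")
            print("4. Verify: docker-compose ps")
            print("5. Follow the logs: docker-compose logs -f")
            
            # Resource requirements
            resource_requirements = {
                "CPU": "8+ cores",
                "Memory": "16+ GB",
                "Storage": "100+ GB SSD",
                "Network": "Gigabit NIC",
                "Suitable scale": "< 5 million vectors"
            }
            
            print("\nResource requirements:")
            for key, value in resource_requirements.items():
                print(f"  {key}: {value}")
            
            # Pros and cons of standalone deployment
            pros_cons = {
                "Pros": [
                    "Quick and simple to deploy",
                    "Low maintenance cost",
                    "Well suited to development and testing",
                    "No complex configuration"
                ],
                "Cons": [
                    "No high availability",
                    "No horizontal scaling",
                    "Performance limited to one machine",
                    "Single point of failure"
                ]
            }
            
            print("\nPros and cons:")
            for category, items in pros_cons.items():
                print(f"{category}:")
                for item in items:
                    print(f"  - {item}")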
            ---
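    b.Cluster deployment
        a.Description
            Cluster deployment suits production environments and large-scale applications. Components are deployed in a distributed fashion and scale horizontally, with Kubernetes handling orchestration. The cluster supports high availability and failover and can scale to billions of vectors. It requires a dedicated operations team and fits enterprise applications.
        b.Code example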
            ---
            # Cluster deployment on Kubernetes
            
            print("Kubernetes cluster deployment steps:")
            print("1. Add the Milvus Helm repository")
            print("   helm repo add milvus https://milvus-io.github.io/milvus-helm/")
            print("2. Create a namespace")
            print("   kubectl create namespace milvus")
            print("3. Prepare a values.yaml configuration file")
            print("4. Install Milvus")
            print("   helm install milvus milvus/milvus -n milvus -f values.yaml")
            print("5. Verify the deployment")
            print("   kubectl get pods -n milvus")
            
            # Cluster components
            cluster_components = {
                "Proxy": "Accepts client requests and routes them to the right nodes",
                "Query Node": "Runs vector searches; scales horizontally",
                "Data Node": "Handles data writes and persistence",
                "Index Node": "Builds and manages indexes",
                "Root Coord": "Cluster coordination and metadata management",
                "Query Coord": "Query task scheduling and load balancing",
                "Data Coord": "Data sharding and replica management",
                "Index Coord": "Index build task scheduling"
            }
            
            print("\nCluster components:")
            for component, desc in cluster_components.items():
                print(f"  {component}: {desc}")
            
            # Suggested cluster sizing
            cluster_config = {
                "Query Node": {
                    "Replicas": "2-4",
                    "CPU": "4 cores/node",
                    "Memory": "8GB/node"
                },
                "Data Node": {
                    "Replicas": "2-3",
                    "CPU": "2 cores/node",
                    "Memory": "4GB/node"
                },
                "Index Node": {
                    "Replicas": "1-2",
                    "CPU": "4 cores/node",
                    "Memory": "8GB/node"
                },
                "Proxy": {
                    "Replicas": "2-3",
                    "CPU": "2 cores/node",
                    "Memory": "4GB/node"
                }
            }
            
            print("\nSuggested cluster sizing:")
            for component, config in cluster_config.items():
                print(f"{component}:")
                for key, value in config.items():
                    print(f"  {key}: {value}")
            ---

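02.Operations
    a.Monitoring and alerting
        a.Description
            Build a complete monitoring and alerting system. Track service health and performance metrics, and visualize them with Prometheus and Grafana. Configure alert rules and notification channels. Monitor resource usage, and track query performance and error rates. Automate routine operations where possible, and review and tune the setup regularly.
        b.Code example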
            ---
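            # Monitoring metrics
            
            monitoring_metrics = {
                "Performance": {
                    "QPS": "Queries per second",
                    "Query latency": "P50/P99 latency",
                    "Throughput": "Data write rate",
                    "Index build speed": "Vectors per second"
                },
                "Resources": {
                    "CPU usage": "CPU used by each component",
                    "Memory usage": "Memory used by each component",
                    "Disk usage": "Storage space used",
                    "Network traffic": "Inbound/outbound traffic"
                },
                "Business": {
                    "Vector count": "Total vectors in the Collection",
                    "Query success rate": "Successful queries / total queries",
                    "Error rate": "Failed queries / total queries",
                    "Cache hit rate": "Cache hits / total queries"
                }
            }
            
            print("Monitoring metrics:")
            for category, metrics in monitoring_metrics.items():
                print(f"\n{category}:")
                for metric, desc in metrics.items():
                    print(f"  {metric}: {desc}")
            
            # Alert rules
            alert_rules = [
                {
                    "name": "High query latency",
                    "condition": "P99 latency > 100ms",
                    "severity": "Warning",
                    "duration": "5 minutes"
                },
                {
                    "name": "High error rate",
                    "condition": "Error rate > 5%",
                    "severity": "Critical",
                    "duration": "5 minutes"
                },
                {
                    "name": "High memory usage",
                    "condition": "Memory usage > 90%",
                    "severity": "Warning",
                    "duration": "5 minutes"
                },
                {
                    "name": "Service unavailable",
                    "condition": "Health check failed",
                    "severity": "Critical",
                    "duration": "1 minute"
                }
            ]
            
            print("\nAlert rules:")
            for rule in alert_rules:
                print(f"\n{rule['name']}:")
                print(f"  Condition: {rule['condition']}")
                print(f"  Severity: {rule['severity']}")
                print(f"  Duration: {rule['duration']}")
            
            # Grafana dashboard
            dashboard_panels = [
                "QPS trend",
                "Query latency distribution",
                "CPU usage",
                "Memory usage",
                "Disk I/O",
                "Network traffic",
                "Error rate",
                "Vector count"
            ]
            
            print("\nGrafana dashboard panels:")
            for i, panel in enumerate(dashboard_panels, 1):
                print(f"  {i}. {panel}")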
            ---
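            The alert rules above pair a threshold with a duration. A minimal sketch of how such a rule could be evaluated in application code follows, assuming one aggregated value per minute. In a real setup Prometheus would evaluate the rules; `percentile` and `should_fire` are hypothetical helpers, and the latency figures are made-up sample values.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def should_fire(window_values, threshold, min_consecutive):
    """True when the last min_consecutive values all exceed the threshold."""
    if len(window_values) < min_consecutive:
        return False
    return all(v > threshold for v in window_values[-min_consecutive:])

# P99 of 100 latency samples (1..100 ms).
print(percentile(list(range(1, 101)), 99))

# One P99 value per minute; the rule is "P99 > 100 ms for 5 minutes".
p99_per_minute = [80, 120, 130, 140, 150, 160]
print(should_fire(p99_per_minute, threshold=100, min_consecutive=5))
```

            Requiring several consecutive breaches is what the "duration" column expresses: a single slow minute does not trigger the alert.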
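    b.Capacity planning
        a.Description
            Plan resource capacity carefully to keep the system stable. Assess the data size and its growth trend, then calculate storage, memory, and CPU requirements. Reserve 30%-50% headroom, and account for peak load and traffic bursts. Define a scaling strategy and timeline, monitor resource usage trends, and reassess regularly.
        b.Code example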
            ---
            # Capacity planning calculator (rough estimates, not measured figures)
            
            class CapacityPlanner:
                def __init__(self):
                    self.index_overhead = 1.2  # on-disk index size relative to raw vectors
                    self.redundancy = 1.5      # 50% headroom
                
                def calculate_storage(self, num_vectors, vector_dim, dtype="float32"):
                    """Estimate storage requirements."""
                    bytes_per_element = {
                        "float32": 4,
                        "float16": 2,
                        "int8": 1
                    }
                    
                    vector_size = num_vectors * vector_dim * bytes_per_element[dtype]
                    total_size = vector_size * self.index_overhead
                    required_size = total_size * self.redundancy
                    
                    return {
                        "vector_size_gb": vector_size / (1024**3),
                        "with_index_gb": total_size / (1024**3),
                        "required_gb": required_size / (1024**3)
                    }
                
                def calculate_memory(self, num_vectors, vector_dim, index_type="IVF_FLAT"):
                    """Estimate memory requirements for float32 vectors."""
                    vector_memory = num_vectors * vector_dim * 4
                    
                    # Approximate in-memory size relative to the raw vectors
                    index_overhead = {
                        "FLAT": 1.0,
                        "IVF_FLAT": 1.1,
                        "IVF_SQ8": 0.35,
                        "IVF_PQ": 0.15,
                        "HNSW": 1.5
                    }
                    
                    total_memory = vector_memory * index_overhead.get(index_type, 1.0)
                    required_memory = total_memory * self.redundancy
                    
                    return {
                        "vector_memory_gb": vector_memory / (1024**3),
                        "total_memory_gb": total_memory / (1024**3),
                        "required_gb": required_memory / (1024**3)
                    }
                
                def calculate_qps_capacity(self, num_query_nodes, cpu_per_node, latency_target_ms=50):
                    """Estimate QPS capacity, assuming each core serves one query at a time."""
                    qps_per_core = 1000 / latency_target_ms
                    total_qps = num_query_nodes * cpu_per_node * qps_per_core
                    safe_qps = total_qps * 0.7
                    
                    return {
                        "theoretical_qps": total_qps,
                        "safe_qps": safe_qps
                    }
                
                def generate_plan(self, num_vectors, vector_dim, qps_requirement, index_type="IVF_FLAT"):
                    """Generate a capacity plan."""
                    storage = self.calculate_storage(num_vectors, vector_dim)
                    memory = self.calculate_memory(num_vectors, vector_dim, index_type)
                    
                    # Size the Query Node count with the same per-core model as
                    # calculate_qps_capacity, so the safe QPS covers the requirement
                    cpu_per_node = 4
                    safe_qps_per_node = self.calculate_qps_capacity(1, cpu_per_node)["safe_qps"]
                    num_query_nodes = max(2, int(qps_requirement / safe_qps_per_node) + 1)
                    
                    qps_capacity = self.calculate_qps_capacity(num_query_nodes, cpu_per_node)
                    
                    plan = {
                        "Data scale": {
                            "Vectors": f"{num_vectors:,}",
                            "Dimension": vector_dim,
                            "Index type": index_type
                        },
                        "Storage": {
                            "Raw data": f"{storage['vector_size_gb']:.2f} GB",
                            "With index": f"{storage['with_index_gb']:.2f} GB",
                            "Recommended": f"{storage['required_gb']:.2f} GB"
                        },
                        "Memory": {
                            "Vector data": f"{memory['vector_memory_gb']:.2f} GB",
                            "With index": f"{memory['total_memory_gb']:.2f} GB",
                            "Recommended": f"{memory['required_gb']:.2f} GB"
                        },
                        "Compute": {
                            "Query Nodes": num_query_nodes,
                            "CPU per node": f"{cpu_per_node} cores",
                            "Memory per node": f"{memory['required_gb'] / num_query_nodes:.0f} GB"
                        },
                        "QPS capacity": {
                            "Theoretical QPS": f"{qps_capacity['theoretical_qps']:.0f}",
                            "Safe QPS": f"{qps_capacity['safe_qps']:.0f}",
                            "Required QPS": qps_requirement
                        }
                    }
                    
                    return plan
            
            # Use the capacity planner
            planner = CapacityPlanner()
            
            # Scenario 1: 10 million vectors, 768 dimensions, 1000 QPS
            plan1 = planner.generate_plan(
                num_vectors=10000000,
                vector_dim=768,
                qps_requirement=1000,
                index_type="IVF_FLAT"
            )
            
            print("Capacity plan:")
            import json
            print(json.dumps(plan1, indent=2, ensure_ascii=False))
            
            # Scenario 2: 100 million vectors, 512 dimensions, 5000 QPS
            plan2 = planner.generate_plan(
                num_vectors=100000000,
                vector_dim=512,
                qps_requirement=5000,
                index_type="HNSW"
            )
            
            print("\nLarge-scale scenario:")
            print(json.dumps(plan2, indent=2, ensure_ascii=False))
            
            # Capacity planning tips
            planning_tips = [
                "Reserve 30%-50% headroom",
                "Account for data growth",
                "Assess peak load requirements",
                "Define a scaling strategy",
                "Review and adjust regularly",
                "Monitor resource usage trends",
                "Set up capacity alerts"
            ]
            
            print("\nCapacity planning tips:")
            for i, tip in enumerate(planning_tips, 1):
                print(f"  {i}. {tip}")
            ---
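            The growth-trend and timeline steps can be made concrete with a small projection. This is a simplified sketch that assumes steady compound monthly growth; `raw_vector_gib` and `months_until_full` are hypothetical helpers, and the growth rate and capacity figures are illustrative inputs rather than recommendations.

```python
def raw_vector_gib(num_vectors, vector_dim, bytes_per_element=4):
    """Size of the raw vectors in GiB (float32 by default)."""
    return num_vectors * vector_dim * bytes_per_element / (1024 ** 3)

def months_until_full(current_vectors, monthly_growth_rate, capacity_vectors, headroom=0.3):
    """Months of compound growth until usage exceeds capacity * (1 - headroom)."""
    if monthly_growth_rate <= 0:
        raise ValueError("monthly_growth_rate must be positive")
    limit = capacity_vectors * (1 - headroom)
    months = 0
    vectors = current_vectors
    while vectors <= limit:
        vectors *= 1 + monthly_growth_rate
        months += 1
    return months

# 10 million 768-dimensional float32 vectors.
print(f"{raw_vector_gib(10_000_000, 768):.2f} GiB")

# 10M vectors today, 10% monthly growth, provisioned for 20M with 30% headroom.
print(months_until_full(10_000_000, 0.10, 20_000_000, headroom=0.3))
```

            The returned month count is the point at which the headroom starts being consumed, which is a natural trigger for the next scaling step.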
