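1 Basic Concepts

1.1 Vector Databases

01.Definition of a vector database
    a.Basic concepts
        a.Description
            A vector database is a database system built to store, index, and query high-dimensional vectors. It performs semantic retrieval by computing vector similarity and is widely used in AI workloads such as recommendation systems, image search, and natural language processing. With a suitable index it can serve collections ranging from millions to billions of vectors, and similarity queries are typically answered in milliseconds.
        b.Code example
            ---
            # Core concepts of a vector database
            # Vector: a high-dimensional array such as [0.1, 0.2, 0.3, ..., 0.n]
            # Similarity: how close two vectors are under a distance metric (Euclidean distance, cosine similarity, etc.)
            # Index: a data structure that speeds up vector retrieval (e.g. HNSW, IVF)
            
            import numpy as np
            
            # Example: cosine similarity between two vectors
            vector1 = np.array([0.1, 0.2, 0.3])
            vector2 = np.array([0.2, 0.3, 0.4])
            
            similarity = np.dot(vector1, vector2) / (np.linalg.norm(vector1) * np.linalg.norm(vector2))
            print(f"Cosine similarity: {similarity}")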
            ---
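            The metrics used in this section (L2, inner product, cosine) can be compared side by side. The following is a minimal pure-Python sketch; the function names are illustrative and are not part of any Milvus API.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    """Inner product (IP): larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine similarity: the inner product of the length-normalized vectors."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

v1 = [0.1, 0.2, 0.3]
v2 = [0.2, 0.3, 0.4]

print(f"L2 distance:       {l2_distance(v1, v2):.4f}")
print(f"Inner product:     {inner_product(v1, v2):.4f}")
print(f"Cosine similarity: {cosine_similarity(v1, v2):.4f}")
```

            Cosine similarity ignores vector length, so a vector and any positive multiple of it score 1.0, while L2 and IP are both sensitive to magnitude.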
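    b.Application scenarios
        a.Description
            Vector databases are used across many AI domains. In recommendation systems, vector representations of users and items drive personalized recommendations. In image search, images are encoded as vectors so that visually similar images can be retrieved (search by image). In natural language processing, they underpin semantic search, question answering, and retrieval-augmented generation (RAG). In anomaly detection, points that lie far from their neighbors in vector space are flagged as anomalous.
        b.Code example
            ---
            # Typical application scenarios.
            # Pseudocode: embedding_model, get_user_embedding, image_encoder, and vector_db
            # are placeholders for your own components, not a specific library API.
            
            # 1. Semantic search: convert text to a vector and retrieve by similarity
            query_text = "What is artificial intelligence"
            query_vector = embedding_model.encode(query_text)
            results = vector_db.search(query_vector, top_k=5)
            
            # 2. Recommendation: find similar users from a user vector
            user_vector = get_user_embedding(user_id)
            similar_users = vector_db.search(user_vector, top_k=10)
            
            # 3. Image search: search by image
            image_vector = image_encoder.encode(image)
            similar_images = vector_db.search(image_vector, top_k=20)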
            ---
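            To make the semantic-search flow above concrete without any external service, here is a toy, self-contained sketch. The `embed` function (a bag-of-words count vector) and the `InMemoryVectorStore` class are hypothetical stand-ins for a real embedding model and a real vector database; they only illustrate the encode, store, and search steps.

```python
import math
from collections import Counter

def embed(text, vocabulary):
    """Toy embedding: one word-count dimension per vocabulary word."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat all-zero vectors as dissimilar

class InMemoryVectorStore:
    """Brute-force store: keeps every vector and scans all of them per query."""

    def __init__(self):
        self._items = []

    def add(self, item_id, vector):
        self._items.append((item_id, vector))

    def search(self, query_vector, top_k):
        scored = [(cosine(query_vector, vec), item_id) for item_id, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]

documents = {
    "d1": "vector database stores embeddings",
    "d2": "relational database stores rows",
    "d3": "cats chase mice",
}
vocabulary = sorted({word for text in documents.values() for word in text.split()})

store = InMemoryVectorStore()
for doc_id, text in documents.items():
    store.add(doc_id, embed(text, vocabulary))

hits = store.search(embed("vector embeddings", vocabulary), top_k=1)
print(hits[0][1])  # d1
```

            A production system replaces the full scan in `search` with an ANN index, which is what a vector database provides.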

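02.Vector databases vs. traditional databases
    a.Differences in data types
        a.Description
            Traditional databases mainly store structured data (numbers, strings, dates, and so on), and queries rely on exact matches or range comparisons. Vector databases store high-dimensional vectors (commonly 128 to 1536 dimensions), and queries rank results by similarity. Traditional databases use B-tree and hash indexes; vector databases use approximate nearest neighbor (ANN) indexes such as HNSW and IVF. The query semantics differ accordingly: a traditional query returns exactly the rows that satisfy a predicate, whereas a vector query returns the top-k nearest vectors, usually approximately.
        b.Code example
            ---
            -- Traditional database query (exact match), written in SQL
            SELECT * FROM products WHERE category = 'electronics' AND price < 1000;
            
            # Vector database query (similarity search), written in Python
            from pymilvus import Collection
            
            collection = Collection("products")
            search_vector = [[0.1, 0.2, 0.3, ...]]  # query vector (elided)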
            
            results = collection.search(
                data=search_vector,
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"nprobe": 10}},
                limit=10
            )
            ---
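    b.Performance characteristics
        a.Description
            Traditional databases excel at exact queries and transaction processing and provide ACID guarantees. Vector databases excel at high-dimensional similarity search: approximate nearest neighbor (ANN) algorithms avoid scanning every vector, so query cost grows sublinearly with collection size. Scaling a relational database horizontally is typically harder because of joins and cross-row transactions, whereas vector databases such as Milvus are designed for horizontal scaling from the start. With an in-memory index, query latency on million-scale collections is typically in the millisecond range.
        b.Code example
            ---
            # Performance comparison (cursor, collection, and query_vector are assumed to exist already)
            
            # Traditional database: exact lookup on an indexed key, O(log n) with a B-tree
            import time
            start = time.time()
            cursor.execute("SELECT * FROM users WHERE id = 12345")
            print(f"Traditional database query time: {time.time() - start}s")
            
            # Vector database: approximate query. The cost depends on the index:
            # graph indexes such as HNSW scale roughly logarithmically, while IVF
            # scans only the nprobe clusters closest to the query.
            start = time.time()
            results = collection.search(
                data=[query_vector],
                anns_field="vector",
                param={"metric_type": "IP", "params": {"nprobe": 16}},
                limit=10
            )
            print(f"Vector database query time: {time.time() - start}s")
            
            # On million-scale data a vector database can often stay below 10 ms per query,
            # depending on hardware, index type, and search parameters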
            ---
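            The effect of `nprobe` in an IVF-style index can be sketched in a few lines of pure Python. This is a simplified illustration and not how Milvus implements IVF: the centroids here are randomly sampled vectors, whereas a real index trains them with k-means, and all names are illustrative.

```python
import random

def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Put each vector id into the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2_squared(vec, centroids[c]))
        lists[nearest].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe, top_k):
    """Scan only the nprobe inverted lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2_squared(query, centroids[c]))
    candidates = [vid for c in order[:nprobe] for vid in lists[c]]
    ranked = sorted(candidates, key=lambda vid: l2_squared(query, vectors[vid]))
    return ranked[:top_k], len(candidates)

def brute_force(query, vectors, top_k):
    """Exact search: rank every vector."""
    return sorted(range(len(vectors)), key=lambda vid: l2_squared(query, vectors[vid]))[:top_k]

rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(2000)]
centroids = rng.sample(vectors, 16)  # assumption: sampled instead of k-means trained
lists = build_ivf(vectors, centroids)
query = [rng.random() for _ in range(8)]

exact = brute_force(query, vectors, top_k=10)
approx, scanned = ivf_search(query, vectors, centroids, lists, nprobe=4, top_k=10)
recall = len(set(exact) & set(approx)) / len(exact)
print(f"scanned {scanned} of {len(vectors)} vectors, recall@10 = {recall:.2f}")
```

            Raising `nprobe` scans more lists, which moves recall toward 1.0 at the price of more distance computations; setting it to the number of centroids degenerates into exact search.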

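1.2 Milvus Architecture

01.System architecture
    a.Cloud-native design
        a.Description
            Milvus uses a cloud-native architecture that separates storage from compute and scales elastically. The system has four layers: the access layer (request entry and routing), the coordinator layer (metadata management and task scheduling), the worker layer (data processing and query execution), and the storage layer (metadata store, message queue, and object storage). Because the layers are decoupled, each component can scale independently, which improves availability and maintainability.
        b.Code example
            ---
            # Milvus architecture components
            
            # 1. Access Layer
            # - Proxy: receives client requests and routes them to backend services
            # - Exposes gRPC and RESTful APIs
            
            # 2. Coordinator Service
            # - Root Coordinator: handles DDL operations (create/drop collection)
            # - Data Coordinator: manages segments and binlogs
            # - Query Coordinator: manages Query Nodes and load balancing
            # - Index Coordinator: manages index build tasks
            
            # 3. Worker Nodes
            # - Query Node: executes vector search
            # - Data Node: persists data
            # - Index Node: builds vector indexes
            
            # 4. Storage
            # - Object storage (MinIO/S3): stores vector data and index files
            # - Metadata store (etcd): stores collection schemas and other metadata
            # - Message queue (Pulsar/Kafka): data streaming and log replication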
            ---
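    b.Distributed features
        a.Description
            Milvus supports distributed deployment and achieves high availability through sharding and replicas. Data is organized into segments; a growing segment is sealed once it reaches a configured size threshold. At query time, multiple Query Nodes process different segments in parallel and the partial results are merged. The system supports dynamic scaling: newly added nodes automatically take over part of the load. Durability comes from the persistent storage layer, while in-memory replicas loaded on several Query Nodes improve query availability and throughput. Deployments can span availability zones.
        b.Code example
            ---
            from pymilvus import connections, Collection, utility
            from pymilvus import CollectionSchema, FieldSchema, DataType
            
            # Connect to the Milvus cluster
            connections.connect(
                alias="default",
                host="milvus-cluster.example.com",
                port="19530"
            )
            
            # Check the server version
            print(f"Milvus version: {utility.get_server_version()}")
            
            # Specify the number of shards when creating the collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields)
            
            collection = Collection(
                name="distributed_collection",
                schema=schema,
                shards_num=4  # 4 shards for more write parallelism
            )
            
            # Replicas are requested when the collection is loaded, not through set_properties.
            # Loading needs an index and at least as many Query Nodes as replicas.
            collection.create_index(
                "embedding",
                {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}}
            )
            collection.load(replica_number=2)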
            ---
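            The "search segments in parallel, then merge" step described above can be sketched with the standard library. This is an illustration of the scatter-gather idea only, not Milvus code: segments are plain lists of `(id, vector)` pairs and threads stand in for Query Nodes.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor

def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_segment(segment, query, top_k):
    """Return the top_k (distance, id) pairs of one segment, nearest first."""
    scored = ((l2_squared(vec, query), vid) for vid, vec in segment)
    return heapq.nsmallest(top_k, scored)

def search_all_segments(segments, query, top_k):
    """Search every segment in parallel, then merge the sorted partial results."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(lambda seg: search_segment(seg, query, top_k), segments))
    merged = heapq.merge(*partials)  # k-way merge of lists that are already sorted
    return [vid for _, vid in itertools.islice(merged, top_k)]

segments = [
    [(0, [0.0, 0.0]), (1, [1.0, 1.0])],
    [(2, [0.1, 0.1]), (3, [5.0, 5.0])],
    [(4, [0.2, 0.0])],
]
print(search_all_segments(segments, [0.0, 0.0], top_k=3))  # [0, 2, 4]
```

            Each segment only has to return its own top-k, so the merge step handles at most `top_k` results per segment regardless of segment size.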

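02.Core components
    a.Proxy
        a.Description
            The Proxy is the access layer of Milvus: it receives client requests and routes them to backend services. It exposes a unified API over gRPC and RESTful protocols, validates requests, checks parameters, and aggregates results. In cluster mode, a load balancer distributes requests across multiple Proxy instances for high availability. The Proxy is stateless and can be scaled horizontally.
        b.Code example
            ---
            # Proxy configuration example (milvus.yaml)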
            
            proxy:
              port: 19530
              grpc:
                serverMaxRecvSize: 536870912  # 512MB
                serverMaxSendSize: 536870912
                clientMaxRecvSize: 104857600  # 100MB
                clientMaxSendSize: 104857600
              http:
                enabled: true
                port: 9091
              timeTickInterval: 200  # ms
              msgStream:
                timeTick:
                  bufSize: 512
              maxTaskNum: 1024  # maximum number of concurrent tasks
            
            # Connecting through the Proxy from a client (Python)
            from pymilvus import connections
            
            connections.connect(
                alias="default",
                host="proxy.milvus.svc.cluster.local",
                port="19530",
                user="username",
                password="password"
            )
            ---
    b.Coordinators
        a.Description
            Coordinators handle metadata management and task scheduling. The Root Coordinator manages the creation and deletion of collections and partitions and maintains the global timestamp. The Data Coordinator manages segment allocation and compaction and coordinates data persistence. The Query Coordinator balances load across Query Nodes and assigns segments to them. The Index Coordinator schedules index build tasks and tracks index state. The Coordinators rely on etcd for metadata and high availability.
        b.Code example
            ---
            # Coordinator workflow example
            import numpy as np
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            
            # 1. Root Coordinator: create a collection
            schema = CollectionSchema([
                FieldSchema("id", DataType.INT64, is_primary=True),
                FieldSchema("vector", DataType.FLOAT_VECTOR, dim=128)
            ])
            
            # The Root Coordinator handles the DDL request
            collection = Collection("example", schema=schema)
            
            # 2. Data Coordinator: insert data
            data = [
                [i for i in range(1000)],
                [[np.random.random() for _ in range(128)] for _ in range(1000)]
            ]
            collection.insert(data)  # the Data Coordinator allocates segments
            
            # 3. Index Coordinator: build an index
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 128}
            }
            collection.create_index("vector", index_params)  # the Index Coordinator schedules the build
            
            # 4. Query Coordinator: run a search
            collection.load()  # the Query Coordinator assigns segments to Query Nodes
            search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
            results = collection.search([[0.1]*128], "vector", search_params, limit=10)
            ---
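    c.Worker nodes
        a.Description
            Worker nodes carry out the actual data processing. Query Nodes load segments and indexes and execute vector searches, querying multiple segments in parallel. Data Nodes handle persistence by writing binlogs to object storage. Index Nodes build vector indexes of various types. Because persistent data lives in object storage, worker nodes do not own durable state; the Coordinators assign tasks to them and balance load. When a node fails, the Coordinator reassigns its tasks to other nodes.
        b.Code example
            ---
            # Worker node configuration example (milvus.yaml)
            
            # Query Node configuration
            queryNode:
              cacheSize: 32  # cache size in GB
              gracefulStopTimeout: 30  # graceful shutdown timeout
              stats:
                publishInterval: 1000  # statistics publish interval (ms)
              dataSync:
                flowGraph:
                  maxQueueLength: 1024
                  maxParallelism: 1024
              segcore:
                chunkRows: 1024  # rows per segment chunk
            
            # Data Node configuration
            dataNode:
              dataSync:
                flowGraph:
                  maxQueueLength: 1024
              flush:
                insertBufSize: 16777216  # 16MB
            
            # Index Node configuration
            indexNode:
              scheduler:
                buildParallel: 1  # number of indexes built in parallel
            
            # Monitoring worker nodes (Python)
            from pymilvus import utility
            
            # List the segments loaded on Query Nodes
            segments = utility.get_query_segment_info("collection_name")
            for seg in segments:
                print(f"Node ID: {seg.nodeID}, Segment: {seg.segmentID}, State: {seg.state}")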
            ---

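1.3 Core Features

01.High-performance search
    a.Millisecond-level response
        a.Description
            Milvus achieves millisecond-level query latency through optimized index algorithms and memory management. On million-scale collections, an HNSW index can typically answer a query within a few milliseconds, depending on hardware and search parameters. GPU acceleration is supported for further speedups. Indexes are loaded into memory ahead of time to avoid disk I/O on the query path, and batch queries raise throughput.
        b.Code example
            ---
            import time
            from pymilvus import Collection
            
            collection = Collection("benchmark")
            collection.load()  # preload the index into memory
            
            # Latency of a single query
            query_vector = [[0.1] * 128]
            
            start = time.time()
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"ef": 64}},
                limit=10
            )
            latency = (time.time() - start) * 1000
            print(f"Query latency: {latency:.2f}ms")
            
            # Batch queries for higher throughput
            batch_vectors = [[0.1] * 128 for _ in range(100)]
            
            start = time.time()
            results = collection.search(
                data=batch_vectors,
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"ef": 64}},
                limit=10
            )
            total_time = time.time() - start
            qps = len(batch_vectors) / total_time
            print(f"Batch query QPS: {qps:.2f}")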
            ---
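            A single timed call is noisy, so latency claims are usually stated as percentiles over many runs. The helper below is a minimal sketch using only the standard library; `measure_latency_ms` and `summarize` are illustrative names, and the timed callable would wrap something like the `collection.search(...)` call above.

```python
import statistics
import time

def measure_latency_ms(fn, runs=100, warmup=5):
    """Call fn repeatedly and return per-call latencies in milliseconds."""
    for _ in range(warmup):  # warm caches and connections before timing
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return samples

def summarize(samples):
    """Report median, tail percentiles, and the maximum."""
    cuts = statistics.quantiles(samples, n=100)  # 99 cut points: cuts[i] is percentile i + 1
    return {
        "p50": statistics.median(samples),
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(samples),
    }

# Stand-in workload; replace the lambda with a real search call
samples = measure_latency_ms(lambda: sum(range(10000)), runs=50)
stats = summarize(samples)
print({name: round(value, 3) for name, value in stats.items()})
```

            Tail percentiles (p95, p99) matter more than the average for user-facing search, because a small fraction of slow queries dominates perceived latency.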
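    b.Support for massive datasets
        a.Description
            Milvus can store and search collections with billions of vectors. In the distributed architecture, data is spread across multiple nodes. Data is split into segments of bounded size, which keeps management and querying tractable. New data can be indexed incrementally as segments are sealed. Compression and quantization reduce storage and memory cost. Hot data is served from memory, while the full dataset is persisted in object storage.
        b.Code example
            ---
            import numpy as np
            from pymilvus import Collection
            
            # Inspect collection statistics
            collection = Collection("large_scale")
            print(f"Total vectors: {collection.num_entities:,}")
            
            # Bulk insert (assumes the schema is an INT64 primary key plus a 128-dim "embedding" field)
            batch_size = 10000
            total_vectors = 10000000  # 10 million vectors
            
            for i in range(0, total_vectors, batch_size):
                data = [
                    list(range(i, i + batch_size)),
                    np.random.random((batch_size, 128)).tolist()
                ]
                collection.insert(data)
            
                if (i + batch_size) % 100000 == 0:
                    print(f"Inserted {i + batch_size:,} rows")
            
            # Flush once at the end; flushing every few batches creates many small segments
            collection.flush()
            
            # Create an index suited to large-scale retrieval
            index_params = {
                "index_type": "IVF_PQ",  # PQ quantization reduces memory usage
                "metric_type": "L2",
                "params": {
                    "nlist": 2048,  # more cluster centroids for a larger dataset
                    "m": 8,  # number of PQ sub-vectors (must divide the dimension)
                    "nbits": 8
                }
            }
            collection.create_index("embedding", index_params)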
            ---
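            The memory saving from product quantization follows from simple arithmetic. The sketch below estimates it for the parameters used above (128 dimensions, `m=8`, `nbits=8`); it counts only vector payload and PQ codebooks and deliberately ignores IDs, IVF centroids, and any engine overhead, so treat the numbers as a rough lower bound rather than what Milvus will actually report.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_float=4):
    """Size of uncompressed float32 vectors."""
    return num_vectors * dim * bytes_per_float

def pq_code_bytes(num_vectors, m, nbits):
    """Each vector is stored as m codes of nbits bits."""
    return num_vectors * m * nbits // 8

def pq_codebook_bytes(dim, m, nbits, bytes_per_float=4):
    """m sub-codebooks, each holding 2**nbits centroids of dim/m floats."""
    return m * (2 ** nbits) * (dim // m) * bytes_per_float

num_vectors, dim, m, nbits = 10_000_000, 128, 8, 8

raw = raw_vector_bytes(num_vectors, dim)
codes = pq_code_bytes(num_vectors, m, nbits)
codebooks = pq_codebook_bytes(dim, m, nbits)

print(f"raw float32 vectors: {raw / 1e9:.2f} GB")
print(f"PQ codes:            {codes / 1e6:.0f} MB")
print(f"PQ codebooks:        {codebooks / 1e3:.0f} KB")
print(f"compression ratio:   {raw / (codes + codebooks):.1f}x")
```

            The price of this compression is lower recall, since distances are computed against quantized approximations of the vectors.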

02.Flexible scaling
    a.Horizontal scaling
        a.Description
            Milvus supports horizontal scaling: Query Nodes, Data Nodes, and Index Nodes can be added dynamically. New nodes join the cluster automatically and the Coordinators rebalance the load. Adding Query Nodes raises query throughput, adding Data Nodes raises write throughput, and adding Index Nodes speeds up index builds. Scaling does not interrupt online service, and rolling upgrades are supported.
        b.Code example
            ---
            # Horizontal scaling on Kubernetes
            # (deployment names depend on your release name; the ones below are examples)
            
            # 1. Scale out Query Nodes (more query capacity)
            # kubectl scale deployment milvus-querynode --replicas=5
            
            # 2. Scale out Data Nodes (more write capacity)
            # kubectl scale deployment milvus-datanode --replicas=3
            
            # 3. Scale out Index Nodes (faster index builds)
            # kubectl scale deployment milvus-indexnode --replicas=2
            
            # Observe the effect from the application side
            import concurrent.futures
            
            import requests
            from pymilvus import connections, Collection
            
            connections.connect("default", host="milvus-proxy", port="19530")
            
            # Health check (the endpoint returns plain text, not JSON)
            response = requests.get("http://milvus-proxy:9091/healthz")
            print(f"Cluster health: {response.text}")
            
            # Test performance after scaling
            collection = Collection("test")
            collection.load(replica_number=2)  # 2 replicas for more query concurrency
            
            # Concurrent query test
            def search_task(query_id):
                results = collection.search(
                    data=[[0.1] * 128],
                    anns_field="embedding",
                    param={"metric_type": "L2", "params": {"nprobe": 16}},
                    limit=10
                )
                return query_id
            
            with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
                futures = [executor.submit(search_task, i) for i in range(1000)]
                results = [f.result() for f in futures]
            print(f"Concurrent queries completed: {len(results)} requests")
            ---
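    b.Separation of storage and compute
        a.Description
            Milvus separates storage from compute: vector data and indexes are kept in object storage (MinIO or S3), and compute nodes are stateless. Storage and compute can therefore scale independently, which lowers cost. Compute nodes can be started and destroyed on demand, enabling elastic scaling. The storage layer supports multiple replicas and cross-region replication for reliability. Metadata is stored in etcd, which supports high availability.
        b.Code example
            ---
            # Storage/compute separation configuration example (milvus.yaml)
            
            # Object storage configuration (MinIO)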
            minio:
              address: minio.example.com
              port: 9000
              accessKeyID: minioadmin
              secretAccessKey: minioadmin
              useSSL: false
              bucketName: milvus-bucket
              rootPath: file  # root path for data
              useIAM: false
              iamEndpoint: ""
            
            # Or use AWS S3
            # minio:
            #   address: s3.amazonaws.com
            #   port: 443
            #   accessKeyID: YOUR_ACCESS_KEY
            #   secretAccessKey: YOUR_SECRET_KEY
            #   useSSL: true
            #   bucketName: milvus-data
            #   rootPath: milvus
            #   useIAM: true
            #   iamEndpoint: ""
            #   region: us-west-2
            
            # Metadata store configuration (etcd)
            etcd:
              endpoints:
                - etcd-0.etcd:2379
                - etcd-1.etcd:2379
                - etcd-2.etcd:2379
              rootPath: by-dev  # root path for metadata
              metaSubPath: meta
              kvSubPath: kv
            
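            # Message queue configuration (Pulsar); milvus.yaml takes host and port separately
            pulsar:
              address: pulsar-proxy
              port: 6650
              maxMessageSize: 5242880  # 5MB
            
            # Advantages of this architecture
            # 1. Compute nodes are stateless and can be scaled in or out quickly
            # 2. The storage layer scales independently, up to PB-scale data
            # 3. Data is persisted in object storage, which is inexpensive
            # 4. Several clusters can share one storage backend when each uses its own rootPath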
            ---

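03.Multi-language support
    a.SDK ecosystem
        a.Description
            Milvus provides SDKs for several languages, including Python, Java, Go, Node.js, and C++. All of them are built on the same gRPC interface, so core functionality is largely consistent across languages. The Python SDK (PyMilvus) is the most mature and comes with a complete API and many examples. The Java SDK suits enterprise applications. The Go SDK is lightweight and fits microservice architectures. The Node.js SDK targets server-side JavaScript and TypeScript applications. Support for connection pooling, retries, and load balancing varies by SDK, so check the documentation of the one you use.
        b.Code example
            ---
            # Python SDK
            from pymilvus import connections, Collection
            
            connections.connect("default", host="localhost", port="19530")
            collection = Collection("example")
            search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
            results = collection.search([[0.1]*128], "vector", search_params, limit=10)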
            
            # Java SDK
            // import io.milvus.client.*;
            // 
            // MilvusServiceClient client = new MilvusServiceClient(
            //     ConnectParam.newBuilder()
            //         .withHost("localhost")
            //         .withPort(19530)
            //         .build()
            // );
            // 
            // SearchParam searchParam = SearchParam.newBuilder()
            //     .withCollectionName("example")
            //     .withVectorFieldName("vector")
            //     .withVectors(Arrays.asList(Arrays.asList(0.1f, 0.2f, ...)))
            //     .withTopK(10)
            //     .build();
            // R<SearchResults> response = client.search(searchParam);
            
            # Go SDK
            // import "github.com/milvus-io/milvus-sdk-go/v2/client"
            // 
            // c, _ := client.NewGrpcClient(context.Background(), "localhost:19530")
            // searchResult, _ := c.Search(
            //     context.Background(),
            //     "example",
            //     []string{},
            //     "",
            //     []string{"id"},
            //     []entity.Vector{entity.FloatVector{0.1, 0.2, ...}},
            //     "vector",
            //     entity.L2,
            //     10,
            //     sp,
            // )
            
            # Node.js SDK
            // const { MilvusClient } = require("@zilliz/milvus2-sdk-node");
            // 
            // const client = new MilvusClient("localhost:19530");
            // const results = await client.search({
            //     collection_name: "example",
            //     vectors: [[0.1, 0.2, ...]],
            //     search_params: { nprobe: 10 },
            //     limit: 10
            // });
            ---
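    b.RESTful API
        a.Description
            Milvus provides a RESTful API for cross-language access and quick integration. The API runs over HTTP with JSON requests and responses and covers the core operations: collection management, data manipulation, and search. It suits lightweight clients and web applications and supports authentication and access control. API reference documentation is available for testing and debugging.
        b.Code example
            ---
            import requests
            
            # NOTE: the paths and payloads below follow the early HTTP API exposed on port 9091.
            # Later Milvus releases serve the RESTful API on port 19530 under different paths,
            # so check the API reference for your version before using this code.
            base_url = "http://localhost:9091/api/v1"
            
            # 1. Create a collection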
            create_payload = {
                "collection_name": "rest_example",
                "schema": {
                    "fields": [
                        {"name": "id", "dtype": "Int64", "is_primary": True},
                        {"name": "vector", "dtype": "FloatVector", "params": {"dim": 128}}
                    ]
                }
            }
            response = requests.post(f"{base_url}/collection", json=create_payload)
            print(f"Create collection: {response.json()}")
            
            # 2. Insert data
            insert_payload = {
                "collection_name": "rest_example",
                "fields_data": [
                    {"field_name": "id", "type": "Int64", "field": [1, 2, 3]},
                    {"field_name": "vector", "type": "FloatVector", "field": [[0.1]*128, [0.2]*128, [0.3]*128]}
                ]
            }
            response = requests.post(f"{base_url}/entities", json=insert_payload)
            print(f"Insert data: {response.json()}")
            
            # 3. Search
            search_payload = {
                "collection_name": "rest_example",
                "vectors": [[0.15] * 128],
                "dsl_type": "Dsl",
                "params": {"nprobe": 10},
                "limit": 5
            }
            response = requests.post(f"{base_url}/search", json=search_payload)
            print(f"Search results: {response.json()}")
            
            # 4. Get collection information
            response = requests.get(f"{base_url}/collection/info?collection_name=rest_example")
            print(f"Collection info: {response.json()}")
            ---

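2 Quick Start

2.1 Installation and Deployment

01.Docker deployment
    a.Standalone installation
        a.Description
            Docker Compose offers a quick way to deploy Milvus standalone, which suits development and testing. In standalone mode all Milvus components run inside a single container, alongside separate etcd and MinIO containers, so the footprint is small and deployment is simple. Data is persisted through mounted volumes and survives restarts. Port 19530 is the default for gRPC connections, and port 9091 serves HTTP endpoints such as health checks. Standalone performance is bounded by a single server, and high availability is not supported.
        b.Code example
            ---
            # 1. Download docker-compose.yml
            wget https://github.com/milvus-io/milvus/releases/download/v2.3.0/milvus-standalone-docker-compose.yml -O docker-compose.yml
            
            # 2. Start Milvus
            docker-compose up -d
            
            # 3. Check container status
            docker-compose ps
            
            # Example output: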
            # NAME                COMMAND                  SERVICE             STATUS              PORTS
            # milvus-standalone   "/tini -- milvus run…"   standalone          running             0.0.0.0:9091->9091/tcp, 0.0.0.0:19530->19530/tcp
            # milvus-minio        "/usr/bin/docker-ent…"   minio               running             9000/tcp
            # milvus-etcd         "etcd -advertise-cli…"   etcd                running             2379-2380/tcp
            
            # 4. View logs
            docker-compose logs -f standalone
            
            # 5. Stop the services
            docker-compose down
            
            # 6. Data persistence configuration (docker-compose.yml)
            # volumes:
            #   - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
            ---
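    b.Cluster installation
        a.Description
            The cluster version runs each component separately, including the Proxy, Coordinators, and worker nodes, so components can be scaled and upgraded on their own. It requires object storage (MinIO/S3) and a message queue (Pulsar/Kafka). A Docker Compose cluster on a single host is mainly useful for trying out the distributed setup, because one machine cannot provide real high availability; for production, a deployment across multiple servers (at least 3 is recommended), usually on Kubernetes, is the better fit. Resource requirements are considerably higher than for standalone mode.
        b.Code example
            ---
            # 1. Download the cluster configuration
            wget https://github.com/milvus-io/milvus/releases/download/v2.3.0/milvus-cluster-docker-compose.yml -O docker-compose.yml
            
            # 2. Edit the configuration (optional)
            # Edit docker-compose.yml to adjust resource limits and replica counts
            
            # 3. Start the cluster
            docker-compose up -d
            
            # 4. Check the status of all components
            docker-compose ps
            
            # Example output: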
            # NAME                    SERVICE             STATUS
            # milvus-rootcoord        rootcoord           running
            # milvus-datacoord        datacoord           running
            # milvus-querycoord       querycoord          running
            # milvus-indexcoord       indexcoord          running
            # milvus-proxy            proxy               running
            # milvus-querynode        querynode           running
            # milvus-datanode         datanode            running
            # milvus-indexnode        indexnode           running
            # milvus-minio            minio               running
            # milvus-etcd             etcd                running
            # milvus-pulsar           pulsar              running
            
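            # 5. Scale out Query Nodes (more query capacity)
            # (if the compose file pins container_name for querynode, remove it first, otherwise --scale fails)
            docker-compose up -d --scale querynode=3
            
            # 6. Health check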
            curl http://localhost:9091/healthz
            ---

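02.Kubernetes deployment
    a.Helm installation
        a.Description
            The Helm chart offers a quick way to deploy Milvus on a Kubernetes cluster. Helm provides parameterized configuration for resource limits, replica counts, storage types, and more. Rolling updates and rollbacks help keep the service stable. The deployment integrates with Kubernetes ecosystem tools such as Prometheus for monitoring and Grafana for dashboards. It suits large-scale production environments and supports autoscaling.
        b.Code example
            ---
            # 1. Add the Milvus Helm repository
            # (this repository has been archived; newer charts are published at https://zilliztech.github.io/milvus-helm/)
            helm repo add milvus https://milvus-io.github.io/milvus-helm/
            helm repo update
            
            # 2. Create a namespace
            kubectl create namespace milvus
            
            # 3. Install Milvus with the default configuration
            #    (skip this step if you plan to use the custom installation in step 5)
            helm install milvus milvus/milvus --namespace milvus
            
            # 4. Custom installation (create values.yaml)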
            cat > values.yaml <<EOF
            cluster:
              enabled: true
            
            image:
              all:
                repository: milvusdb/milvus
                tag: v2.3.0
            
            proxy:
              replicas: 2
            
            queryNode:
              replicas: 3
              resources:
                limits:
                  cpu: 4
                  memory: 8Gi
            
            dataNode:
              replicas: 2
            
            indexNode:
              replicas: 1
            
            minio:
              enabled: true
              mode: standalone
            
            pulsar:
              enabled: true
            
            etcd:
              replicaCount: 3
            EOF
            
            # 5. Install with the custom configuration
            #    (instead of step 3; a release name can only be installed once)
            helm install milvus milvus/milvus -f values.yaml --namespace milvus
            
            # 6. Check deployment status
            kubectl get pods -n milvus
            
            # 7. Expose the service (using a LoadBalancer)
            kubectl expose deployment milvus-proxy --type=LoadBalancer --name=milvus-service --port=19530 -n milvus
            
            # 8. Get the external IP
            kubectl get svc milvus-service -n milvus
            
            # 9. Upgrade Milvus
            helm upgrade milvus milvus/milvus -f values.yaml --namespace milvus
            
            # 10. Uninstall
            helm uninstall milvus --namespace milvus
            ---
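    b.Operator deployment
        a.Description
            Milvus Operator is the Kubernetes-native way to deploy Milvus: a cluster is defined through a CRD. The Operator manages the full cluster lifecycle, including deployment, upgrades, scaling, and failure recovery. Configuration is declarative: you define the desired state and the Operator reconciles toward it. It offers finer-grained control, with each component configurable on its own. It suits scenarios that need deep customization and automated operations.
        b.Code example
            ---
            # 1. Install Milvus Operator
            kubectl apply -f https://raw.githubusercontent.com/milvus-io/milvus-operator/main/deploy/manifests/deployment.yaml
            
            # 2. Verify the Operator installation
            kubectl get pods -n milvus-operator
            
            # 3. Define a Milvus cluster (milvus-cluster.yaml)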
            cat > milvus-cluster.yaml <<EOF
            apiVersion: milvus.io/v1beta1
            kind: Milvus
            metadata:
              name: my-milvus
              namespace: default
            spec:
              mode: cluster
              dependencies:
                etcd:
                  inCluster:
                    deletionPolicy: Delete
                    pvcDeletion: true
                storage:
                  inCluster:
                    deletionPolicy: Delete
                    pvcDeletion: true
                pulsar:
                  inCluster:
                    deletionPolicy: Delete
                    pvcDeletion: true
              components:
                proxy:
                  replicas: 2
                  resources:
                    limits:
                      cpu: 2
                      memory: 4Gi
                queryNode:
                  replicas: 3
                  resources:
                    limits:
                      cpu: 4
                      memory: 8Gi
                dataNode:
                  replicas: 2
                indexNode:
                  replicas: 1
              config:
                minio:
                  bucketName: milvus-bucket
            EOF
            
            # 4. Deploy the cluster
            kubectl apply -f milvus-cluster.yaml
            
            # 5. Check cluster status
            kubectl get milvus my-milvus -o yaml
            
            # 6. Scale out Query Nodes
            kubectl patch milvus my-milvus --type='json' -p='[{"op": "replace", "path": "/spec/components/queryNode/replicas", "value": 5}]'
            
            # 7. List all resources
            kubectl get all -l app.kubernetes.io/instance=my-milvus
            
            # 8. Delete the cluster
            kubectl delete milvus my-milvus
            ---

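03.Local development
    a.Python environment
        a.Description
            Milvus Lite starts Milvus directly in a local Python environment, with no Docker or Kubernetes required. It is a lightweight version intended for development, testing, and prototyping. It supports most core features and is API-compatible with the full version. Data is stored on the local file system, which makes debugging easier. The footprint is small enough to run on a laptop.
        b.Code example
            ---
            # 1. Install Milvus Lite
            # (this example uses the `milvus` package with its default_server interface;
            #  newer Milvus Lite releases are bundled with pymilvus and use a different entry point)
            pip install milvus
            
            # 2. Start Milvus Lite
            from milvus import default_server
            
            # Start the local server
            default_server.start()
            
            # 3. Connect and use it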
            from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType
            
            connections.connect(
                alias="default",
                host='127.0.0.1',
                port=default_server.listen_port
            )
            
            # 4. Create a collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields)
            collection = Collection(name="dev_test", schema=schema)
            
            # 5. Insert data (the primary key is generated automatically because auto_id=True)
            import numpy as np
            data = [
                [[np.random.random() for _ in range(128)] for _ in range(100)]
            ]
            collection.insert(data)
            
            # 6. Create an index
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 128}
            }
            collection.create_index("embedding", index_params)
            
            # 7. Search
            collection.load()
            results = collection.search(
                data=[[np.random.random() for _ in range(128)]],
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"nprobe": 10}},
                limit=5
            )
            
            # 8. Stop the server
            default_server.stop()
            
            # 9. Clean up data
            default_server.cleanup()
            ---
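    b.Development tools
        a.Description
            Several tools make development with Milvus more efficient. Attu is the official GUI, offering visual collection management, data browsing, and querying. Milvus CLI is a command-line tool that supports interactive use and scripted automation. Birdwatcher is a debugging tool for inspecting internal state and metadata. Together they help developers understand and debug Milvus quickly.
        b.Code example
            ---
            # 1. Install Attu (web GUI)
            # Note: inside the Attu container, localhost refers to the container itself.
            # If Milvus runs on the Docker host, replace localhost with the host's IP address.
            docker run -p 8000:3000 -e MILVUS_URL=localhost:19530 zilliz/attu:latest
            
            # Open http://localhost:8000
            # Features:
            # - Visual collection management
            # - Data browsing and editing
            # - Vector search testing
            # - Index management
            # - System monitoring
            
            # 2. Install Milvus CLI
            pip install milvus-cli
            
            # Start the CLI
            milvus_cli
            
            # Example CLI commands: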
            # connect -h localhost -p 19530
            # list collections
            # describe collection -c my_collection
            # show index -c my_collection
            # query -c my_collection -f "id > 100" -o id,vector
            # search -c my_collection -v "[0.1, 0.2, ...]" -l 10
            
            # 3. Use Birdwatcher (debugging tool)
            # docker run -it --rm --network host milvusdb/birdwatcher:latest
            
            # Birdwatcher commands:
            # connect --etcd localhost:2379
            # show collections
            # show segments
            # show segment-info --segment-id 12345
            # show channel-watch
            
            # 4. Debugging tips in Python
            import logging
            
            from pymilvus import connections, utility, Collection
            
            connections.connect("default", host="localhost", port="19530")
            
            # List all collections
            collections = utility.list_collections()
            print(f"Collections: {collections}")
            
            # Inspect a collection
            collection = Collection("my_collection")
            print(f"Schema: {collection.schema}")
            print(f"Entities: {collection.num_entities}")
            print(f"Indexes: {collection.indexes}")
            
            # Inspect loaded segments (the collection must be loaded)
            segments = utility.get_query_segment_info("my_collection")
            for seg in segments:
                print(f"Segment {seg.segmentID}: {seg.num_rows} rows, state={seg.state}")
            
            # Enable debug logging
            logging.basicConfig(level=logging.DEBUG)
            logger = logging.getLogger("pymilvus")
            logger.setLevel(logging.DEBUG)
            ---

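2.2 Connecting to the Database

01.Connection Configuration
    a.Basic connection
        a.Description
            To connect to a Milvus server with the PyMilvus SDK, specify the host address and port. The default port is 19530 (gRPC). Each connect() call registers a connection under an alias, and later operations use the connection named by that alias ("default" unless using= is given). Multiple aliases let one process talk to several Milvus instances at the same time. A connection can be shared across threads.
        b.Code example
            ---
            from pymilvus import connections
            
            # Basic connection
            connections.connect(
                alias="default",  # connection alias
                host="localhost",
                port="19530"
            )
            
            # Verify the connection
            from pymilvus import utility
            print(f"Server version: {utility.get_server_version()}")
            
            # Multiple connections
            connections.connect(
                alias="cluster1",
                host="milvus-cluster1.example.com",
                port="19530"
            )
            
            connections.connect(
                alias="cluster2",
                host="milvus-cluster2.example.com",
                port="19530"
            )
            
            # Use a specific connection
            from pymilvus import Collection
            collection1 = Collection("test", using="cluster1")
            collection2 = Collection("test", using="cluster2")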
            ---
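    b.Authenticated connection
        a.Description
            Milvus supports username/password authentication to protect data. Once authentication is enabled on the server, every connection must present valid credentials. Multiple users can be created and granted different privileges. Credentials are verified when the connection is established and are attached to subsequent requests automatically. Enabling authentication is recommended in production.
        b.Code example
            ---
            from pymilvus import connections
            
            # Connect with a username and password
            connections.connect(
                alias="default",
                host="localhost",
                port="19530",
                user="username",
                password="password"
            )
            
            # Create a new user (requires root privileges)
            from pymilvus import utility
            
            utility.create_user(
                user="new_user",
                password="secure_password",
                using="default"
            )
            
            # Change the password
            utility.reset_password(
                user="new_user",
                old_password="secure_password",
                new_password="new_secure_password",
                using="default"
            )
            
            # List all users
            users = utility.list_usernames(using="default")
            print(f"Users: {users}")
            
            # Delete the user
            utility.delete_user(user="new_user", using="default")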
            ---

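02.Connection Pool Management
    a.Connection reuse and concurrency
        a.Description
            In PyMilvus 2.x each connection alias is backed by a gRPC channel, and gRPC multiplexes concurrent requests over that channel, so many threads can share one alias without a separate client-side pool. A pool_size argument existed in the 1.x client but is not a documented parameter of connections.connect in 2.x, so the example below omits it. Connection timeout and TLS options are set per connection. The gRPC channel reconnects on its own after transient network failures. For highly concurrent workloads, reuse a small number of long-lived connections instead of connecting per request.
        b.Code example
            ---
            from pymilvus import connections
            
            # Connection parameters
            connections.connect(
                alias="default",
                host="localhost",
                port="19530",
                timeout=30,  # connection timeout in seconds
                secure=False  # set to True to use TLS
            )
            
            # List registered connections
            print(connections.list_connections())
            
            # Get the address of a connection
            conn_info = connections.get_connection_addr("default")
            print(f"Connection info: {conn_info}")
            
            # Concurrency test over one shared alias
            import concurrent.futures
            from pymilvus import Collection
            
            def query_task(task_id):
                # Assumes collection "test" exists, is indexed, and is loaded
                collection = Collection("test")
                results = collection.query(
                    expr="id > 0",
                    limit=10,
                    output_fields=["id"]
                )
                return len(results)
            
            # 100 queries on 50 worker threads
            with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
                futures = [executor.submit(query_task, i) for i in range(100)]
                results = [f.result() for f in futures]
            print(f"Completed {len(results)} concurrent queries")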
            ---
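    b.Connection management
        a.Description
            Connections can be closed and reopened explicitly. Disconnecting closes the client-side channel and does not unload collections that are loaded on the server. An application should disconnect before it exits. has_connection() reports whether an alias is registered, and a cheap call such as get_server_version() confirms the server is actually reachable. Aliases let you manage several connections and switch between them.
        b.Code example
            ---
            from pymilvus import connections, utility
            
            # Check whether the alias is registered
            has_connection = connections.has_connection("default")
            print(f"Connection exists: {has_connection}")
            
            # Disconnect
            connections.disconnect("default")
            
            # Reconnect
            connections.connect(
                alias="default",
                host="localhost",
                port="19530"
            )
            
            # Disconnect every registered alias
            for alias, _ in connections.list_connections():
                connections.disconnect(alias)
            
            # Health check (fails here because all aliases were just disconnected)
            try:
                version = utility.get_server_version()
                print(f"Connection OK, server version: {version}")
            except Exception as e:
                print(f"Connection error: {e}")
                # Try to reconnect
                connections.disconnect("default")
                connections.connect(
                    alias="default",
                    host="localhost",
                    port="19530"
                )
            
            # Context manager that disconnects automatically
            class MilvusConnection:
                def __init__(self, alias, host, port):
                    self.alias = alias
                    self.host = host
                    self.port = port
                
                def __enter__(self):
                    connections.connect(
                        alias=self.alias,
                        host=self.host,
                        port=self.port
                    )
                    return self
                
                def __exit__(self, exc_type, exc_val, exc_tb):
                    connections.disconnect(self.alias)
            
            # Use the context manager; pass the alias so the call uses "temp"
            with MilvusConnection("temp", "localhost", "19530"):
                print(f"Version: {utility.get_server_version(using='temp')}")
            # The "temp" connection is closed here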
            ---

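03.Advanced Configuration
    a.TLS encryption
        a.Description
            Milvus supports TLS to protect data in transit. One-way TLS needs only the server certificate on the client side; mutual (two-way) TLS additionally requires a client certificate, a client key, and the CA certificate. With TLS enabled, all traffic is encrypted, which guards against man-in-the-middle attacks. It suits public networks and other security-sensitive deployments. TLS adds some performance overhead in exchange for the added security.
        b.Code example
            ---
            from pymilvus import connections
            
            # One-way TLS
            connections.connect(
                alias="secure",
                host="milvus.example.com",
                port="19530",
                secure=True,  # enable TLS
                server_pem_path="/path/to/server.pem",  # server certificate
                server_name="milvus.example.com",  # name checked against the certificate
                user="username",
                password="password"
            )
            
            # Mutual TLS (client certificate)
            connections.connect(
                alias="mutual_tls",
                host="milvus.example.com",
                port="19530",
                secure=True,
                client_pem_path="/path/to/client.pem",  # client certificate
                client_key_path="/path/to/client.key",  # client private key
                ca_pem_path="/path/to/ca.pem",  # CA certificate
                server_name="milvus.example.com"
            )
            
            # Server-side TLS configuration (milvus.yaml)
            # tls:
            #   serverPemPath: /path/to/server.pem
            #   serverKeyPath: /path/to/server.key
            #   caPemPath: /path/to/ca.pem
            # common:
            #   security:
            #     tlsMode: 1  # 1 = one-way, 2 = two-way (check your version's docs)
            
            # Generate a self-signed certificate (testing only)
            # openssl req -x509 -newkey rsa:4096 -keyout server.key -out server.pem -days 365 -nodes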
            ---
    b.Load balancing
        a.Description
            In a cluster deployment, clients can reach multiple Proxy nodes through a load balancer, which improves availability and throughput. The client connects to the load balancer address, and requests are distributed to the backend Proxies. Strategies such as round-robin and least-connections are configured on the load balancer, not in PyMilvus. When a Proxy fails, the load balancer removes it from rotation. This architecture improves fault tolerance and scalability.
        b.Code example
            ---
            from pymilvus import connections
            
            # Connect to the load balancer
            connections.connect(
                alias="cluster",
                host="milvus-lb.example.com",  # load balancer address
                port="19530"
            )
            
            # Load balancer Service in Kubernetes
            # apiVersion: v1
            # kind: Service
            # metadata:
            #   name: milvus-proxy-lb
            # spec:
            #   type: LoadBalancer
            #   selector:
            #     app: milvus-proxy
            #   ports:
            #     - protocol: TCP
            #       port: 19530
            #       targetPort: 19530
            
            # DNS round-robin (several Proxy addresses)
            # DNS records:
            # milvus.example.com -> 10.0.1.1
            # milvus.example.com -> 10.0.1.2
            # milvus.example.com -> 10.0.1.3
            
            connections.connect(
                alias="dns_lb",
                host="milvus.example.com",  # DNS spreads new connections, not individual requests
                port="19530"
            )
            
            # Client-side retry
            import time
            from pymilvus import connections, utility
            
            def connect_with_retry(max_retries=3, retry_delay=5):
                for attempt in range(max_retries):
                    try:
                        connections.connect(
                            alias="default",
                            host="milvus-lb.example.com",
                            port="19530",
                            timeout=10
                        )
                        version = utility.get_server_version()
                        print(f"Connected, version: {version}")
                        return True
                    except Exception as e:
                        print(f"Connection failed (attempt {attempt + 1}/{max_retries}): {e}")
                        if attempt < max_retries - 1:
                            time.sleep(retry_delay)
                return False
            
            connect_with_retry()
            ---
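            The retry loop above waits a fixed retry_delay between attempts. A minimal sketch of the same idea with exponential backoff and jitter is shown here. It is independent of PyMilvus: retry_with_backoff, base_delay, max_delay, sleep, and rng are illustrative names, not part of any Milvus API, and the operation argument is assumed to wrap a call such as connections.connect.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `operation` until it succeeds or `max_retries` attempts are used."""
    rng = rng or random.Random()
    last_error: Optional[Exception] = None
    for attempt in range(max_retries):
        try:
            return operation()
        except Exception as exc:  # narrow this to connection errors in real code
            last_error = exc
            if attempt < max_retries - 1:
                # Delay ceiling doubles each attempt, capped at max_delay;
                # the actual wait is drawn uniformly from [0, ceiling].
                ceiling = min(max_delay, base_delay * (2 ** attempt))
                sleep(rng.uniform(0, ceiling))
    raise RuntimeError(f"gave up after {max_retries} attempts") from last_error
```

            Randomizing the wait keeps many clients from reconnecting at the same instant after a Proxy restart. The sleep and rng parameters exist only so the helper can be exercised without real delays.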

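2.3 Basic Operations

01.Collection Operations
    a.Creating a collection
        a.Description
            A collection is the basic data unit in Milvus, comparable to a table in a relational database. Creating one requires a schema that defines field names, data types, and vector dimensions. A primary key field is mandatory and can be auto-generated. Each vector field declares a dimension, and every inserted vector must match it. Field types and dimensions cannot be changed after creation, so design the schema carefully.
        b.Code example
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            
            # Define the fields
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            # Create the schema
            schema = CollectionSchema(
                fields=fields,
                description="Document vector store",
                enable_dynamic_field=False  # whether dynamic fields are allowed
            )
            
            # Create the collection
            collection = Collection(
                name="documents",
                schema=schema,
                using="default",
                shards_num=2  # number of shards
            )
            
            print(f"Collection created: {collection.name}")
            print(f"Schema: {collection.schema}")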
            ---
    b.Inspecting a collection
        a.Description
            You can list all collections and inspect one in detail, including its schema and statistics. A Collection object exposes the entity count, index information, and load state. This information helps you understand data volume and system state. Checking whether a collection exists avoids creating it twice.
        b.Code example
            ---
            from pymilvus import utility, Collection
            
            # List all collections
            collections = utility.list_collections()
            print(f"All collections: {collections}")
            
            # Check whether a collection exists
            has_collection = utility.has_collection("documents")
            print(f"Collection exists: {has_collection}")
            
            # Get the Collection object
            collection = Collection("documents")
            
            # Schema
            print(f"Schema: {collection.schema}")
            print(f"Description: {collection.description}")
            
            # Statistics
            print(f"Entity count: {collection.num_entities}")
            
            # Index information
            indexes = collection.indexes
            for index in indexes:
                print(f"Indexed field: {index.field_name}")
                print(f"Index params: {index.params}")
            
            # Load state (a LoadState value, not a boolean)
            print(f"Load state: {utility.load_state('documents')}")
            
            # Collection details as a dict; assumes a PyMilvus version that
            # provides Collection.describe()
            details = collection.describe()
            print(f"Details: {details}")
            ---

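02.Inserting Data
    a.Batch insert
        a.Description
            The example below inserts column-based data, where each field is one list and all lists have the same length. Each insert call is intended to be all-or-nothing, and it returns the inserted primary keys. Insert in batches for throughput; 1,000 to 10,000 rows per call is a reasonable starting point. When new rows become visible to searches depends on the consistency level. flush() seals and persists the pending data, and num_entities only counts flushed rows, so avoid calling flush() after every small insert.
        b.Code example
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # Prepare column-based data
            ids = [i for i in range(1000)]
            titles = [f"doc{i}" for i in range(1000)]
            embeddings = [[np.random.random() for _ in range(128)] for _ in range(1000)]
            
            # Insert; the column order must match the schema field order
            data = [ids, titles, embeddings]
            insert_result = collection.insert(data)
            
            print(f"Inserted: {insert_result.insert_count} rows")
            print(f"Primary keys: {insert_result.primary_keys[:10]}...")  # first 10
            
            # Auto-generated primary keys (assumes "auto_id_collection" was created
            # with auto_id=True and the fields text and embedding)
            collection_auto = Collection("auto_id_collection")
            data_auto = [titles, embeddings]  # no id column
            insert_result = collection_auto.insert(data_auto)
            
            # Flush to persist the data and refresh the entity count
            collection.flush()
            print(f"Entity count after flush: {collection.num_entities}")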
            ---
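            To keep each insert call within the 1,000 to 10,000 row range suggested above, a large dataset can be cut into batches first. The helper below is a minimal sketch in plain Python; iter_column_batches is a hypothetical name, and each batch it yields is assumed to be passed to collection.insert.

```python
from typing import Iterator, List, Sequence


def iter_column_batches(
    columns: Sequence[Sequence], batch_size: int
) -> Iterator[List[list]]:
    """Yield column-based batches of at most `batch_size` rows each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if not columns:
        return
    total = len(columns[0])
    if any(len(col) != total for col in columns):
        raise ValueError("all columns must have the same number of rows")
    for start in range(0, total, batch_size):
        # Slice every column with the same window so rows stay aligned.
        yield [list(col[start:start + batch_size]) for col in columns]


if __name__ == "__main__":
    ids = list(range(2500))
    titles = [f"doc{i}" for i in ids]
    for batch in iter_column_batches([ids, titles], batch_size=1000):
        # In real code: collection.insert(batch)
        print(len(batch[0]))
```

            Validating column lengths up front catches misaligned data before any batch is sent.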
    b.Single-row insert
        a.Description
            Milvus is optimized for batch insertion but also accepts single rows. Single-row inserts suit real-time streams where records arrive one at a time. Throughput is lower than with batches, but a record does not wait for a batch to fill. Accumulating small batches balances throughput against latency, so an application-level buffer that inserts once enough rows have gathered is recommended.
        b.Code example
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # Single-row insert (still column-based, one value per column)
            single_data = [
                [1001],  # id
                ["single document"],  # title
                [[np.random.random() for _ in range(128)]]  # embedding
            ]
            collection.insert(single_data)
            
            # Real-time inserts with a buffer
            class BufferedInserter:
                def __init__(self, collection, buffer_size=100):
                    self.collection = collection
                    self.buffer_size = buffer_size
                    self.buffer = {"ids": [], "titles": [], "embeddings": []}
                
                def insert(self, id, title, embedding):
                    self.buffer["ids"].append(id)
                    self.buffer["titles"].append(title)
                    self.buffer["embeddings"].append(embedding)
                    
                    if len(self.buffer["ids"]) >= self.buffer_size:
                        self.flush()
                
                def flush(self):
                    # Sends the buffered rows to Milvus; this is not Collection.flush()
                    if len(self.buffer["ids"]) > 0:
                        data = [
                            self.buffer["ids"],
                            self.buffer["titles"],
                            self.buffer["embeddings"]
                        ]
                        self.collection.insert(data)
                        self.buffer = {"ids": [], "titles": [], "embeddings": []}
                        print(f"Inserted a batch of {len(data[0])} rows")
            
            # Use the buffered inserter
            inserter = BufferedInserter(collection, buffer_size=100)
            
            for i in range(250):
                inserter.insert(
                    id=2000 + i,
                    title=f"realtime doc {i}",
                    embedding=[np.random.random() for _ in range(128)]
                )
            
            inserter.flush()  # send the remaining rows
            ---
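            The BufferedInserter above flushes only when the buffer is full, so rows from a slow stream can wait indefinitely. One possible refinement, sketched here under assumed names (TimedBuffer, sink, max_age_s, clock are all hypothetical), also flushes when the oldest buffered row exceeds a maximum age. The sink callable stands in for a function that calls collection.insert.

```python
import time
from typing import Callable, List


class TimedBuffer:
    """Buffer rows and hand them to `sink` when full or when too old."""

    def __init__(
        self,
        sink: Callable[[List[dict]], None],
        max_rows: int = 100,
        max_age_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._sink = sink
        self._max_rows = max_rows
        self._max_age_s = max_age_s
        self._clock = clock
        self._rows: List[dict] = []
        self._first_at = 0.0

    def add(self, row: dict) -> None:
        if not self._rows:
            # Remember when the oldest row in this batch arrived.
            self._first_at = self._clock()
        self._rows.append(row)
        too_old = self._clock() - self._first_at >= self._max_age_s
        if len(self._rows) >= self._max_rows or too_old:
            self.flush()

    def flush(self) -> None:
        if self._rows:
            batch, self._rows = self._rows, []
            self._sink(batch)
```

            The age check runs only when add() is called, so a stream that stops entirely still needs a final flush() or a background timer.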

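03.Querying Data
    a.Primary key query
        a.Description
            A primary key query fetches entities by their key and returns the requested fields. Filtering on the primary key is usually the fastest kind of query. Several keys can be fetched in one call with the in operator. Listing only the fields you need in output_fields reduces the amount of data transferred. Like any query, a primary key query requires the collection to be indexed and loaded.
        b.Code example
            ---
            from pymilvus import Collection
            
            collection = Collection("documents")
            collection.load()  # query() requires a loaded collection (index the vector field first)
            
            # Single primary key
            results = collection.query(
                expr="id == 1",
                output_fields=["id", "title", "embedding"]
            )
            print(f"Result: {results}")
            
            # Several primary keys
            ids_to_query = [1, 10, 100, 1000]
            results = collection.query(
                expr=f"id in {ids_to_query}",
                output_fields=["id", "title"]
            )
            for result in results:
                print(f"ID: {result['id']}, Title: {result['title']}")
            
            # Range query
            results = collection.query(
                expr="id > 100 and id < 200",
                output_fields=["id", "title"],
                limit=10
            )
            print(f"Range query returned {len(results)} rows")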
            ---
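            The batch lookup above builds its expression by formatting a Python list into the string, which works for integers but is fragile for string keys that contain quotes. A minimal sketch of a safer builder follows. pk_in_expr is a hypothetical helper, and it assumes the Milvus expression parser accepts double-quoted string literals with backslash escapes.

```python
import json
from typing import Iterable, Union


def pk_in_expr(field: str, keys: Iterable[Union[int, str]]) -> str:
    """Build a `field in [...]` filter for integer or string keys."""
    keys = list(keys)
    if not keys:
        raise ValueError("keys must not be empty")
    if all(isinstance(k, int) and not isinstance(k, bool) for k in keys):
        body = ", ".join(str(k) for k in keys)
    elif all(isinstance(k, str) for k in keys):
        # json.dumps emits a double-quoted literal and escapes quotes
        # and backslashes inside the key.
        body = ", ".join(json.dumps(k, ensure_ascii=False) for k in keys)
    else:
        raise TypeError("keys must be all int or all str")
    return f"{field} in [{body}]"


if __name__ == "__main__":
    print(pk_in_expr("id", [1, 10, 100, 1000]))
    print(pk_in_expr("id", ["ORDER00000001", "ORDER00000002"]))
```

            The resulting string would be passed as the expr argument of collection.query.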
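    b.Scalar filtering
        a.Description
            Scalar fields can be filtered with a SQL-like expression syntax. It supports comparison operators (==, !=, >, <, >=, <=), logical operators (and, or, not), and membership operators (in, not in). Conditions can be combined into complex filters. The collection must be loaded before querying; a scalar index is optional and only speeds up filtering. Query performance depends on data volume and on how selective the filter is.
        b.Code example
            ---
            from pymilvus import Collection
            
            collection = Collection("documents")
            collection.load()  # load into memory
            
            # Prefix match on a string field
            results = collection.query(
                expr='title like "doc1%"',
                output_fields=["id", "title"],
                limit=10
            )
            
            # Multiple conditions
            results = collection.query(
                expr='id > 100 and id < 500 and title like "doc%"',
                output_fields=["id", "title"]
            )
            
            # IN query
            titles_to_find = ["doc1", "doc10", "doc100"]
            results = collection.query(
                expr=f'title in {titles_to_find}',
                output_fields=["id", "title"]
            )
            
            # Compound expression
            results = collection.query(
                expr='(id > 100 and id < 200) or (id > 800 and id < 900)',
                output_fields=["id", "title"],
                limit=20
            )
            
            # Pagination with offset/limit. Milvus caps offset + limit (assumed
            # to be 16384 here; check your version), so stop before the cap.
            page_size = 100
            offset = 0
            max_window = 16384
            
            while offset + page_size <= max_window:
                results = collection.query(
                    expr="id > 0",
                    output_fields=["id", "title"],
                    limit=page_size,
                    offset=offset
                )
                
                if len(results) == 0:
                    break
                
                print(f"Page {offset // page_size + 1}: {len(results)} rows")
                offset += page_size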
            ---
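            Offset-based pagination cannot reach arbitrarily deep into a result set, because Milvus limits the sum of offset and limit. The sketch below precomputes the (offset, limit) pairs that stay inside such a window. page_windows is a hypothetical helper, and the cap of 16384 is an assumption that should be checked against the Milvus version in use.

```python
from typing import Iterator, Tuple

MAX_WINDOW = 16384  # assumed cap on offset + limit


def page_windows(
    total_rows: int, page_size: int, max_window: int = MAX_WINDOW
) -> Iterator[Tuple[int, int]]:
    """Yield (offset, limit) pairs covering the rows reachable by offset paging."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    reachable = min(total_rows, max_window)
    for offset in range(0, reachable, page_size):
        # The final page is shortened so offset + limit never exceeds the cap.
        yield offset, min(page_size, reachable - offset)


if __name__ == "__main__":
    for offset, limit in page_windows(total_rows=250, page_size=100):
        # In real code: collection.query(expr=..., offset=offset, limit=limit)
        print(offset, limit)
```

            Rows beyond the window are better reached by filtering on the primary key, for example by asking for ids greater than the last one seen.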

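04.Deleting Data
    a.Delete by expression
        a.Description
            Entities that match an expression can be deleted. Deletion is logical: the call returns quickly and marks the entities as deleted, while the data is physically removed later. Recent Milvus versions accept primary key, scalar, and combined conditions; older releases accept only primary key in expressions. When deleting a large amount of data, work in batches so that no single call is too large. The space is not reclaimed immediately; that happens during compaction.
        b.Code example
            ---
            from pymilvus import Collection
            
            collection = Collection("documents")
            
            # Delete one entity (on older releases use "id in [1001]")
            expr = "id == 1001"
            collection.delete(expr)
            
            # Delete several entities by primary key
            ids_to_delete = [1, 2, 3, 4, 5]
            expr = f"id in {ids_to_delete}"
            collection.delete(expr)
            
            # Delete by condition
            expr = "id > 2000 and id < 2100"
            collection.delete(expr)
            
            # Delete everything (use with care)
            # expr = "id > 0"
            # collection.delete(expr)
            
            # Delete a large range in batches
            batch_size = 1000
            start_id = 3000
            end_id = 10000
            
            for i in range(start_id, end_id, batch_size):
                expr = f"id >= {i} and id < {i + batch_size}"
                collection.delete(expr)
                print(f"Deleted IDs {i} to {i + batch_size - 1}")
            
            # Persist the deletions; num_entities may still include deleted
            # rows until compaction has run
            collection.flush()
            print(f"Entity count after delete: {collection.num_entities}")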
            ---
    b.Compaction
        a.Description
            Compaction is a background maintenance operation in Milvus that merges small segments and purges deleted data. A delete only marks entities as deleted; the space is actually freed by compaction. Compaction reorganizes data, which improves query performance. It can be triggered manually or left to run automatically. The collection stays usable during compaction, although performance may be affected.
        b.Code example
            ---
            from pymilvus import Collection, utility
            
            collection = Collection("documents")
            
            # Trigger compaction manually
            collection.compact()
            print("Compaction triggered")
            
            # Block until the compaction triggered above has finished
            collection.wait_for_compaction_completed()
            state = collection.get_compaction_state()
            print(f"Compaction state: {state}")
            
            # Inspect the compaction plans (source segments -> target segment)
            plans = collection.get_compaction_plans()
            for plan in plans.plans:
                print(f"Source segments: {plan.sources}, target segment: {plan.target}")
            
            # Automatic compaction settings (milvus.yaml). Key names and nesting
            # vary between releases; verify against your version's milvus.yaml.
            # dataCoord:
            #   enableCompaction: true
            #   compaction:
            #     enableAutoCompaction: true
            #     min:
            #       interval: 60  # minimum interval (seconds)
            #     max:
            #       interval: 3600  # maximum interval (seconds)
            
            # Inspect segments (the collection must be loaded)
            segments = utility.get_query_segment_info(collection.name)
            total_rows = sum(seg.num_rows for seg in segments)
            print(f"Segments: {len(segments)}, total rows: {total_rows}")
            
            for seg in segments[:5]:  # first 5 segments
                print(f"Segment {seg.segmentID}: {seg.num_rows} rows, state={seg.state}")
            ---

3 Collection Management

3.1 Schema Definition

01.Field Types
    a.Scalar fields
        a.Description
            Milvus supports several scalar data types: integers (INT8, INT16, INT32, INT64), floating point (FLOAT, DOUBLE), boolean (BOOL), string (VARCHAR), and JSON. Scalar fields hold metadata and filter conditions. VARCHAR requires a maximum length. JSON supports nested structures and can hold complex metadata. Scalar fields can be indexed to speed up filtering.
        b.Code example
            ---
            from pymilvus import FieldSchema, DataType
            
            # Integer types
            id_field = FieldSchema(
                name="id",
                dtype=DataType.INT64,
                is_primary=True,
                auto_id=False
            )
            
            age_field = FieldSchema(
                name="age",
                dtype=DataType.INT32
            )
            
            # Floating point
            score_field = FieldSchema(
                name="score",
                dtype=DataType.FLOAT
            )
            
            # Boolean
            active_field = FieldSchema(
                name="is_active",
                dtype=DataType.BOOL
            )
            
            # String
            title_field = FieldSchema(
                name="title",
                dtype=DataType.VARCHAR,
                max_length=500
            )
            
            # JSON
            metadata_field = FieldSchema(
                name="metadata",
                dtype=DataType.JSON
            )
            
            # All scalar fields from this example
            fields = [
                id_field,
                age_field,
                score_field,
                active_field,
                title_field,
                metadata_field
            ]
            ---
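    b.Vector fields
        a.Description
            Vector fields store high-dimensional vectors and are the core field type in Milvus. Supported types include FLOAT_VECTOR (float32), BINARY_VECTOR (binary), and FLOAT16_VECTOR (half precision). The dimension must be declared and cannot be changed after creation. A collection can hold several vector fields, which enables multimodal retrieval; this and FLOAT16_VECTOR require Milvus 2.4 or later. A vector field must be indexed before it can be searched.
        b.Code example
            ---
            from pymilvus import FieldSchema, DataType
            
            # Float vector (most common)
            embedding_field = FieldSchema(
                name="embedding",
                dtype=DataType.FLOAT_VECTOR,
                dim=128  # vector dimension
            )
            
            # High-dimensional vector
            high_dim_field = FieldSchema(
                name="high_dim_embedding",
                dtype=DataType.FLOAT_VECTOR,
                dim=1536  # dimension of OpenAI ada-002 embeddings
            )
            
            # Binary vector (saves storage)
            binary_field = FieldSchema(
                name="binary_embedding",
                dtype=DataType.BINARY_VECTOR,
                dim=512  # must be a multiple of 8
            )
            
            # Half-precision vector (saves memory)
            fp16_field = FieldSchema(
                name="fp16_embedding",
                dtype=DataType.FLOAT16_VECTOR,
                dim=256
            )
            
            # Multiple vector fields (multimodal)
            text_vector = FieldSchema(
                name="text_embedding",
                dtype=DataType.FLOAT_VECTOR,
                dim=768
            )
            
            image_vector = FieldSchema(
                name="image_embedding",
                dtype=DataType.FLOAT_VECTOR,
                dim=512
            )
            
            # All vector fields from this example. This list is illustrative:
            # one collection accepts only a limited number of vector fields
            # (4 by default in Milvus 2.4).
            vector_fields = [
                embedding_field,
                high_dim_field,
                binary_field,
                fp16_field,
                text_vector,
                image_vector
            ]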
            ---
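            The storage comments above ("saves storage", "saves memory") can be made concrete with a little arithmetic: a float32 component takes 32 bits, a float16 component 16 bits, and a binary component 1 bit. The sketch below estimates the raw vector payload only. raw_vector_bytes is a hypothetical helper, and index structures and scalar fields add further overhead that it ignores.

```python
BITS_PER_DIM = {
    "FLOAT_VECTOR": 32,    # float32 components
    "FLOAT16_VECTOR": 16,  # half-precision components
    "BINARY_VECTOR": 1,    # one bit per dimension
}


def raw_vector_bytes(dtype: str, dim: int, num_vectors: int) -> int:
    """Return the raw size in bytes of `num_vectors` vectors of `dim` dimensions."""
    if dtype not in BITS_PER_DIM:
        raise ValueError(f"unknown vector type: {dtype}")
    if dtype == "BINARY_VECTOR" and dim % 8 != 0:
        raise ValueError("binary vector dim must be a multiple of 8")
    return BITS_PER_DIM[dtype] * dim * num_vectors // 8


if __name__ == "__main__":
    for dtype, dim in [("FLOAT_VECTOR", 128), ("FLOAT16_VECTOR", 256), ("BINARY_VECTOR", 512)]:
        size_mb = raw_vector_bytes(dtype, dim, 1_000_000) / 1e6
        print(f"{dtype} dim={dim}: {size_mb:.0f} MB per million vectors")
```

            For one million vectors, a 128-dimensional float vector and a 256-dimensional half-precision vector both need about 512 MB, while a 512-dimensional binary vector needs about 64 MB.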

02.Schema Configuration
    a.Basic schema
        a.Description
            The schema defines the structure of a collection, including every field. It must contain exactly one primary key field, which can be auto-generated. A description can be added to document what the collection is for. Field types and dimensions cannot be changed after creation, so design the schema carefully and consider business requirements and future growth up front.
        b.Code example
            ---
            from pymilvus import CollectionSchema, FieldSchema, DataType
            
            # Define the fields
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="content", dtype=DataType.VARCHAR, max_length=5000),
                FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=50),
                FieldSchema(name="timestamp", dtype=DataType.INT64),
                FieldSchema(name="score", dtype=DataType.FLOAT),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768)
            ]
            
            # Create the schema
            schema = CollectionSchema(
                fields=fields,
                description="Document search system",
                enable_dynamic_field=False
            )
            
            # Inspect the schema
            print(f"Field count: {len(schema.fields)}")
            for field in schema.fields:
                print(f"Field: {field.name}, type: {field.dtype}, primary: {field.is_primary}")
            
            # Check the primary key settings
            print(f"Primary field: {schema.primary_field.name}")
            print(f"Auto ID: {schema.auto_id}")
            ---
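    b.Dynamic schema
        a.Description
            A dynamic schema lets you insert fields that are not declared in the schema, which adds flexibility. Dynamic fields have their types inferred and are stored in an internal JSON field. They must be supplied as row-based data, one dict per entity. This suits metadata whose structure is not fixed, such as user-defined attributes. Dynamic fields can be used in filters, but they are slower than declared fields, and enabling the dynamic schema adds some storage overhead.
        b.Code example
            ---
            from pymilvus import CollectionSchema, FieldSchema, DataType, Collection
            
            # Enable the dynamic schema
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            schema = CollectionSchema(
                fields=fields,
                description="Dynamic schema example",
                enable_dynamic_field=True  # allow undeclared fields
            )
            
            collection = Collection("dynamic_collection", schema=schema)
            
            # Insert row-based data; keys that are not declared in the schema
            # (title, score, metadata) are stored as dynamic fields
            rows = [
                {"id": 1, "embedding": [0.1] * 128, "title": "Title 1", "score": 100, "metadata": {"tag": "AI"}},
                {"id": 2, "embedding": [0.2] * 128, "title": "Title 2", "score": 200, "metadata": {"tag": "ML"}},
                {"id": 3, "embedding": [0.3] * 128, "title": "Title 3", "score": 300, "metadata": {"tag": "DL"}}
            ]
            collection.insert(rows)
            
            # Index the vector field and load the collection before querying
            collection.create_index(
                field_name="embedding",
                index_params={"index_type": "FLAT", "metric_type": "L2", "params": {}}
            )
            collection.load()
            
            # Query the dynamic fields
            results = collection.query(
                expr="id > 0",
                output_fields=["id", "title", "score", "metadata"]
            )
            
            for result in results:
                print(f"ID: {result['id']}, Title: {result.get('title')}, Score: {result.get('score')}")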
            ---
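            Data is often already held as parallel columns, while dynamic fields are easiest to supply one dict per entity. The conversion is a plain zip. columns_to_rows below is a hypothetical helper written in standard Python, and the resulting list is assumed to be passed to collection.insert.

```python
from typing import Dict, List, Sequence


def columns_to_rows(columns: Dict[str, Sequence]) -> List[dict]:
    """Turn {"field": [v1, v2, ...]} columns into a list of per-entity dicts."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    names = list(columns)
    # zip(*columns.values()) walks the columns in lockstep, one row at a time.
    return [dict(zip(names, values)) for values in zip(*columns.values())]


if __name__ == "__main__":
    rows = columns_to_rows({
        "id": [1, 2, 3],
        "title": ["Title 1", "Title 2", "Title 3"],
        "score": [100, 200, 300],
    })
    print(rows[0])
```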

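03.Primary Key Design
    a.Auto-generated primary key
        a.Description
            Auto-generated primary keys are assigned by Milvus and are unique. They are 64-bit integers produced by a server-side, timestamp-based allocator, so treat them as opaque values and do not rely on their internal layout. They simplify insertion because the application does not have to maintain IDs, and they suit cases where no custom ID is needed. The IDs generally increase over time but are not guaranteed to be contiguous.
        b.Code example
            ---
            from pymilvus import CollectionSchema, FieldSchema, DataType, Collection
            import numpy as np
            
            # Schema with an auto-generated primary key
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
                FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            schema = CollectionSchema(fields=fields, description="Auto ID example")
            collection = Collection("auto_id_collection", schema=schema)
            
            # Insert data without providing ids
            texts = [f"text{i}" for i in range(100)]
            embeddings = [[np.random.random() for _ in range(128)] for _ in range(100)]
            
            data = [texts, embeddings]  # note: no id column
            insert_result = collection.insert(data)
            
            # Read the generated ids
            generated_ids = insert_result.primary_keys
            print(f"Generated IDs: {generated_ids[:10]}")
            
            # Index the vector field and load the collection before querying
            collection.create_index("embedding", {"index_type": "FLAT", "metric_type": "L2", "params": {}})
            collection.load()
            
            # Query by the generated ids
            results = collection.query(
                expr=f"id in {generated_ids[:5]}",
                output_fields=["id", "text"]
            )
            
            for result in results:
                print(f"ID: {result['id']}, Text: {result['text']}")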
            ---
    b.Custom primary key
        a.Description
            Custom primary keys are supplied by the application and can be business IDs or UUIDs. The application is responsible for keeping them unique: Milvus does not reject a duplicate key on insert, so inserting the same key twice can leave two entities with that key. Custom keys make integration with existing systems easier because entities can be queried directly by business ID. INT64 and VARCHAR primary keys are supported, and a VARCHAR primary key can have a max_length of up to 65,535.
        b.Code example
            ---
            from pymilvus import CollectionSchema, FieldSchema, DataType, Collection
            import uuid
            import numpy as np
            
            # INT64 custom primary key
            fields_int = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
                FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            schema_int = CollectionSchema(fields=fields_int, description="INT64 primary key")
            collection_int = Collection("custom_int_id", schema=schema_int)
            
            # Insert data with custom ids
            ids = [1000 + i for i in range(100)]
            texts = [f"text{i}" for i in range(100)]
            embeddings = [[np.random.random() for _ in range(128)] for _ in range(100)]
            
            data = [ids, texts, embeddings]
            collection_int.insert(data)
            
            # VARCHAR primary key (UUID)
            fields_str = [
                FieldSchema(name="id", dtype=DataType.VARCHAR, max_length=36, is_primary=True, auto_id=False),
                FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            schema_str = CollectionSchema(fields=fields_str, description="VARCHAR primary key")
            collection_str = Collection("custom_str_id", schema=schema_str)
            
            # Use UUIDs as primary keys
            uuids = [str(uuid.uuid4()) for _ in range(100)]
            data = [uuids, texts, embeddings]
            collection_str.insert(data)
            
            # Index the vector field and load the collection before querying
            collection_str.create_index("embedding", {"index_type": "FLAT", "metric_type": "L2", "params": {}})
            collection_str.load()
            
            # Query by UUID
            results = collection_str.query(
                expr=f'id == "{uuids[0]}"',
                output_fields=["id", "text"]
            )
            print(f"UUID query result: {results[0]}")
            
            # Business IDs (for example, order numbers)
            order_ids = [f"ORDER{i:08d}" for i in range(100)]
            data = [order_ids, texts, embeddings]
            collection_str.insert(data)
            ---
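Because uniqueness is the application's responsibility, a batch can be checked before it is inserted. The following is a minimal sketch in plain Python, not a Milvus API: `find_duplicate_keys` and `validate_varchar_keys` are hypothetical helper names, and the length check assumes that max_length is measured in UTF-8 bytes.

```python
def find_duplicate_keys(keys):
    """Return the keys that occur more than once, in first-seen order."""
    seen = set()
    duplicates = []
    for key in keys:
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates


def validate_varchar_keys(keys, max_length):
    """Return the keys whose UTF-8 byte length exceeds max_length (assumed unit: bytes)."""
    return [k for k in keys if len(k.encode("utf-8")) > max_length]


# Three order IDs plus one accidental repeat
order_ids = [f"ORDER{i:08d}" for i in range(3)] + ["ORDER00000001"]
print("duplicates:", find_duplicate_keys(order_ids))
print("too long:", validate_varchar_keys(order_ids, 36))
```

Running both checks before `insert` keeps bad batches out of the collection instead of cleaning them up afterwards.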

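04.Schema best practices
    a.Field selection
        a.Description
            Choosing field types carefully reduces storage and improves performance. Include only the fields you need and avoid redundant data. Give each VARCHAR field a realistic max_length, since oversized limits waste storage. Build scalar indexes on fields that are filtered frequently. A JSON field suits unstructured metadata, but filtering on it is slower than filtering on predefined fields.
        b.Code example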
            ---
            from pymilvus import CollectionSchema, FieldSchema, DataType
            
            # 优化前:字段过多,类型不合理
            fields_bad = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=10000),  # 过大
                FieldSchema(name="content", dtype=DataType.VARCHAR, max_length=50000),  # 过大
                FieldSchema(name="author", dtype=DataType.VARCHAR, max_length=5000),  # 过大
                FieldSchema(name="tags", dtype=DataType.VARCHAR, max_length=10000),  # 应该用JSON
                FieldSchema(name="metadata", dtype=DataType.VARCHAR, max_length=10000),  # 应该用JSON
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            # 优化后:字段精简,类型合理
            fields_good = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),  # 合理长度
                FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=50),  # 用于过滤
                FieldSchema(name="timestamp", dtype=DataType.INT64),  # 时间戳(便于范围查询)
                FieldSchema(name="metadata", dtype=DataType.JSON),  # 灵活的元数据
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            schema_good = CollectionSchema(
                fields=fields_good,
                description="优化的Schema设计"
            )
            
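            # Field indexing strategy
            # 1. The primary key is indexed automatically
            # 2. The vector field must have an index
            # 3. Build scalar indexes on frequently filtered fields
            # 4. The JSON field is left unindexed here; put hot filter keys in predefined fields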
            ---
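One way to pick a realistic max_length is to measure sample data instead of guessing. The sketch below is a plain-Python illustration under stated assumptions: `suggest_max_length` is a hypothetical helper, the 1.5x headroom and rounding to multiples of 8 are arbitrary choices, lengths are measured in UTF-8 bytes, and the 65,535 cap is the VARCHAR limit mentioned earlier in these notes.

```python
import math


def suggest_max_length(samples, headroom=1.5, round_to=8, hard_cap=65535):
    """Suggest a VARCHAR max_length from sample strings.

    Takes the longest sample (in UTF-8 bytes), adds headroom, and rounds up
    to a multiple of round_to. The result never exceeds hard_cap.
    """
    longest = max((len(s.encode("utf-8")) for s in samples), default=0)
    padded = math.ceil(longest * headroom)
    suggested = max(round_to, math.ceil(padded / round_to) * round_to)
    return min(suggested, hard_cap)


titles = ["hello", "vector database"]
print("suggested max_length:", suggest_max_length(titles))
```

The result is only a starting point; fields that may grow (for example user-generated titles) usually deserve more headroom than fixed-format codes.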
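    b.Version management
        a.Description
            In most Milvus 2.x releases the type of an existing field and the dimension of a vector field cannot be changed after the collection is created, so schema versions need to be managed deliberately. A common approach is to put the version number in the collection name. To change the schema, create a new collection and migrate the data in batches. With an alias, the application layer does not need to know which collection is currently live. Test the schema design thoroughly during development to avoid frequent changes.
        b.Code example
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType, utility
            import time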
            
            # Schema版本管理策略
            
            # V1版本
            fields_v1 = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema_v1 = CollectionSchema(fields=fields_v1, description="V1版本")
            collection_v1 = Collection("documents_v1", schema=schema_v1)
            
            # V2版本(增加字段)
            fields_v2 = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=50),  # 新增
                FieldSchema(name="timestamp", dtype=DataType.INT64),  # 新增
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=256)  # 维度变化
            ]
            schema_v2 = CollectionSchema(fields=fields_v2, description="V2版本")
            collection_v2 = Collection("documents_v2", schema=schema_v2)
            
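            # Data migration helper.
            # query_iterator is used instead of limit/offset paging because Milvus caps
            # offset + limit for a single query, which breaks on large collections.
            def migrate_data(source_collection, target_collection, batch_size=1000):
                source_collection.load()
                iterator = source_collection.query_iterator(
                    batch_size=batch_size,
                    expr="id > 0",
                    output_fields=["id", "text", "embedding"]
                )
                migrated = 0

                while True:
                    # Read the next batch from the source collection
                    results = iterator.next()
                    if len(results) == 0:
                        iterator.close()
                        break

                    # Convert the batch into column lists
                    ids = [r["id"] for r in results]
                    texts = [r["text"] for r in results]
                    # upgrade_embedding is a hypothetical, user-supplied function that maps
                    # a 128-dim vector to 256 dims (for example by re-embedding the text)
                    embeddings = [upgrade_embedding(r["embedding"]) for r in results]
                    # Fill the new fields with defaults
                    categories = ["default"] * len(results)
                    timestamps = [int(time.time())] * len(results)

                    # Insert into the target collection (column order follows fields_v2)
                    data = [ids, texts, categories, timestamps, embeddings]
                    target_collection.insert(data)

                    migrated += len(results)
                    print(f"Migrated {migrated} entities")

                target_collection.flush()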
            
            # 使用别名进行平滑切换
            utility.create_alias(collection_name="documents_v1", alias="documents")
            
            # 迁移完成后切换别名
            # utility.alter_alias(collection_name="documents_v2", alias="documents")
            
            # 应用层代码不变
            collection = Collection("documents")  # 通过别名访问
            ---
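The naming convention above (`documents_v1`, `documents_v2`, ...) can be automated. This is a small sketch in plain Python; `next_version_name` is a hypothetical helper, and the list of existing names is assumed to come from something like `utility.list_collections()`.

```python
import re


def next_version_name(base, existing_names):
    """Return '<base>_v<N+1>' where N is the highest version already present."""
    pattern = re.compile(rf"^{re.escape(base)}_v(\d+)$")
    versions = []
    for name in existing_names:
        match = pattern.match(name)
        if match:
            versions.append(int(match.group(1)))
    return f"{base}_v{max(versions, default=0) + 1}"


existing = ["documents_v1", "documents_v2", "images_v7"]
print(next_version_name("documents", existing))
```

Versions are compared numerically, so `documents_v10` correctly follows `documents_v9` rather than sorting before `documents_v2`.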

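3.2 Creating a Collection

01.Ways to create a Collection
    a.Basic creation
        a.Description
            Creating a collection requires a name and a schema definition. The name must be unique and cannot duplicate an existing collection. A shard count can be specified at creation time. The constructor returns a Collection object right away, but the collection is not loaded into memory automatically. It is advisable to create the index right after creating the collection, to avoid performance problems once data has been inserted. A collection name may contain letters, digits, and underscores, must start with a letter or an underscore, and may be at most 255 characters long.
        b.Code example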
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            
            # 定义Schema
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields, description="文档集合")
            
            # 创建Collection
            collection = Collection(
                name="documents",
                schema=schema,
                using="default",
                shards_num=2
            )
            
            print(f"Collection创建成功: {collection.name}")
            print(f"分片数量: {collection.shards_num}")
            print(f"Schema: {collection.schema}")
            
            # 验证创建
            from pymilvus import utility
            assert utility.has_collection("documents")
            ---
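The naming rules can be checked on the client before calling Milvus, which gives a clearer error than a failed create request. A minimal sketch, assuming the rules as stated above (letters, digits, underscores, leading letter or underscore, at most 255 characters); `is_valid_collection_name` is a hypothetical helper, not part of pymilvus.

```python
import re

# Leading letter or underscore, followed by letters, digits, or underscores
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_collection_name(name, max_length=255):
    """Return True if name satisfies the assumed collection naming rules."""
    return 0 < len(name) <= max_length and _NAME_RE.fullmatch(name) is not None


for candidate in ["documents_v1", "1docs", "my-docs", ""]:
    print(repr(candidate), is_valid_collection_name(candidate))
```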
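    b.Getting an existing Collection
        a.Description
            An existing collection can be obtained by name. This does not recreate the collection; it only returns a reference to it, which suits access from different modules or processes. If the collection does not exist and no schema is passed, an exception is raised, so check for existence first. The returned Collection object shares the same metadata and data as the original. Several Collection objects can point to the same collection, and changes made through one are visible through the others.
        b.Code example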
            ---
            from pymilvus import Collection, utility
            
            # 检查Collection是否存在
            if utility.has_collection("documents"):
                # 获取已存在的Collection
                collection = Collection("documents")
                print(f"获取Collection: {collection.name}")
                print(f"实体数量: {collection.num_entities}")
                print(f"Schema: {collection.schema}")
            else:
                print("Collection不存在")
            
            # 安全获取Collection
            def get_or_create_collection(name, schema, shards_num=2):
                if utility.has_collection(name):
                    return Collection(name)
                else:
                    return Collection(name, schema=schema, shards_num=shards_num)
            
            collection = get_or_create_collection("documents", schema)
            
            # 多个引用示例
            collection1 = Collection("documents")
            collection2 = Collection("documents")
            
            # 两个对象指向同一个collection
            print(f"相同collection: {collection1.name == collection2.name}")
            ---

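02.Collection configuration
    a.Shard configuration
        a.Description
            The shard count determines how data is distributed and how much work can proceed in parallel. In Milvus, shards correspond to write channels, so more shards mainly raise insert throughput rather than search speed, and they add management overhead. Choose the shard count from the data volume and the write load: 1 to 2 shards for a standalone deployment, more for a cluster. The shard count cannot be changed after creation, so choose it carefully. Each shard manages its own portion of the data, and a query covers all shards.
        b.Code example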
            ---
            from pymilvus import Collection, CollectionSchema
            
            # 单分片(小数据量,<100万)
            collection_small = Collection(
                name="small_collection",
                schema=schema,
                shards_num=1
            )
            
            # 多分片(大数据量,>1000万)
            collection_large = Collection(
                name="large_collection",
                schema=schema,
                shards_num=4
            )
            
            # 根据数据量动态选择分片数
            def calculate_shards(estimated_entities):
                if estimated_entities < 1000000:
                    return 1
                elif estimated_entities < 10000000:
                    return 2
                elif estimated_entities < 100000000:
                    return 4
                else:
                    return 8
            
            shards = calculate_shards(5000000)
            collection = Collection(
                name="dynamic_shards",
                schema=schema,
                shards_num=shards
            )
            
            print(f"数据量: 5000000, 分片数: {shards}")
            
            # 查看分片信息
            print(f"Collection分片数: {collection.shards_num}")
            ---
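    b.Property configuration
        a.Description
            A collection supports several properties, such as TTL (time to live) and, in versions that expose it as a property, the replica count. TTL removes expired data automatically and suits time-sensitive data. The replica count affects query throughput and availability: more replicas can serve more concurrent queries. Properties can be changed after creation. TTL is given in seconds, and 0 means the data never expires. Two or three replicas are usually enough, because every additional replica occupies additional Query Node memory. In many pymilvus versions the replica count is passed to load() through replica_number rather than set as a property.
        b.Code example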
            ---
            from pymilvus import Collection
            
            collection = Collection("documents")
            
            # 设置TTL(秒)
            collection.set_properties(properties={"collection.ttl.seconds": 86400})  # 1天
            print("TTL设置为1天")
            
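            # Set the replica count (assumes a Milvus version that supports the
            # collection.replica.number property; otherwise pass replica_number to load())
            collection.set_properties(properties={"collection.replica.number": 2})
            print("Replica count set to 2")
            
            # Inspect the properties (read from describe(); the ORM Collection
            # class is not assumed to expose a .properties attribute)
            properties = collection.describe().get("properties")
            print(f"Collection properties: {properties}")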
            
            # 批量设置属性
            collection.set_properties(properties={
                "collection.ttl.seconds": 172800,  # 2天
                "collection.replica.number": 3
            })
            
            # 删除TTL(永不过期)
            collection.set_properties(properties={"collection.ttl.seconds": 0})
            print("TTL已禁用")
            
            # 常用属性配置
            # 1. 缓存数据(短期)
            cache_collection = Collection("cache")
            cache_collection.set_properties(properties={"collection.ttl.seconds": 3600})  # 1小时
            
            # 2. 日志数据(中期)
            log_collection = Collection("logs")
            log_collection.set_properties(properties={"collection.ttl.seconds": 604800})  # 7天
            
            # 3. 持久数据(长期)
            persistent_collection = Collection("persistent")
            persistent_collection.set_properties(properties={"collection.ttl.seconds": 0})  # 永久
            ---
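Hard-coded second counts such as 86400 or 604800 are easy to mistype. The sketch below builds the properties dictionary from a `timedelta` instead. `ttl_properties` is a hypothetical helper; only the property key `collection.ttl.seconds` and the meaning of 0 are taken from the example above.

```python
from datetime import timedelta


def ttl_properties(ttl):
    """Build the properties dict for a TTL; None means never expire (0 seconds)."""
    seconds = 0 if ttl is None else int(ttl.total_seconds())
    if seconds < 0:
        raise ValueError("TTL must not be negative")
    return {"collection.ttl.seconds": seconds}


print(ttl_properties(timedelta(hours=1)))   # cache data
print(ttl_properties(timedelta(days=7)))    # log data
print(ttl_properties(None))                 # persistent data
```

The returned dictionary can be passed as the `properties` argument of `set_properties`, as in the example above.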

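03.Alias management
    a.Creating an alias
        a.Description
            An alias is an additional name for a collection and is useful for smooth upgrades and version management. One collection can have several aliases, but an alias can point to only one collection. When the application accesses a collection through an alias, it does not need to know the real collection name. This is well suited to data migration and schema changes. Alias operations are atomic, so switching does not interrupt service.
        b.Code example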
            ---
            from pymilvus import utility, Collection
            
            # 创建别名
            utility.create_alias(
                collection_name="documents_v1",
                alias="documents"
            )
            print("别名创建成功")
            
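            # Access through the alias (operations are routed to documents_v1)
            collection = Collection("documents")
            # Note: .name returns the name passed to the constructor, which is the alias here
            print(f"Collection opened as: {collection.name}")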
            
            # 查看别名列表
            aliases = utility.list_aliases("documents_v1")
            print(f"别名列表: {aliases}")
            
            # 一个collection多个别名
            utility.create_alias("documents_v1", "docs")
            utility.create_alias("documents_v1", "doc_collection")
            
            # 所有别名都指向同一个collection
            col1 = Collection("documents")
            col2 = Collection("docs")
            col3 = Collection("doc_collection")
            
            print(f"实体数量一致: {col1.num_entities == col2.num_entities == col3.num_entities}")
            ---
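    b.Switching an alias
        a.Description
            An alias can be repointed to another collection to achieve a smooth upgrade. The switch is atomic, so there is no intermediate state. This suits cut-overs between old and new versions, because the application code does not change. Verify the data integrity of the new collection before switching. Aliases also enable blue-green deployments and rollbacks.
        b.Code example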
            ---
            from pymilvus import utility, Collection
            
            # 初始状态:别名指向v1
            utility.create_alias("documents_v1", "documents")
            
            # 创建新版本collection
            collection_v2 = Collection("documents_v2", schema=new_schema)
            # ... 迁移数据到v2 ...
            
            # 切换别名到v2
            utility.alter_alias(
                collection_name="documents_v2",
                alias="documents"
            )
            print("别名已切换到v2")
            
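            # The alias now resolves to v2
            collection = Collection("documents")
            # .name only echoes the alias, so list the aliases of v2 to confirm the switch
            print(f"Aliases of documents_v2: {utility.list_aliases('documents_v2')}")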
            
            # 蓝绿部署示例
            def blue_green_deployment(old_collection, new_collection, alias):
                # 1. 验证新collection
                new_col = Collection(new_collection)
                assert new_col.num_entities > 0, "新collection数据为空"
                
                # 2. 切换别名
                utility.alter_alias(
                    collection_name=new_collection,
                    alias=alias
                )
                print(f"已切换到新版本: {new_collection}")
                
                # 3. 保留旧版本一段时间,以便回滚
                # 如果需要回滚
                # utility.alter_alias(collection_name=old_collection, alias=alias)
            
            blue_green_deployment("documents_v1", "documents_v2", "documents")
            
            # 删除别名
            utility.drop_alias("documents")
            print("别名已删除")
            ---
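The alias rules described above (an alias points to exactly one collection, a collection may have several aliases, a switch is a single step that can be reversed) can be modelled in a few lines. The class below is a toy in-memory model for reasoning about cut-over and rollback; `AliasRegistry` is a hypothetical name and none of this is the Milvus API.

```python
class AliasRegistry:
    """Toy in-memory model of alias semantics; this is not the Milvus API."""

    def __init__(self):
        self._targets = {}  # alias -> collection name

    def create(self, collection_name, alias):
        if alias in self._targets:
            raise ValueError(f"alias {alias!r} already exists")
        self._targets[alias] = collection_name

    def alter(self, collection_name, alias):
        """Repoint an existing alias in one assignment and return the old target."""
        if alias not in self._targets:
            raise KeyError(alias)
        previous = self._targets[alias]
        self._targets[alias] = collection_name
        return previous

    def resolve(self, alias):
        return self._targets[alias]

    def aliases_of(self, collection_name):
        return sorted(a for a, c in self._targets.items() if c == collection_name)


registry = AliasRegistry()
registry.create("documents_v1", "documents")
registry.create("documents_v1", "docs")
previous = registry.alter("documents_v2", "documents")  # cut over to v2
print("after switch:", registry.resolve("documents"))
registry.alter(previous, "documents")                   # roll back to v1
print("after rollback:", registry.resolve("documents"))
```

Keeping the previous target returned by `alter` is what makes the rollback a one-line operation, which mirrors the commented-out rollback call in the blue-green example.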

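04.Collection metadata
    a.Viewing metadata
        a.Description
            A collection carries rich metadata, including its schema definition, statistics, and index information. The metadata shows the structure and state of the collection. Most metadata queries do not require the collection to be loaded and are cheap, which makes them suitable for monitoring and management. Metadata reflects the current state of the collection, although num_entities may lag behind recent inserts that have not been flushed yet.
        b.Code example
            ---
            from pymilvus import Collection, DataType, utility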
            
            collection = Collection("documents")
            
            # Schema信息
            print(f"Collection名称: {collection.name}")
            print(f"描述: {collection.description}")
            print(f"Schema: {collection.schema}")
            
            # 字段信息
            for field in collection.schema.fields:
                print(f"字段: {field.name}")
                print(f"  类型: {field.dtype}")
                print(f"  主键: {field.is_primary}")
                if field.dtype == DataType.FLOAT_VECTOR:
                    print(f"  维度: {field.params.get('dim')}")
                if field.dtype == DataType.VARCHAR:
                    print(f"  最大长度: {field.params.get('max_length')}")
            
            # 统计信息
            print(f"实体数量: {collection.num_entities}")
            print(f"分片数量: {collection.shards_num}")
            
            # 索引信息
            indexes = collection.indexes
            for index in indexes:
                print(f"索引字段: {index.field_name}")
                print(f"索引参数: {index.params}")
            
            # 加载状态
            load_state = utility.load_state("documents")
            print(f"加载状态: {load_state}")
            
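            # Properties (read from describe(); the ORM Collection class is not
            # assumed to expose a .properties attribute)
            properties = collection.describe().get("properties")
            print(f"Properties: {properties}")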
            ---
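    b.Monitoring and statistics
        a.Description
            Metadata can be used to monitor how a collection is used and how it performs. The statistics include the entity count, segment information, and memory footprint. Regular monitoring surfaces problems such as data skew or memory shortage early. The same statistics support capacity planning and performance tuning. Milvus also exposes monitoring APIs and metrics.
        b.Code example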
            ---
            from pymilvus import Collection, utility
            from pymilvus.client.types import LoadState
            import time
            
            collection = Collection("documents")
            
            # Monitoring loop
            def monitor_collection(collection_name, interval=60):
                while True:
                    collection = Collection(collection_name)
                    
                    # Basic statistics
                    print(f"\n=== {time.strftime('%Y-%m-%d %H:%M:%S')} ===")
                    print(f"Entity count: {collection.num_entities:,}")
                    
                    # Segment information (reported for segments held by Query Nodes)
                    segments = utility.get_query_segment_info(collection_name)
                    print(f"Segment count: {len(segments)}")
                    
                    total_rows = sum(seg.num_rows for seg in segments)
                    print(f"Total rows: {total_rows:,}")
                    
                    # Group segments by state
                    state_counts = {}
                    for seg in segments:
                        state = seg.state
                        state_counts[state] = state_counts.get(state, 0) + 1
                    print(f"Segment states: {state_counts}")
                    
                    # Memory footprint (requires the collection to be loaded)
                    if utility.load_state(collection_name) == LoadState.Loaded:
                        # Rough estimate: raw float32 vectors only, dimension assumed to be 128
                        vector_dim = 128
                        vector_size = total_rows * vector_dim * 4  # float32
                        print(f"Estimated vector memory: {vector_size / 1024 / 1024:.2f} MB")
                    
                    time.sleep(interval)
            
            # 启动监控(在后台线程中运行)
            import threading
            monitor_thread = threading.Thread(
                target=monitor_collection,
                args=("documents", 60),
                daemon=True
            )
            monitor_thread.start()
            
            # 性能指标收集
            def collect_metrics(collection_name):
                collection = Collection(collection_name)
                
                metrics = {
                    "name": collection_name,
                    "entities": collection.num_entities,
                    "shards": collection.shards_num,
                    "load_state": str(utility.load_state(collection_name)),
                    "timestamp": time.time()
                }
                
                # 添加segment信息
                segments = utility.get_query_segment_info(collection_name)
                metrics["segments"] = len(segments)
                metrics["total_rows"] = sum(seg.num_rows for seg in segments)
                
                return metrics
            
            metrics = collect_metrics("documents")
            print(f"指标: {metrics}")
            ---
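Data skew, mentioned above as something monitoring should catch, can be quantified from the per-segment row counts. The sketch below is plain Python with a hypothetical helper name; the row counts are assumed to come from the `num_rows` values collected in the monitoring loop, and the threshold of 2.0 is an arbitrary example.

```python
def segment_skew(row_counts):
    """Ratio of the largest segment to the mean segment size.

    1.0 means perfectly balanced; larger values mean more skew.
    Returns 0.0 when there are no rows at all.
    """
    if not row_counts or sum(row_counts) == 0:
        return 0.0
    mean = sum(row_counts) / len(row_counts)
    return max(row_counts) / mean


balanced = [100, 100, 100]
skewed = [300, 50, 50, 0]
for rows in (balanced, skewed):
    ratio = segment_skew(rows)
    flag = "skewed" if ratio > 2.0 else "ok"  # 2.0 is an example threshold
    print(rows, f"skew={ratio:.2f}", flag)
```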

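3.3 Loading and releasing

01.Loading a Collection
    a.Loading into memory
        a.Description
            A collection is not loaded into memory when it is created; load must be called explicitly. Loading places the data and indexes in Query Node memory, which is required before searching or querying. In pymilvus, load() waits for loading to finish by default; pass _async=True to return immediately and follow the progress with load_state. Loading a large collection can take a long time, so schedule it for off-peak hours. A loaded collection occupies memory, so plan capacity against the server configuration. Loading reads all segments and index files, which makes network and disk I/O the main bottlenecks.
        b.Code example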
            ---
            from pymilvus import Collection, utility
            from pymilvus.client.types import LoadState
            import time
            
            collection = Collection("documents")
            
            # Start loading without blocking (load() waits for completion by default)
            print("Loading collection...")
            collection.load(_async=True)
            
            # Poll until loading finishes
            while True:
                state = utility.load_state("documents")
                if state == LoadState.Loaded:
                    print("Load complete")
                    break
                elif state == LoadState.Loading:
                    print("Loading...")
                    time.sleep(1)
                elif state == LoadState.NotLoad:
                    print("Not loaded")
                    break
                else:
                    print(f"Load state: {state}")
                    break
            
            # Check the load state
            print(f"Current state: {utility.load_state('documents')}")
            
            # Load with a specific replica count. Release first, because the replica
            # count of a loaded collection is changed by reloading it; this also
            # requires enough Query Nodes to host the replicas.
            collection.release()
            collection.load(replica_number=2)
            print("Loaded with 2 replicas")
            
            # Load progress monitor
            def monitor_load_progress(collection_name, check_interval=1):
                start_time = time.time()
                
                while True:
                    state = utility.load_state(collection_name)
                    elapsed = time.time() - start_time
                    
                    if state == LoadState.Loaded:
                        print(f"Load complete after {elapsed:.2f}s")
                        break
                    elif state == LoadState.Loading:
                        print(f"Loading... {elapsed:.2f}s elapsed")
                        time.sleep(check_interval)
                    else:
                        print(f"Unexpected load state: {state}")
                        break
            
            monitor_load_progress("documents")
            ---
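The polling loops above run until the state changes, which can hang if loading stalls. A timeout makes the wait bounded. The following is a minimal sketch in plain Python: `wait_for_state` is a hypothetical helper, and the state source is passed in as a callable so the logic can be exercised without a Milvus server (the demo uses a canned sequence of strings).

```python
import time


def wait_for_state(get_state, target, timeout=60.0, interval=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns target or the timeout expires.

    Returns True if the target state was observed, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if get_state() == target:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Simulated state source: two polls in "Loading", then "Loaded"
states = iter(["Loading", "Loading", "Loaded"])
ok = wait_for_state(lambda: next(states), "Loaded", timeout=5, interval=0.01)
print("loaded:", ok)
```

Against a real deployment the callable would wrap the state lookup used above, for example a lambda around `utility.load_state("documents")` compared with the Loaded state; that wiring is an assumption and is not exercised here.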
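    b.Partition loading
        a.Description
            It is possible to load only some partitions into memory to save resources. This fits data partitioned by time or category, where only the hot partitions need to be loaded. Partition loading can cut memory use considerably and shortens load time. Queries see only loaded partitions; data in partitions that are not loaded is invisible. Partitions can be loaded and released dynamically to separate hot and cold data. Partition loading works particularly well for time-series data such as logs and monitoring data.
        b.Code example
            ---
            from pymilvus import Collection, Partition, utility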
            
            collection = Collection("documents")
            
            # 创建分区
            partition_2024 = Partition(collection, "2024")
            partition_2025 = Partition(collection, "2025")
            partition_2026 = Partition(collection, "2026")
            
            # 只加载2026分区(最新数据)
            partition_2026.load()
            print("已加载2026分区")
            
            # 查询只在已加载分区中进行
            results = collection.search(
                data=[[0.1]*128],
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"nprobe": 10}},
                limit=10,
                partition_names=["2026"]
            )
            print(f"搜索结果: {len(results[0])} 条")
            
            # 加载多个分区
            collection.load(partition_names=["2025", "2026"])
            print("已加载2025和2026分区")
            
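            # Dynamic partition management.
            # Assumes monthly partitions named YYYYMM (e.g. "202601"), which differs from
            # the yearly partitions created above; names that do not exist are skipped.
            def load_recent_partitions(collection, months=3):
                from datetime import datetime
                
                # Walk back one calendar month at a time so no month is skipped or repeated
                now = datetime.now()
                year, month = now.year, now.month
                partitions_to_load = []
                
                for _ in range(months):
                    partition_name = f"{year}{month:02d}"
                    if collection.has_partition(partition_name):
                        partitions_to_load.append(partition_name)
                    month -= 1
                    if month == 0:
                        year, month = year - 1, 12
                
                # Load the partitions that exist
                if partitions_to_load:
                    collection.load(partition_names=partitions_to_load)
                print(f"Loaded partitions for the last {months} months: {partitions_to_load}")
            
            load_recent_partitions(collection, months=3)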
            
            # 释放特定分区
            partition_2024.release()
            print("已释放2024分区")
            
            # Check the load state of each partition (partition_names expects a list)
            for partition in collection.partitions:
                state = utility.load_state("documents", partition_names=[partition.name])
                print(f"Partition {partition.name}: {state}")
            ---

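02.Releasing a Collection
    a.Releasing memory
        a.Description
            Releasing unloads a collection from memory and frees Query Node resources. After a release, searches and queries are not possible, but the data remains in the storage layer. This suits collections that are used only temporarily, or situations where memory must be reclaimed. The release may take some time to complete on the Query Nodes, so check load_state before assuming the memory is free. A released collection can be loaded again without any effect on data integrity. Releasing never deletes data; it only removes it from memory.
        b.Code example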
            ---
            from pymilvus import Collection, Partition, utility
            from pymilvus.client.types import LoadState
            import time
            
            collection = Collection("documents")
            
            # Release the collection
            print("Releasing collection...")
            collection.release()
            
            # Give the release a moment to complete
            time.sleep(1)
            state = utility.load_state("documents")
            print(f"State after release: {state}")
            
            # Verify the release
            assert state == LoadState.NotLoad, "release failed"
            
            # Release a specific partition
            partition = Partition(collection, "2024")
            partition.release()
            print("Released partition 2024")
            
            # Release all partitions
            collection.release()
            print("Released all partitions")
            
            # Load again
            collection.load()
            print("Reloaded")
            
            # Check the state before releasing
            def safe_release(collection_name):
                state = utility.load_state(collection_name)
                
                if state == LoadState.Loaded:
                    collection = Collection(collection_name)
                    collection.release()
                    print(f"Released: {collection_name}")
                    return True
                elif state == LoadState.NotLoad:
                    print(f"Not loaded, nothing to release: {collection_name}")
                    return True
                else:
                    print(f"Unexpected state: {state}")
                    return False
            
            safe_release("documents")
            ---
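    b.Memory management
        a.Description
            Managing loads and releases carefully keeps memory use under control. Load only the collections in active use and release inactive ones regularly. Monitor memory use and adjust the loading strategy accordingly. Partition loading gives finer-grained control over the memory footprint. Do not rely on Milvus to release collections on its own when memory runs short; handle this in the application layer. An LRU policy, as sketched below, can automate loading and releasing.
        b.Code example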
            ---
            from pymilvus import Collection, utility
            import psutil
            import time
            from collections import OrderedDict
            
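            def get_memory_usage():
                """Return the resident memory of the current client process in MB.

                Note: this is a stand-in. It does not measure Milvus Query Node memory;
                in a real deployment, read node memory from the Milvus monitoring metrics.
                """
                process = psutil.Process()
                return process.memory_info().rss / 1024 / 1024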
            
            # LRU Collection管理器
            class CollectionManager:
                def __init__(self, max_memory_mb=8192, max_collections=5):
                    self.max_memory_mb = max_memory_mb
                    self.max_collections = max_collections
                    self.loaded_collections = OrderedDict()
                    self.access_count = {}
                
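                def load_collection(self, collection_name):
                    # Already loaded: refresh its LRU position and its last-access time
                    if collection_name in self.loaded_collections:
                        self.loaded_collections.move_to_end(collection_name)
                        self.loaded_collections[collection_name] = time.time()
                        self.access_count[collection_name] += 1
                        return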
                    
                    # 检查内存使用
                    current_memory = get_memory_usage()
                    
                    # 内存不足或collection数量超限,释放最久未使用的
                    while (current_memory > self.max_memory_mb * 0.8 or 
                           len(self.loaded_collections) >= self.max_collections):
                        if not self.loaded_collections:
                            break
                        
                        old_name, _ = self.loaded_collections.popitem(last=False)
                        Collection(old_name).release()
                        print(f"释放Collection: {old_name}")
                        
                        time.sleep(0.5)
                        current_memory = get_memory_usage()
                    
                    # 加载新collection
                    collection = Collection(collection_name)
                    collection.load()
                    self.loaded_collections[collection_name] = time.time()
                    self.access_count[collection_name] = 1
                    print(f"加载Collection: {collection_name}")
                
                def release_all(self):
                    """释放所有collection"""
                    for name in list(self.loaded_collections.keys()):
                        Collection(name).release()
                    self.loaded_collections.clear()
                    self.access_count.clear()
                    print("已释放所有collection")
                
                def get_stats(self):
                    """获取统计信息"""
                    return {
                        "loaded_count": len(self.loaded_collections),
                        "memory_mb": get_memory_usage(),
                        "collections": list(self.loaded_collections.keys()),
                        "access_count": self.access_count
                    }
            
            # 使用管理器
            manager = CollectionManager(max_memory_mb=8192, max_collections=3)
            
            # 模拟访问
            manager.load_collection("documents")
            manager.load_collection("images")
            manager.load_collection("videos")
            
            # 访问已加载的collection
            manager.load_collection("documents")  # 更新访问时间
            
            # 加载新collection(会触发释放)
            manager.load_collection("audio")
            
            # 查看统计
            stats = manager.get_stats()
            print(f"统计信息: {stats}")
            
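            # Periodic cleanup
            def periodic_cleanup(manager, interval=300):
                """Periodically release collections whose stored timestamp is older than 5 minutes."""
                while True:
                    time.sleep(interval)
                    
                    current_time = time.time()
                    to_release = []
                    
                    # Iterate over a snapshot so that concurrent loads cannot break the loop
                    for name, last_time in list(manager.loaded_collections.items()):
                        # Stored timestamp is more than 5 minutes old
                        if current_time - last_time > 300:
                            to_release.append(name)
                    
                    for name in to_release:
                        Collection(name).release()
                        manager.loaded_collections.pop(name, None)
                        manager.access_count.pop(name, None)
                        print(f"Released inactive collection: {name}")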
            
            # 启动定期清理(后台线程)
            import threading
            cleanup_thread = threading.Thread(
                target=periodic_cleanup,
                args=(manager, 300),
                daemon=True
            )
            cleanup_thread.start()
            ---

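03.Replica management
    a.Replica configuration
        a.Description
            A replica is a complete in-memory copy of a collection, used to raise query throughput and availability. Several replicas can serve queries in parallel, which improves concurrency. The replica count is specified at load time; to change it, release the collection and load it again with a new value. Every replica occupies the same amount of memory, so resource limits must be taken into account. Replicas are distributed across different Query Nodes for load balancing. If a replica fails, requests are routed to the remaining replicas to keep the service available.
        b.Code example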
            ---
            from pymilvus import Collection, utility
            
            collection = Collection("documents")
            
            # 加载时指定副本数量
            collection.load(replica_number=2)
            print("已加载2个副本")
            
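            # Inspect replica information
            # (attribute names follow the pymilvus Group type; check your installed version)
            replicas = collection.get_replicas()
            print(f"Replica count: {len(replicas.groups)}")
            
            for i, replica in enumerate(replicas.groups):
                print(f"Replica {i}:")
                print(f"  Replica ID: {replica.id}")
                print(f"  Shard replicas: {replica.shards}")
                print(f"  Nodes: {replica.group_nodes}")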
            
            # 动态调整副本数量
            collection.release()
            collection.load(replica_number=3)
            print("副本数量已调整为3")
            
            # 副本负载均衡测试
            import concurrent.futures
            import time
            
            def query_task(task_id):
                start = time.time()
                results = collection.search(
                    data=[[0.1]*128],
                    anns_field="embedding",
                    param={"metric_type": "L2", "params": {"nprobe": 10}},
                    limit=10
                )
                elapsed = time.time() - start
                return elapsed
            
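            # 100 queries issued from 50 worker threads
            wall_start = time.time()
            with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
                futures = [executor.submit(query_task, i) for i in range(100)]
                times = [f.result() for f in futures]
            wall_elapsed = time.time() - wall_start
            
            avg_time = sum(times) / len(times)
            print(f"Average query latency: {avg_time*1000:.2f}ms")
            # Throughput is completed queries divided by wall-clock time, not by summed latency
            print(f"QPS: {len(times) / wall_elapsed:.2f}")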
            ---
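Since every replica occupies the same amount of memory, the cost of a replica count can be estimated before loading. The sketch below is a rough lower bound under stated assumptions: it counts raw float32 vectors only (4 bytes per value) and ignores index structures, scalar fields, and runtime overhead. `estimate_replica_memory_mb` is a hypothetical helper name.

```python
def estimate_replica_memory_mb(num_entities, dim, replica_number=1, bytes_per_value=4):
    """Estimate memory (MB) for raw vectors across all replicas.

    Lower bound only: index structures and scalar fields are not included.
    """
    if num_entities < 0 or dim < 0 or replica_number < 0:
        raise ValueError("arguments must not be negative")
    total_bytes = num_entities * dim * bytes_per_value * replica_number
    return total_bytes / (1024 * 1024)


for replicas in (1, 2, 3):
    mb = estimate_replica_memory_mb(1_000_000, 128, replica_number=replicas)
    print(f"{replicas} replica(s): about {mb:.1f} MB of raw vectors")
```

Comparing this figure with the free memory across Query Nodes gives a quick sanity check before raising `replica_number`.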
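    b.Replica monitoring
        a.Description
            Replica state and load distribution can be monitored to make sure the system is running normally. Replica information includes the replica ID, the nodes hosting it, and the shard distribution. Monitoring reveals problems such as unbalanced replicas or failed nodes. Milvus manages replica placement and failover automatically. Check replica state regularly so that anomalies are found and handled early.
        b.Code example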
            ---
            from pymilvus import Collection, utility
            import time
            
            collection = Collection("documents")
            collection.load(replica_number=2)
            
            # 副本监控函数
            def monitor_replicas(collection_name, interval=60):
                while True:
                    collection = Collection(collection_name)
                    replicas = collection.get_replicas()
                    
                    print(f"\n=== {time.strftime('%Y-%m-%d %H:%M:%S')} ===")
                    print(f"副本数量: {len(replicas.groups)}")
                    
                    for i, replica in enumerate(replicas.groups):
                        print(f"\nReplica {i}:")
                        print(f"  ID: {replica.id}")
                        print(f"  Shards: {len(replica.shards)}")
                        print(f"  Nodes: {len(replica.group_nodes)}")
                        
                        # Shard details
                        for shard in replica.shards:
                            print(f"  Shard channel: {shard.channel_name}")
                            print(f"    Leader node: {shard.shard_leader}")
                            print(f"    Nodes: {shard.shard_nodes}")
                    
                    # Check how replicas are spread over nodes
                    all_nodes = set()
                    for replica in replicas.groups:
                        all_nodes.update(replica.group_nodes)
                    
                    print(f"\nTotal nodes: {len(all_nodes)}")
                    print(f"Node list: {all_nodes}")
                    
                    # Check load balance
                    node_replica_count = {}
                    for replica in replicas.groups:
                        for node in replica.group_nodes:
                            node_replica_count[node] = node_replica_count.get(node, 0) + 1
                    
                    print(f"Replicas per node: {node_replica_count}")
                    
                    time.sleep(interval)
            
            # 启动监控
            import threading
            monitor_thread = threading.Thread(
                target=monitor_replicas,
                args=("documents", 60),
                daemon=True
            )
            monitor_thread.start()
            
            # 副本健康检查
            def check_replica_health(collection_name):
                collection = Collection(collection_name)
                replicas = collection.get_replicas()
                
                if len(replicas.groups) == 0:
                    return False, "没有副本"
                
                # Check every replica
                for replica in replicas.groups:
                    if len(replica.group_nodes) == 0:
                        return False, f"Replica {replica.id} has no nodes"
                    
                    if len(replica.shards) == 0:
                        return False, f"Replica {replica.id} has no shards"
                
                return True, "All replicas are healthy"
            
            healthy, message = check_replica_health("documents")
            print(f"健康检查: {message}")
            ---
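The balance check can be separated from the Milvus calls so that it is easy to test. The sketch below assumes the replica layout has already been extracted into a plain dict mapping replica ID to node IDs; the function names and the `max_per_node` threshold are hypothetical.

```python
from collections import Counter


def replica_load(replica_nodes):
    """Count how many replicas each node hosts.

    replica_nodes maps a replica ID to an iterable of node IDs.
    """
    counts = Counter()
    for nodes in replica_nodes.values():
        counts.update(set(nodes))  # count each node once per replica
    return dict(counts)


def find_problems(replica_nodes, max_per_node=1):
    """Return human-readable findings for empty replicas and crowded nodes."""
    problems = []
    for replica_id, nodes in replica_nodes.items():
        if not nodes:
            problems.append(f"replica {replica_id} has no nodes")
    for node, count in sorted(replica_load(replica_nodes).items()):
        if count > max_per_node:
            problems.append(f"node {node} hosts {count} replicas")
    return problems


# Hypothetical layout: node 2 serves two replicas, replica 103 has no node
layout = {101: [1, 2], 102: [2], 103: []}
for finding in find_problems(layout):
    print(finding)
```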

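3.4 Dropping a Collection

01.Drop operations
    a.Dropping a collection
        a.Description
            Dropping permanently removes a collection together with all of its data and indexes. A loaded collection can be dropped directly; releasing it first is optional, but it frees query-node memory up front and makes the intent explicit. A drop cannot be undone, so back up anything that may be needed later. After the drop the collection name can be reused. The drop removes the collection's metadata, index files, and data files; the files of a large collection are typically cleaned up in the background, so storage may be reclaimed with some delay, and it is sensible to schedule large drops for off-peak hours.
        b.Code example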
            ---
            from pymilvus import Collection, utility
            from pymilvus.client.types import LoadState
            import time
            
            # Check whether the collection exists
            if utility.has_collection("documents"):
                collection = Collection("documents")
                
                # Check the load state
                state = utility.load_state("documents")
                if state == LoadState.Loaded:
                    # Release the collection to free query-node memory before the drop
                    collection.release()
                    print("Collection released")
                    time.sleep(1)
                
                # 删除Collection
                utility.drop_collection("documents")
                print("Collection已删除")
                
                # 验证删除
                assert not utility.has_collection("documents"), "删除失败"
            else:
                print("Collection不存在")
            
            # 安全删除函数
            def safe_drop_collection(collection_name):
                try:
                    if not utility.has_collection(collection_name):
                        print(f"Collection不存在: {collection_name}")
                        return True
                    
                    collection = Collection(collection_name)
                    
                    # Release if loaded
                    from pymilvus.client.types import LoadState
                    state = utility.load_state(collection_name)
                    if state == LoadState.Loaded:
                        collection.release()
                        time.sleep(1)
                    
                    # 删除
                    utility.drop_collection(collection_name)
                    print(f"已删除: {collection_name}")
                    return True
                    
                except Exception as e:
                    print(f"删除失败: {e}")
                    return False
            
            # 使用安全删除
            safe_drop_collection("test_collection")
            
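            # Drop only after explicit confirmation
            def drop_with_confirmation(collection_name, confirmed=False):
                if not utility.has_collection(collection_name):
                    print("Collection does not exist")
                    return False
                
                collection = Collection(collection_name)
                entity_count = collection.num_entities
                
                print(f"Warning: about to drop collection '{collection_name}'")
                print(f"It contains {entity_count:,} entities")
                
                # In an interactive tool, obtain the flag from the user, e.g.:
                # confirmed = input("Confirm drop? (yes/no): ").lower() == "yes"
                if not confirmed:
                    print("Drop cancelled: not confirmed")
                    return False
                
                collection.release()
                utility.drop_collection(collection_name)
                print("Drop complete")
                return True
            
            # "documents" was already dropped above, so this call reports that it does not exist
            drop_with_confirmation("documents", confirmed=True)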
            ---
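    b.Batch dropping
        a.Description
            Several collections can be dropped in one pass, which suits cleaning up test data or expired data. A naming convention makes the collections to drop easy to identify, for example by filtering names on a prefix or suffix. Because a filter that is too broad can remove important data, preview the matched names and confirm them before dropping anything. Batch dropping fits scheduled cleanup tasks, such as removing temporary or test collections.
        b.Code example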
            ---
            from pymilvus import utility, Collection
            import re
            from datetime import datetime, timedelta
            
            # 列出所有Collection
            all_collections = utility.list_collections()
            print(f"所有Collection: {all_collections}")
            
            # 删除测试Collection(前缀为test_)
            for name in all_collections:
                if name.startswith("test_"):
                    collection = Collection(name)
                    collection.release()
                    utility.drop_collection(name)
                    print(f"已删除测试Collection: {name}")
            
            # Drop temporary collections (prefix temp_); safe_drop_collection is the helper defined in the previous example
            def drop_temp_collections():
                for name in utility.list_collections():
                    if name.startswith("temp_"):
                        safe_drop_collection(name)
            
            drop_temp_collections()
            
            # 删除过期Collection(基于命名规则)
            def drop_expired_collections(days=30):
                """删除超过指定天数的collection"""
                pattern = r"collection_(\d{8})"  # collection_20240101
                cutoff_date = datetime.now() - timedelta(days=days)
                
                dropped_count = 0
                
                for name in utility.list_collections():
                    match = re.fullmatch(pattern, name)  # fullmatch: names with extra suffixes are not treated as dated collections
                    if match:
                        date_str = match.group(1)
                        try:
                            date = datetime.strptime(date_str, "%Y%m%d")
                            
                            if date < cutoff_date:
                                collection = Collection(name)
                                collection.release()
                                utility.drop_collection(name)
                                print(f"删除过期Collection: {name} (日期: {date_str})")
                                dropped_count += 1
                        except ValueError:
                            print(f"日期格式错误: {name}")
                
                print(f"共删除 {dropped_count} 个过期Collection")
            
            drop_expired_collections(days=30)
            
            # 按模式批量删除
            def drop_by_pattern(pattern, dry_run=True):
                """按正则表达式模式删除collection"""
                regex = re.compile(pattern)
                to_drop = []
                
                for name in utility.list_collections():
                    if regex.match(name):
                        to_drop.append(name)
                
                print(f"匹配到 {len(to_drop)} 个Collection:")
                for name in to_drop:
                    collection = Collection(name)
                    print(f"  {name} ({collection.num_entities:,} 条数据)")
                
                if dry_run:
                    print("(预览模式,未实际删除)")
                    return
                
                # 实际删除
                for name in to_drop:
                    safe_drop_collection(name)
            
            # 预览要删除的collection
            drop_by_pattern(r"^backup_\d+$", dry_run=True)
            
            # 实际删除
            # drop_by_pattern(r"^backup_\d+$", dry_run=False)
            ---
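The date-based selection can be tested without a Milvus connection by keeping it a pure function. The sketch below assumes the `collection_YYYYMMDD` naming rule used above; `select_expired` is a hypothetical helper that returns only the names to drop and leaves the drop itself to the caller.

```python
import re
from datetime import datetime, timedelta

NAME_PATTERN = re.compile(r"collection_(\d{8})")


def select_expired(names, now, days=30):
    """Return the names whose embedded date is more than `days` before `now`."""
    cutoff = now - timedelta(days=days)
    expired = []
    for name in names:
        match = NAME_PATTERN.fullmatch(name)
        if not match:
            continue  # name does not follow the dated naming rule
        try:
            created = datetime.strptime(match.group(1), "%Y%m%d")
        except ValueError:
            continue  # eight digits that do not form a valid date
        if created < cutoff:
            expired.append(name)
    return expired


# Hypothetical collection names
names = [
    "collection_20240101",
    "collection_20240320",
    "collection_20241399",
    "test_a",
    "collection_20240101_old",
]
print(select_expired(names, now=datetime(2024, 4, 1)))
```

Passing `now` explicitly keeps the result reproducible, and the returned list can be reviewed before any collection is dropped.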

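02.Data cleanup
    a.Clearing data
        a.Description
            To clear the data while keeping the collection structure, delete all entities. The schema and index definitions stay in place, so new data can be inserted right away. Deletes are logical: entities are marked as deleted and the space is reclaimed later by compaction, so run compaction after a large delete. For small and medium collections this avoids recreating the collection and its indexes; for very large collections, dropping and recreating is often the faster route. Clearing suits data that is emptied regularly, such as temporary caches or test environments. Delete large volumes in batches so that no single operation times out.
        b.Code example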
            ---
            from pymilvus import Collection
            import time
            
            collection = Collection("documents")
            
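            # Method 1: delete everything with one expression (simple, but may time out on large collections)
            # Assumes an INT64 primary key "id" with non-negative values; older Milvus
            # releases accept only "id in [...]" expressions in delete()
            expr = "id >= 0"
            collection.delete(expr)
            
            # Flush so that the delete is persisted
            collection.flush()
            
            # Trigger compaction to reclaim space
            collection.compact()
            
            # num_entities may still include deleted rows until compaction has finished
            print(f"Entity count after clearing: {collection.num_entities}")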
            
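            # Method 2: clear a large collection in batches (query() requires a loaded collection)
            def clear_collection_data(collection, batch_size=10000):
                """Delete all entities in batches"""
                total_deleted = 0
                
                while True:
                    # Fetch one batch of IDs; strong consistency keeps rows deleted in the
                    # previous iteration from being returned again
                    results = collection.query(
                        expr="id >= 0",
                        output_fields=["id"],
                        limit=batch_size,
                        consistency_level="Strong"
                    )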
                    
                    if len(results) == 0:
                        break
                    
                    # 删除这批数据
                    ids = [r["id"] for r in results]
                    expr = f"id in {ids}"
                    collection.delete(expr)
                    
                    total_deleted += len(ids)
                    print(f"已删除 {len(ids)} 条数据,累计: {total_deleted}")
                    
                    # 避免过快删除
                    time.sleep(0.1)
                
                # 刷新和压缩
                collection.flush()
                print("正在压缩...")
                collection.compact()
                
                # Wait for compaction to finish
                collection.wait_for_compaction_completed()
                
                print(f"清空完成,共删除 {total_deleted} 条数据")
                print(f"当前实体数量: {collection.num_entities}")
            
            clear_collection_data(collection, batch_size=10000)
            
            # 方法3: 按条件清空
            def clear_by_condition(collection, expr):
                """按条件删除数据"""
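                # Preview how many entities match (a single query returns at most 16384 rows)
                results = collection.query(
                    expr=expr,
                    output_fields=["id"],
                    limit=16384  # maximum per query
                )
                
                print(f"Matched {len(results)} entities (capped at 16384)")
                
                if len(results) == 0:
                    return
                
                # Delete by expression; this also covers matches beyond the preview cap
                result = collection.delete(expr)
                collection.flush()
                
                print(f"Delete request covered {result.delete_count} entities")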
            
            # 删除旧数据
            clear_by_condition(collection, "timestamp < 1640000000")
            
            # 删除特定类别
            clear_by_condition(collection, 'category == "test"')
            ---
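Batched deletion by primary key comes down to splitting an ID list into `id in [...]` expressions of bounded size. A minimal sketch, assuming an integer primary key; `build_delete_exprs` and its parameters are hypothetical names, and each returned string would be passed to `collection.delete`.

```python
def build_delete_exprs(ids, batch_size=1000, pk_field="id"):
    """Split integer primary keys into bounded 'pk in [...]' delete expressions."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    exprs = []
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        id_list = ", ".join(str(int(i)) for i in chunk)
        exprs.append(f"{pk_field} in [{id_list}]")
    return exprs


for expr in build_delete_exprs([1, 2, 3, 4, 5], batch_size=2):
    print(expr)
```

Converting each value with `int()` also guards against accidentally interpolating arbitrary strings into the expression.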
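    b.Backup and restore
        a.Description
            Back up data before dropping it, in case of a mistake or a later need to restore. Data can be exported to files or copied to a new collection. For full backups, the official Milvus Backup tool (milvus-backup) can back up and restore collections; the example here is a simple client-side export instead. A backup strategy should include both regular backups and a backup before each drop. Restoring means recreating the collection, importing the data, and rebuilding the index. Backup files should contain the schema definition and all of the data. Compressed formats reduce the storage needed.
        b.Code example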
            ---
            from pymilvus import Collection, CollectionSchema, utility
            import json
            import gzip
            import pickle
            from datetime import datetime
            
            # 备份Collection数据
            def backup_collection(collection_name, backup_dir="./backups"):
                import os
                os.makedirs(backup_dir, exist_ok=True)
                
                collection = Collection(collection_name)
                collection.load()
                
                # 备份Schema
                schema_dict = {
                    "fields": [
                        {
                            "name": f.name,
                            # Store the enum name (e.g. "INT64"); str() of an IntEnum
                            # member differs between Python versions
                            "dtype": f.dtype.name,
                            "is_primary": f.is_primary,
                            "auto_id": f.auto_id,
                            "params": f.params
                        }
                        for f in collection.schema.fields
                    ],
                    "description": collection.schema.description
                }
                
                timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
                schema_file = f"{backup_dir}/{collection_name}_schema_{timestamp}.json"
                
                with open(schema_file, 'w') as f:
                    json.dump(schema_dict, f, indent=2)
                
                print(f"Schema已备份: {schema_file}")
                
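                # Back up the data in batches. offset-based paging is capped by Milvus
                # (offset + limit must stay within 16384), so use query_iterator
                # (available in pymilvus 2.3 and later)
                batch_size = 10000
                total = 0
                batch_num = 0
                
                iterator = collection.query_iterator(
                    batch_size=batch_size,
                    expr="id >= 0",
                    output_fields=["*"]
                )
                
                while True:
                    results = iterator.next()
                    
                    if len(results) == 0:
                        iterator.close()
                        break
                    
                    # Save the batch as gzip-compressed pickle (only load pickle files you trust)
                    data_file = f"{backup_dir}/{collection_name}_data_{timestamp}_batch{batch_num:04d}.pkl.gz"
                    
                    with gzip.open(data_file, 'wb') as f:
                        pickle.dump(results, f)
                    
                    print(f"Batch {batch_num} backed up: {len(results)} entities")
                    
                    total += len(results)
                    batch_num += 1
                
                print(f"Backup complete: {total} entities in {batch_num} batches")
                
                return schema_file, batch_num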
            
            # 恢复Collection数据
            def restore_collection(collection_name, schema_file, backup_dir, batch_count):
                import os
                
                # 读取Schema
                with open(schema_file, 'r') as f:
                    schema_dict = json.load(f)
                
                # 重建Schema
                from pymilvus import FieldSchema, DataType
                
                fields = []
                for f in schema_dict["fields"]:
                    dtype = getattr(DataType, f["dtype"].split(".")[-1])
                    field = FieldSchema(
                        name=f["name"],
                        dtype=dtype,
                        is_primary=f.get("is_primary", False),
                        auto_id=f.get("auto_id", False),
                        **f.get("params", {})
                    )
                    fields.append(field)
                
                schema = CollectionSchema(
                    fields=fields,
                    description=schema_dict.get("description", "")
                )
                
                # 删除旧collection(如果存在)
                if utility.has_collection(collection_name):
                    safe_drop_collection(collection_name)
                
                # 创建新collection
                collection = Collection(collection_name, schema=schema)
                print(f"Collection已创建: {collection_name}")
                
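                # Restore the data
                total_restored = 0
                # Derive the source collection name and the full timestamp from the schema
                # file name: <source_name>_schema_<YYYYmmdd>_<HHMMSS>.json
                base = os.path.basename(schema_file)[:-len(".json")]
                source_name, timestamp = base.rsplit("_schema_", 1)
                
                for batch_num in range(batch_count):
                    data_file = f"{backup_dir}/{source_name}_data_{timestamp}_batch{batch_num:04d}.pkl.gz"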
                    
                    if not os.path.exists(data_file):
                        print(f"批次文件不存在: {data_file}")
                        continue
                    
                    # 读取批次数据
                    with gzip.open(data_file, 'rb') as f:
                        batch_data = pickle.load(f)
                    
                    # 转换数据格式
                    field_data = {}
                    for field in schema.fields:
                        field_data[field.name] = [item[field.name] for item in batch_data]
                    
                    # 插入数据
                    data_list = [field_data[f.name] for f in schema.fields if not f.auto_id]
                    collection.insert(data_list)
                    
                    total_restored += len(batch_data)
                    print(f"批次 {batch_num} 已恢复: {len(batch_data)} 条数据")
                
                # 刷新
                collection.flush()
                print(f"恢复完成: {total_restored} 条数据")
                print(f"当前实体数量: {collection.num_entities}")
            
            # 使用备份和恢复
            # 备份
            schema_file, batch_count = backup_collection("documents", "./backups")
            
            # 恢复
            # restore_collection("documents_restored", schema_file, "./backups", batch_count)
            
            # 定期备份任务
            def scheduled_backup(collection_name, backup_dir, interval_hours=24):
                import time
                
                while True:
                    try:
                        print(f"开始备份: {datetime.now()}")
                        backup_collection(collection_name, backup_dir)
                        print("备份完成")
                    except Exception as e:
                        print(f"备份失败: {e}")
                    
                    time.sleep(interval_hours * 3600)
            
            # 启动定期备份(后台线程)
            import threading
            backup_thread = threading.Thread(
                target=scheduled_backup,
                args=("documents", "./backups", 24),
                daemon=True
            )
            backup_thread.start()
            ---
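Pickle files can execute code when loaded, so a text format is safer for backups that leave the machine that wrote them. The sketch below shows one possible alternative, gzip-compressed JSON Lines, under the assumption that every field value is JSON-serializable (float vectors as lists of Python floats; binary vectors would need an extra encoding step). The helper names and the file name are hypothetical.

```python
import gzip
import json
import os
import tempfile


def write_batch(path, rows):
    """Write one batch of row dicts as gzip-compressed JSON Lines."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_batch(path):
    """Read a batch written by write_batch back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical rows in the shape returned by a query with output fields
rows = [
    {"id": 1, "title": "文档1", "embedding": [0.1, 0.2, 0.3]},
    {"id": 2, "title": "文档2", "embedding": [0.4, 0.5, 0.6]},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "documents_data_batch0000.jsonl.gz")
    write_batch(path, rows)
    restored = read_batch(path)

print(f"restored {len(restored)} rows, identical: {restored == rows}")
```

One row per line also lets a restore job stream a large batch instead of loading it in a single call.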

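4 Data operations

4.1 Inserting data

01.Insert methods
    a.Column-based insert
        a.Description
            For column-based insertion the data is organized by column: each field corresponds to one list, given in schema field order, and all lists must have the same length. This matches the columnar layout Milvus uses for storage. A request whose columns have mismatched lengths or wrong types is rejected as a whole. The return value contains the list of inserted primary keys and the insert count. Inserted data becomes searchable without an explicit flush, subject to the consistency level of the request; flush() seals the current segments and persists them, and num_entities reflects flushed data. Flushing after every small insert creates many small segments, so flush once at the end of a bulk load. Insert in batches; 1,000 to 10,000 rows per request is a reasonable starting point.
        b.Code example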
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # Prepare the data (column format)
            ids = list(range(1000))
            titles = [f"文档{i}" for i in range(1000)]
            categories = (["技术", "新闻", "博客"] * 334)[:1000]  # cycle the values, then trim 1002 -> 1000
            timestamps = [1700000000 + i for i in range(1000)]
            embeddings = np.random.rand(1000, 128).tolist()
            
            # 插入数据(按Schema字段顺序)
            data = [ids, titles, categories, timestamps, embeddings]
            insert_result = collection.insert(data)
            
            print(f"插入成功: {insert_result.insert_count} 条")
            print(f"主键列表: {insert_result.primary_keys[:10]}...")
            
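            # Flush to seal and persist the inserted data (this also updates num_entities)
            collection.flush()
            print(f"Entity count after flush: {collection.num_entities}")
            
            # Verify the insert (query requires the collection to be indexed and loaded)
            collection.load()
            results = collection.query(
                expr="id in [0, 1, 2]",
                output_fields=["id", "title", "category"]
            )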
            for r in results:
                print(f"ID: {r['id']}, Title: {r['title']}, Category: {r['category']}")
            ---
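A length check before the insert catches mismatched columns early. The sketch below is a client-side helper under the assumption that columns are plain Python lists; `check_columns` and `columns_to_rows` are hypothetical names, and the second function shows how the same data maps to row dictionaries.

```python
def check_columns(columns):
    """Return the common column length, or raise if the columns differ in length."""
    lengths = {len(column) for column in columns}
    if len(lengths) > 1:
        raise ValueError(f"column lengths differ: {sorted(lengths)}")
    return lengths.pop() if lengths else 0


def columns_to_rows(field_names, columns):
    """Convert column lists into a list of row dictionaries."""
    if len(field_names) != len(columns):
        raise ValueError("one column is required per field name")
    check_columns(columns)
    return [dict(zip(field_names, values)) for values in zip(*columns)]


ids = list(range(1000))
# Repeating a 3-item list 334 times yields 1002 items, so trim to the row count
categories = (["tech", "news", "blog"] * 334)[:1000]

print(check_columns([ids, categories]))
print(columns_to_rows(["id", "category"], [ids[:2], categories[:2]]))
```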
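    b.Dictionary-based insert
        a.Description
            Besides column-based insertion, Milvus also accepts a list of dictionaries. Each record is one dictionary keyed by field name. This form is more intuitive, at the cost of slightly more client-side processing than column-based insertion. It suits data that already arrives as JSON or dictionaries. Each dictionary must contain every field that is not generated automatically. Key order does not matter because fields are matched by name. For collections with dynamic fields enabled, dictionary-based insertion is also the natural way to pass extra fields.
        b.Code example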
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # 准备数据(字典列表格式)
            data = [
                {
                    "id": 2000 + i,
                    "title": f"文档{2000 + i}",
                    "category": "技术",
                    "timestamp": 1700000000 + i,
                    "embedding": [np.random.random() for _ in range(128)]
                }
                for i in range(100)
            ]
            
            # 插入数据
            insert_result = collection.insert(data)
            print(f"插入成功: {insert_result.insert_count} 条")
            
            # 混合字段顺序
            data_mixed = [
                {
                    "embedding": [0.1] * 128,
                    "id": 3000,
                    "timestamp": 1700000000,
                    "category": "新闻",
                    "title": "文档3000"
                },
                {
                    "title": "文档3001",
                    "id": 3001,
                    "embedding": [0.2] * 128,
                    "category": "博客",
                    "timestamp": 1700000001
                }
            ]
            
            collection.insert(data_mixed)
            collection.flush()
            
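            # Dynamic schema example (assumes "dynamic_collection" already exists and was
            # created with enable_dynamic_field=True)
            collection_dynamic = Collection("dynamic_collection")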
            
            data_dynamic = [
                {
                    "id": 1,
                    "embedding": [0.1] * 128,
                    "extra_field1": "额外数据",  # 动态字段
                    "extra_field2": 123
                }
            ]
            
            collection_dynamic.insert(data_dynamic)
            ---
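Checking rows on the client gives clearer error messages than a rejected request. A minimal sketch, assuming the schema field names are known as a list; `validate_rows` is a hypothetical helper, and `dynamic=True` stands for a collection with dynamic fields enabled, where extra keys are allowed.

```python
def validate_rows(rows, schema_fields, auto_fields=(), dynamic=False):
    """Return a list of problems found in dictionary-style rows."""
    required = [name for name in schema_fields if name not in auto_fields]
    errors = []
    for index, row in enumerate(rows):
        missing = [name for name in required if name not in row]
        extra = [key for key in row if key not in schema_fields]
        if missing:
            errors.append(f"row {index}: missing {missing}")
        if extra and not dynamic:
            errors.append(f"row {index}: unexpected {extra}")
    return errors


# Hypothetical rows for a schema with the fields "id" and "embedding"
rows = [
    {"id": 1, "embedding": [0.1, 0.2]},
    {"id": 2},
    {"id": 3, "embedding": [0.3, 0.4], "note": "extra"},
]

print(validate_rows(rows, ["id", "embedding"]))
print(validate_rows(rows, ["id", "embedding"], dynamic=True))
```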

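02.Handling data types
    a.Vector data
        a.Description
            Vectors are the core data type in Milvus, and every vector must match the dimension defined in the schema. Python lists and NumPy arrays are both accepted. Float vectors are stored as float32, and the dimension can be any positive integer up to the limit imposed by Milvus (32,768 in current releases). Binary vectors are passed as bytes, and their dimension must be a multiple of 8. Milvus stores vectors as given and does not normalize them on insert; when the inner-product (IP) metric should behave like cosine similarity, normalize the vectors before inserting them. Validate vector dimensions before inserting to avoid runtime errors.
        b.Code example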
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # Python list format
            # Note: [0.1, 0.2, 0.3] * 43 would give 129 values, which does not match dim=128 and would be rejected
            embedding_list = [[0.1] * 128 for _ in range(10)]  # correct: 128 dimensions
            
            # NumPy array格式
            embedding_np = np.random.rand(10, 128).astype(np.float32)
            
            # 转换为list(Milvus接受)
            embedding_from_np = embedding_np.tolist()
            
            # 插入向量数据
            ids = list(range(4000, 4010))
            titles = [f"文档{i}" for i in range(4000, 4010)]
            categories = ["技术"] * 10
            timestamps = [1700000000] * 10
            
            data = [ids, titles, categories, timestamps, embedding_from_np]
            collection.insert(data)
            
            # 二值向量示例
            from pymilvus import CollectionSchema, FieldSchema, DataType
            
            fields_binary = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="binary_vector", dtype=DataType.BINARY_VECTOR, dim=512)
            ]
            schema_binary = CollectionSchema(fields=fields_binary)
            collection_binary = Collection("binary_collection", schema=schema_binary)
            
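            # Generate binary vectors (512 dimensions = 64 bytes)
            # dtype=np.uint8 matters: bytes() on the default int64 array would produce 512 bytes per vector
            binary_vectors = [np.random.randint(0, 256, 64, dtype=np.uint8).tobytes() for _ in range(10)]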
            ids_binary = list(range(10))
            
            data_binary = [ids_binary, binary_vectors]
            collection_binary.insert(data_binary)
            
            # 向量维度验证
            def validate_vectors(vectors, expected_dim):
                for i, vec in enumerate(vectors):
                    if len(vec) != expected_dim:
                        raise ValueError(f"向量 {i} 维度错误: {len(vec)}, 期望: {expected_dim}")
                return True
            
            validate_vectors(embedding_from_np, 128)
            print("向量维度验证通过")
            ---
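Two small preparation steps for vectors can be sketched without NumPy: packing a bit list into the bytes form used for binary vectors, and L2-normalizing a float vector on the client. Both helpers are hypothetical; `pack_bits` places the first bit in the most significant position of each byte, which is the same order `numpy.packbits` uses by default.

```python
import math


def pack_bits(bits):
    """Pack a sequence of 0/1 values into bytes, most significant bit first."""
    if len(bits) % 8 != 0:
        raise ValueError("binary vector dimension must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | (1 if bit else 0)
        packed.append(byte)
    return bytes(packed)


def l2_normalize(vector):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


bits = [1, 0, 0, 0, 0, 0, 0, 1] * 64   # a 512-dimensional binary vector
packed = pack_bits(bits)
print(len(packed), packed[:2])

unit = l2_normalize([3.0, 4.0])
print(unit, math.sqrt(sum(x * x for x in unit)))
```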
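    b.Scalar data
        a.Description
            Scalar data covers integers, floating-point numbers, strings, booleans, and similar types. A VARCHAR value must respect the field's max_length, which Milvus counts in bytes; a value that is too long is rejected with an error rather than truncated, so truncate on the client when needed. The JSON type supports nested structures and can store complex objects. Integer types have fixed ranges, and out-of-range values raise an error. Timestamps are best stored as Unix timestamps in an INT64 field. In older Milvus releases NULL is not supported and every field needs a value; recent releases allow fields to be declared nullable.
        b.Code example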
            ---
            from pymilvus import Collection
            import json
            import time
            
            collection = Collection("documents")
            
            # 整数类型
            ids = [5000, 5001, 5002]
            ages = [25, 30, 35]  # INT32
            
            # 浮点类型
            scores = [95.5, 87.3, 92.1]  # FLOAT
            ratings = [4.5, 3.8, 4.9]  # DOUBLE
            
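            # String type (mind the length limit; max_length=200 is assumed here and is counted in bytes)
            long_title = "很长的标题" * 100
            if len(long_title.encode("utf-8")) > 200:
                # Truncate by bytes; errors="ignore" drops a character cut in half at the end
                long_title = long_title.encode("utf-8")[:200].decode("utf-8", errors="ignore")
            
            titles = [
                "短标题",
                long_title,
                "中等长度的标题"
            ]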
            
            # 布尔类型
            is_active = [True, False, True]
            
            # JSON类型
            metadata = [
                {"author": "张三", "tags": ["AI", "ML"], "views": 1000},
                {"author": "李四", "tags": ["DL"], "views": 500},
                {"author": "王五", "tags": ["NLP", "CV"], "views": 800}
            ]
            
            # 时间戳
            timestamps = [
                int(time.time()),
                int(time.time()) - 86400,  # 1天前
                int(time.time()) - 172800  # 2天前
            ]
            
            # 向量
            embeddings = [[0.1] * 128 for _ in range(3)]
            
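            # Insert into "documents" (fields: id, title, category, timestamp, embedding).
            # ages, scores, ratings, is_active and metadata above only illustrate value
            # formats; inserting them would require matching fields in the schema.
            categories = ["技术", "新闻", "博客"]
            data = [ids, titles, categories, timestamps, embeddings]
            collection.insert(data)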
            
            # 类型转换
            def convert_data_types(data_dict):
                """确保数据类型正确"""
                converted = {}
                
                # 整数转换
                if "id" in data_dict:
                    converted["id"] = int(data_dict["id"])
                
                # 字符串长度限制
                if "title" in data_dict:
                    title = str(data_dict["title"])
                    converted["title"] = title[:200]  # 截断
                
                # 时间戳转换
                if "timestamp" in data_dict:
                    ts = data_dict["timestamp"]
                    if isinstance(ts, str):
                        from datetime import datetime
                        dt = datetime.fromisoformat(ts)
                        converted["timestamp"] = int(dt.timestamp())
                    else:
                        converted["timestamp"] = int(ts)
                
                # JSON序列化
                if "metadata" in data_dict:
                    if isinstance(data_dict["metadata"], dict):
                        converted["metadata"] = data_dict["metadata"]
                    else:
                        converted["metadata"] = json.loads(data_dict["metadata"])
                
                return converted
            
            # 使用转换函数
            raw_data = {
                "id": "6000",
                "title": "x" * 300,
                "timestamp": "2024-01-01T00:00:00",
                "metadata": '{"key": "value"}'
            }
            
            converted = convert_data_types(raw_data)
            print(f"转换后: {converted}")
            ---
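Because a character-based slice can still exceed a byte-based limit, truncation for VARCHAR fields is safer when done on the UTF-8 bytes. A minimal sketch, assuming the limit is expressed in bytes; `truncate_utf8` is a hypothetical helper.

```python
def truncate_utf8(text, max_bytes):
    """Truncate text so that its UTF-8 encoding fits into max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a multi-byte character that was cut in half
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


title = "标题" * 50          # 100 characters, 3 bytes each in UTF-8
short = truncate_utf8(title, 200)

print(len(title), len(title.encode("utf-8")))
print(len(short), len(short.encode("utf-8")))
```

ASCII-only text is unaffected by the distinction, since each character is one byte.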

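03.Batch insert optimization
    a.Batch size
        a.Description
            Batch size directly affects insert performance and memory use. A single insert of 1,000 to 10,000 rows is a reasonable starting point: batches that are too small add network overhead, while batches that are too large can time out or exhaust memory. Adjust the batch size to the row size and network conditions; the higher the vector dimension, the smaller the batch should be. Determine the best value through performance testing. A single insert request is also bounded by the message size limit configured on the server and client, and a request that exceeds it fails with an error.
        b.Code example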
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # Measure insert throughput for one batch size
            def test_batch_size(collection, total_count, batch_size, id_base=0):
                start_time = time.time()
                id_end = id_base + total_count
                
                for i in range(id_base, id_end, batch_size):
                    batch_end = min(i + batch_size, id_end)
                    batch_count = batch_end - i
                    
                    # Generate one batch
                    ids = list(range(i, batch_end))
                    titles = [f"文档{j}" for j in range(i, batch_end)]
                    categories = ["技术"] * batch_count
                    timestamps = [1700000000] * batch_count
                    embeddings = np.random.rand(batch_count, 128).tolist()
                    
                    # Insert
                    data = [ids, titles, categories, timestamps, embeddings]
                    collection.insert(data)
                
                # Flush once at the end
                collection.flush()
                
                elapsed = time.time() - start_time
                rows_per_sec = total_count / elapsed
                
                return elapsed, rows_per_sec
            
            # Compare batch sizes; each run gets its own ID range because Milvus
            # does not reject duplicate primary keys on insert
            total_count = 10000
            
            for run, batch_size in enumerate([100, 500, 1000, 5000, 10000]):
                elapsed, rows_per_sec = test_batch_size(collection, total_count, batch_size, id_base=100000 + run * total_count)
                print(f"Batch size: {batch_size:5d}, elapsed: {elapsed:.2f}s, rows/s: {rows_per_sec:.2f}")
            
            # 自适应批次大小
            def adaptive_batch_insert(collection, data_generator, vector_dim=128):
                # 估算单条数据大小(字节)
                single_size = vector_dim * 4 + 1000  # 向量 + 元数据
                
                # 目标批次大小:10MB
                target_size = 10 * 1024 * 1024
                batch_size = max(100, min(10000, target_size // single_size))
                
                print(f"自适应批次大小: {batch_size}")
                
                batch = []
                for item in data_generator:
                    batch.append(item)
                    
                    if len(batch) >= batch_size:
                        collection.insert(batch)
                        batch = []
                
                # 插入剩余数据
                if batch:
                    collection.insert(batch)
            
            # 使用自适应批次
            def data_gen():
                for i in range(10000):
                    yield {
                        "id": 10000 + i,
                        "title": f"文档{i}",
                        "category": "技术",
                        "timestamp": 1700000000,
                        "embedding": [0.1] * 128
                    }
            
            adaptive_batch_insert(collection, data_gen())
            ---
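The batch size estimate used above is plain arithmetic: the byte size of one row is approximated and divided into a target request size. The sketch below isolates that calculation together with a generic chunking helper; the 10 MB target, the 1,000 bytes assumed for scalar fields, and both function names are assumptions for illustration.

```python
from itertools import islice


def estimate_batch_size(vector_dim, scalar_bytes=1000,
                        target_bytes=10 * 1024 * 1024,
                        lower=100, upper=10000):
    """Estimate rows per insert so that one request stays near target_bytes."""
    row_bytes = vector_dim * 4 + scalar_bytes  # float32 vector plus scalar fields
    return max(lower, min(upper, target_bytes // row_bytes))


def chunked(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


for dim in (128, 1536, 32768):
    print(dim, estimate_batch_size(dim))

print([len(batch) for batch in chunked(range(25), 10)])
```

The lower and upper bounds keep the estimate inside the 100 to 10,000 row range even for very small or very large rows.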
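    b.Concurrent insert
        a.Description
            Milvus supports concurrent inserts, which can raise throughput considerably. Multiple clients or threads can insert at the same time. Watch for primary-key collisions: give each thread its own ID range, because Milvus does not deduplicate primary keys on insert. Concurrent inserts increase server load, so tune the degree of concurrency to the server's capacity. Reuse connections rather than opening one per request; threads within one process can share a pymilvus connection. Too much concurrency can reduce performance or cause timeouts.
        b.Code example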
            ---
            from pymilvus import Collection
            import numpy as np
            import concurrent.futures
            import time
            
            collection = Collection("documents")
            
            # 单线程插入函数
            def insert_batch(start_id, count):
                ids = list(range(start_id, start_id + count))
                titles = [f"文档{i}" for i in ids]
                categories = ["技术"] * count
                timestamps = [1700000000] * count
                embeddings = [[np.random.random() for _ in range(128)] for _ in range(count)]
                
                data = [ids, titles, categories, timestamps, embeddings]
                result = collection.insert(data)
                
                return result.insert_count
            
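            # Concurrent insert benchmark
            def concurrent_insert_test(total_count, num_workers, base_id):
                start_time = time.time()
                
                # Split [base_id, base_id + total_count) into one contiguous ID range
                # per worker. The first `extra` workers take one more row, so nothing
                # is dropped when total_count is not divisible by num_workers
                per_worker, extra = divmod(total_count, num_workers)
                tasks = []
                start_id = base_id
                for i in range(num_workers):
                    count = per_worker + (1 if i < extra else 0)
                    tasks.append((start_id, count))
                    start_id += count
                
                # Run the workers concurrently
                with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
                    futures = [executor.submit(insert_batch, start_id, count) for start_id, count in tasks]
                    results = [f.result() for f in futures]
                
                # Flush so the inserted data is persisted before timing stops
                collection.flush()
                
                elapsed = time.time() - start_time
                total_inserted = sum(results)
                qps = total_inserted / elapsed
                
                return elapsed, qps
            
            # Compare worker counts. Each run gets its own ID range so that repeated
            # runs do not insert duplicate primary keys
            total_count = 10000
            
            for run, num_workers in enumerate([1, 2, 4, 8]):
                base_id = 100000 + run * total_count
                elapsed, qps = concurrent_insert_test(total_count, num_workers, base_id)
                print(f"Workers: {num_workers}, elapsed: {elapsed:.2f}s, QPS: {qps:.2f}")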
            
            # 生产者-消费者模式
            import queue
            import threading
            
            def producer(data_queue, total_count):
                """生产数据"""
                for i in range(total_count):
                    item = {
                        "id": 30000 + i,
                        "title": f"文档{i}",
                        "category": "技术",
                        "timestamp": 1700000000,
                        "embedding": [np.random.random() for _ in range(128)]
                    }
                    data_queue.put(item)
                
                # 发送结束信号
                for _ in range(4):  # 4个消费者
                    data_queue.put(None)
            
            def consumer(data_queue, collection, batch_size=1000):
                """消费并插入数据"""
                batch = []
                
                while True:
                    item = data_queue.get()
                    
                    if item is None:  # 结束信号
                        break
                    
                    batch.append(item)
                    
                    if len(batch) >= batch_size:
                        collection.insert(batch)
                        batch = []
                
                # 插入剩余数据
                if batch:
                    collection.insert(batch)
            
            # 启动生产者-消费者
            data_queue = queue.Queue(maxsize=1000)
            
            # 启动生产者
            producer_thread = threading.Thread(target=producer, args=(data_queue, 10000))
            producer_thread.start()
            
            # 启动消费者
            consumer_threads = []
            for _ in range(4):
                t = threading.Thread(target=consumer, args=(data_queue, collection, 1000))
                t.start()
                consumer_threads.append(t)
            
            # 等待完成
            producer_thread.join()
            for t in consumer_threads:
                t.join()
            
            collection.flush()
            print("并发插入完成")
            ---
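The requirement that each worker inserts a different ID range can be isolated into a small pure function and tested without a server. The sketch below is one way to do it; `split_id_ranges` is a hypothetical helper, not part of pymilvus.

```python
def split_id_ranges(start_id, total_count, num_workers):
    """Return one (start_id, count) pair per worker.

    The ranges are contiguous, do not overlap, and together cover exactly
    total_count IDs, even when total_count is not divisible by num_workers.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    per_worker, extra = divmod(total_count, num_workers)
    ranges = []
    cursor = start_id
    for i in range(num_workers):
        # The first `extra` workers absorb the remainder, one row each
        count = per_worker + (1 if i < extra else 0)
        ranges.append((cursor, count))
        cursor += count
    return ranges


ranges = split_id_ranges(20000, 10, 4)
print(ranges)
```

Each pair can be passed directly to a worker function that takes a start ID and a count, such as `insert_batch` above.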

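4.2 Deleting data

01.Delete methods
    a.Delete by expression
        a.Description
            Deleting the entities that match a boolean expression is the main way to delete in Milvus. The expression can target primary keys, scalar fields, or a combination, and uses the same syntax as query expressions, including nested logical operators. A delete is logical: the call returns promptly and the matching entities are hidden from queries, but their storage is not released until a compaction runs. As a guideline, keep a single delete to no more than 16384 entities.
        b.Code example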
            ---
            from pymilvus import Collection
            
            collection = Collection("documents")
            
            # 删除单条记录(按主键)
            expr = "id == 1001"
            collection.delete(expr)
            print("已删除ID为1001的记录")
            
            # 批量删除(按主键列表)
            ids_to_delete = [1, 2, 3, 4, 5]
            expr = f"id in {ids_to_delete}"
            collection.delete(expr)
            print(f"已删除{len(ids_to_delete)}条记录")
            
            # 范围删除
            expr = "id > 2000 and id < 2100"
            collection.delete(expr)
            print("已删除ID在2000-2100之间的记录")
            
            # 按标量字段删除
            expr = 'category == "test"'
            collection.delete(expr)
            print("已删除测试类别的记录")
            
            # 复杂条件删除
            expr = '(category == "test" or category == "temp") and timestamp < 1700000000'
            collection.delete(expr)
            print("已删除符合条件的记录")
            
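            # Flush the delete operations
            collection.flush()
            # num_entities can still include deleted rows until compaction runs,
            # so count with a query instead (this needs a loaded collection)
            collection.load()
            remaining = collection.query(expr="", output_fields=["count(*)"])[0]["count(*)"]
            print(f"Current entity count: {remaining}")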
            
            # 安全删除函数
            def safe_delete(collection, expr, dry_run=False):
                """安全删除,支持预览模式"""
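                # Count the matching entities first. count(*) is not capped by the
                # 16384-row query limit, so the preview reports the true total
                try:
                    results = collection.query(
                        expr=expr,
                        output_fields=["count(*)"]
                    )
                    
                    count = results[0]["count(*)"]
                    print(f"Matched {count} records")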
                    
                    if count == 0:
                        print("没有匹配的记录")
                        return 0
                    
                    if dry_run:
                        print("(预览模式,未实际删除)")
                        return count
                    
                    # 实际删除
                    collection.delete(expr)
                    collection.flush()
                    print(f"已删除 {count} 条记录")
                    return count
                    
                except Exception as e:
                    print(f"删除失败: {e}")
                    return 0
            
            # 使用安全删除
            safe_delete(collection, "id > 5000", dry_run=True)  # 预览
            safe_delete(collection, "id > 5000", dry_run=False)  # 实际删除
            ---
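Since a single delete is best kept to a bounded number of entities, a long list of primary keys can be split into several `id in [...]` expressions before calling `delete`. The sketch below only builds the expression strings. `build_delete_exprs` is a hypothetical helper, and it assumes an INT64 primary key named `id`, as in the examples above.

```python
def build_delete_exprs(ids, pk_field="id", chunk_size=1000):
    """Yield '<pk_field> in [...]' expressions covering at most chunk_size IDs each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Remove duplicates and sort so the output is deterministic
    unique_ids = sorted(set(ids))
    for i in range(0, len(unique_ids), chunk_size):
        chunk = unique_ids[i:i + chunk_size]
        yield f"{pk_field} in {chunk}"


exprs = list(build_delete_exprs([5, 3, 3, 1, 4, 2], chunk_size=4))
for expr in exprs:
    print(expr)
```

Each resulting string can then be passed to `collection.delete(expr)` in turn.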
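    b.Batched deletes
        a.Description
            When a large number of entities must be removed (for example a million or more), delete them in batches rather than in one call. Batching bounds the work done per request and softens the impact on the system, and a short pause between batches gives the server time to catch up. Choose the batch size and the pause together: small batches with long pauses are gentle but slow. A query-then-delete loop gives exact control over each batch.
        b.Code example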
            ---
            from pymilvus import Collection
            import time
            
            collection = Collection("documents")
            
            # 分批删除大量数据
            def batch_delete(collection, expr, batch_size=1000, sleep_interval=0.1):
                """分批删除数据"""
                total_deleted = 0
                
                while True:
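                    # Query one batch of IDs to delete. Strong consistency makes the
                    # previous round's deletes visible, so the same rows are not
                    # returned (and counted) twice
                    results = collection.query(
                        expr=expr,
                        output_fields=["id"],
                        limit=batch_size,
                        consistency_level="Strong"
                    )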
                    
                    if len(results) == 0:
                        break
                    
                    # 删除这批数据
                    ids = [r["id"] for r in results]
                    delete_expr = f"id in {ids}"
                    collection.delete(delete_expr)
                    
                    total_deleted += len(ids)
                    print(f"已删除 {len(ids)} 条数据,累计: {total_deleted}")
                    
                    # 暂停
                    if sleep_interval > 0:
                        time.sleep(sleep_interval)
                
                # 刷新
                collection.flush()
                print(f"分批删除完成,共删除 {total_deleted} 条数据")
                return total_deleted
            
            # 删除旧数据
            batch_delete(collection, "timestamp < 1600000000", batch_size=1000)
            
            # 按ID范围分批删除
            def delete_by_id_range(collection, start_id, end_id, batch_size=1000):
                """按ID范围分批删除"""
                total_deleted = 0
                
                for i in range(start_id, end_id, batch_size):
                    batch_end = min(i + batch_size, end_id)
                    expr = f"id >= {i} and id < {batch_end}"
                    
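                    collection.delete(expr)
                    # This counts the IDs covered by the range; when IDs are sparse
                    # it is an upper bound on the rows actually deleted
                    total_deleted += (batch_end - i)
                    
                    print(f"Deleted ID range [{i}, {batch_end}), IDs covered so far: {total_deleted}")
                    time.sleep(0.1)
                
                collection.flush()
                print(f"Range delete finished, {total_deleted} IDs covered")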
                return total_deleted
            
            delete_by_id_range(collection, 10000, 20000, batch_size=1000)
            
            # 带进度监控的分批删除
            def batch_delete_with_progress(collection, expr, batch_size=1000):
                """带进度监控的分批删除"""
                # Count the matching rows first. count(*) is not subject to the
                # 16384-row query limit, so the total is not silently capped
                total_count = collection.query(
                    expr=expr,
                    output_fields=["count(*)"]
                )[0]["count(*)"]
                
                if total_count == 0:
                    print("没有匹配的记录")
                    return 0
                
                print(f"总共需要删除 {total_count} 条数据")
                
                deleted = 0
                start_time = time.time()
                
                while deleted < total_count:
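                    # Query one batch; Strong consistency hides rows that the
                    # previous round already deleted
                    results = collection.query(
                        expr=expr,
                        output_fields=["id"],
                        limit=batch_size,
                        consistency_level="Strong"
                    )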
                    
                    if len(results) == 0:
                        break
                    
                    # 删除
                    ids = [r["id"] for r in results]
                    collection.delete(f"id in {ids}")
                    
                    deleted += len(ids)
                    progress = (deleted / total_count) * 100
                    elapsed = time.time() - start_time
                    
                    print(f"进度: {progress:.1f}% ({deleted}/{total_count}), 耗时: {elapsed:.1f}s")
                    
                    time.sleep(0.1)
                
                collection.flush()
                total_time = time.time() - start_time
                print(f"删除完成,总耗时: {total_time:.1f}s")
                return deleted
            
            batch_delete_with_progress(collection, 'category == "temp"', batch_size=1000)
            ---
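The control flow of a query-then-delete loop can be exercised without a server by running it against an in-memory stand-in. In the sketch below, `InMemoryStore` is a hypothetical stub and is not the pymilvus API. It exists only to show when the loop terminates and how a progress guard prevents spinning if deleted rows keep reappearing in query results.

```python
class InMemoryStore:
    """Hypothetical stand-in for a collection: rows keyed by primary key."""

    def __init__(self, rows):
        self.rows = {r["id"]: r for r in rows}

    def query(self, predicate, limit):
        # Return up to `limit` rows that satisfy the predicate
        matched = [r for r in self.rows.values() if predicate(r)]
        return matched[:limit]

    def delete(self, ids):
        # Remove the given IDs and report how many actually existed
        removed = 0
        for pk in ids:
            if self.rows.pop(pk, None) is not None:
                removed += 1
        return removed


def batch_delete(store, predicate, batch_size=1000, max_rounds=10_000):
    """Delete matching rows batch by batch; return the number removed."""
    total = 0
    for _ in range(max_rounds):
        batch = store.query(predicate, limit=batch_size)
        if not batch:
            return total
        removed = store.delete([r["id"] for r in batch])
        if removed == 0:
            # Rows still match but nothing was removed: stop instead of looping forever
            raise RuntimeError("delete made no progress")
        total += removed
    raise RuntimeError("max_rounds exceeded")


store = InMemoryStore([{"id": i, "timestamp": 1500000000 + i} for i in range(2500)])
deleted = batch_delete(store, lambda r: r["timestamp"] < 1500002000, batch_size=1000)
print(f"deleted={deleted}, remaining={len(store.rows)}")
```

Against a real collection the same guard is worth keeping, because a query that runs before earlier deletes become visible can return rows that are already gone.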

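02.Delete strategies
    a.Soft-delete flag
        a.Description
            A soft delete sets a flag field instead of removing the entity. The data and its history are kept, so a record can be restored, which suits workloads that need auditing or rollback. The trade-off is cost: soft-deleted entities still occupy storage, and every query must filter them out. A scheduled job can later delete the flagged entities for real.
        b.Code example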
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import time
            
            # 创建带软删除标记的Schema
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="is_deleted", dtype=DataType.BOOL),  # 软删除标记
                FieldSchema(name="deleted_at", dtype=DataType.INT64),  # 删除时间
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            schema = CollectionSchema(fields=fields, description="支持软删除")
            collection = Collection("soft_delete_collection", schema=schema)
            
            # 插入数据(初始未删除)
            data = [
                [1, 2, 3],  # id
                ["文档1", "文档2", "文档3"],  # title
                [False, False, False],  # is_deleted
                [0, 0, 0],  # deleted_at
                [[0.1]*128, [0.2]*128, [0.3]*128]  # embedding
            ]
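            collection.insert(data)
            collection.flush()
            
            # Queries need an index on the vector field and a loaded collection
            collection.create_index(
                field_name="embedding",
                index_params={"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}}
            )
            collection.load()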
            
            # 软删除函数
            def soft_delete(collection, ids):
                """软删除指定ID的记录"""
                if not ids:
                    return
                
                # 查询现有数据
                results = collection.query(
                    expr=f"id in {ids}",
                    output_fields=["*"]
                )
                
                if not results:
                    print("没有找到要删除的记录")
                    return
                
                # Rewrite the records with the soft-delete flag set. upsert replaces
                # each entity by primary key, so no separate delete is needed
                deleted_time = int(time.time())
                
                ids_list = [r["id"] for r in results]
                titles = [r["title"] for r in results]
                is_deleted = [True] * len(results)
                deleted_at = [deleted_time] * len(results)
                embeddings = [r["embedding"] for r in results]
                
                data = [ids_list, titles, is_deleted, deleted_at, embeddings]
                collection.upsert(data)
                collection.flush()
                
                print(f"Soft-deleted {len(results)} records")
            
            # 使用软删除
            soft_delete(collection, [1, 2])
            
            # 查询未删除的数据
            results = collection.query(
                expr="is_deleted == false",
                output_fields=["id", "title"]
            )
            print(f"未删除的记录: {results}")
            
            # 恢复软删除的数据
            def undelete(collection, ids):
                """恢复软删除的记录"""
                results = collection.query(
                    expr=f"id in {ids} and is_deleted == true",
                    output_fields=["*"]
                )
                
                if not results:
                    print("没有找到要恢复的记录")
                    return
                
                # Rewrite only the records that were found, with the flag cleared.
                # upsert replaces each entity by primary key; deleting every ID in
                # `ids` first would also remove live records that were never soft-deleted
                ids_list = [r["id"] for r in results]
                titles = [r["title"] for r in results]
                is_deleted = [False] * len(results)
                deleted_at = [0] * len(results)
                embeddings = [r["embedding"] for r in results]
                
                data = [ids_list, titles, is_deleted, deleted_at, embeddings]
                collection.upsert(data)
                collection.flush()
                
                print(f"Restored {len(results)} records")
            
            undelete(collection, [1])
            
            # 定期清理软删除的数据
            def cleanup_soft_deleted(collection, days=30):
                """清理超过指定天数的软删除数据"""
                cutoff_time = int(time.time()) - (days * 86400)
                
                # 查询要清理的数据
                results = collection.query(
                    expr=f"is_deleted == true and deleted_at < {cutoff_time}",
                    output_fields=["id"],
                    limit=16384
                )
                
                if not results:
                    print("没有需要清理的数据")
                    return
                
                # 真正删除
                ids = [r["id"] for r in results]
                collection.delete(f"id in {ids}")
                collection.flush()
                
                print(f"清理 {len(ids)} 条软删除数据")
            
            cleanup_soft_deleted(collection, days=30)
            ---
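Because every read has to exclude soft-deleted entities, it helps to build the filter in one place rather than repeat it in each query. The sketch below composes the expression strings only. `with_soft_delete_filter` and `purge_expr` are hypothetical helpers, and they assume the `is_deleted` and `deleted_at` fields defined in the schema above.

```python
def with_soft_delete_filter(expr=""):
    """Combine a caller's filter with the 'not soft-deleted' condition."""
    live = "is_deleted == false"
    expr = expr.strip()
    # Parenthesize the caller's filter so an inner 'or' cannot bypass the flag check
    return live if not expr else f"({expr}) and {live}"


def purge_expr(now_ts, retention_days):
    """Expression matching entities soft-deleted more than retention_days ago."""
    cutoff = now_ts - retention_days * 86400
    return f"is_deleted == true and deleted_at < {cutoff}"


print(with_soft_delete_filter('category == "tech" or category == "ai"'))
print(purge_expr(1700000000, 30))
```

The parentheses matter: without them, `a or b and is_deleted == false` would apply the flag check to `b` only.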
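    b.Scheduled cleanup
        a.Description
            Scheduled cleanup deletes expired data automatically. The expiry rule can be based on a timestamp, access frequency, or similar conditions, which suits time-sensitive data such as logs and caches. It can run as a cron-style job or as a background thread. Derive the retention policy from business requirements and storage cost, and run the job during off-peak hours to limit its effect on live traffic. Run a compaction after cleanup so that the space is actually released.
        b.Code example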
            ---
            from pymilvus import Collection
            import time
            import threading
            from datetime import datetime
            
            collection = Collection("documents")
            
            # 基于时间戳的清理
            def cleanup_by_timestamp(collection, days=30):
                """删除超过指定天数的数据"""
                cutoff_time = int(time.time()) - (days * 86400)
                
                expr = f"timestamp < {cutoff_time}"
                
                # 分批删除
                total_deleted = 0
                batch_size = 1000
                
                while True:
                    results = collection.query(
                        expr=expr,
                        output_fields=["id"],
                        limit=batch_size
                    )
                    
                    if len(results) == 0:
                        break
                    
                    ids = [r["id"] for r in results]
                    collection.delete(f"id in {ids}")
                    total_deleted += len(ids)
                    
                    print(f"已清理 {len(ids)} 条数据,累计: {total_deleted}")
                    time.sleep(0.1)
                
                collection.flush()
                collection.compact()
                
                print(f"清理完成,共删除 {total_deleted} 条数据")
                return total_deleted
            
            cleanup_by_timestamp(collection, days=30)
            
            # 定时清理任务
            def scheduled_cleanup(collection, interval_hours=24, retention_days=30):
                """定时清理任务"""
                while True:
                    try:
                        print(f"开始清理: {datetime.now()}")
                        deleted = cleanup_by_timestamp(collection, days=retention_days)
                        print(f"清理完成: 删除 {deleted} 条数据")
                    except Exception as e:
                        print(f"清理失败: {e}")
                    
                    # 等待下次清理
                    time.sleep(interval_hours * 3600)
            
            # 启动定时清理(后台线程)
            cleanup_thread = threading.Thread(
                target=scheduled_cleanup,
                args=(collection, 24, 30),
                daemon=True
            )
            cleanup_thread.start()
            
            # 按类别清理
            def cleanup_by_category(collection, categories_to_delete):
                """删除指定类别的数据"""
                for category in categories_to_delete:
                    expr = f'category == "{category}"'
                    
                    results = collection.query(
                        expr=expr,
                        output_fields=["id"],
                        limit=16384
                    )
                    
                    if results:
                        ids = [r["id"] for r in results]
                        collection.delete(f"id in {ids}")
                        print(f"已删除类别 '{category}': {len(ids)} 条数据")
                
                collection.flush()
                collection.compact()
            
            cleanup_by_category(collection, ["test", "temp", "draft"])
            
            # 智能清理策略
            class CleanupManager:
                def __init__(self, collection, max_entities=1000000):
                    self.collection = collection
                    self.max_entities = max_entities
                
                def check_and_cleanup(self):
                    """检查并清理数据"""
                    current_count = self.collection.num_entities
                    
                    if current_count <= self.max_entities:
                        print(f"当前数量 {current_count},无需清理")
                        return
                    
                    # 需要删除的数量
                    to_delete = current_count - self.max_entities
                    print(f"当前数量 {current_count},需要删除 {to_delete} 条")
                    
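                    # Fetch candidate rows. One query returns at most 16384 rows and
                    # Milvus does not order them by timestamp, so this removes the oldest
                    # rows among those fetched, not necessarily the oldest overall
                    results = self.collection.query(
                        expr="id >= 0",
                        output_fields=["id", "timestamp"],
                        limit=min(to_delete + 1000, 16384)
                    )
                    
                    # Sort by timestamp, oldest first
                    results_sorted = sorted(results, key=lambda x: x["timestamp"])
                    
                    # Take the oldest rows
                    ids_to_delete = [r["id"] for r in results_sorted[:to_delete]]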
                    
                    # 分批删除
                    batch_size = 1000
                    for i in range(0, len(ids_to_delete), batch_size):
                        batch = ids_to_delete[i:i+batch_size]
                        self.collection.delete(f"id in {batch}")
                        print(f"已删除 {len(batch)} 条旧数据")
                    
                    self.collection.flush()
                    self.collection.compact()
                    
                    print(f"清理完成,当前数量: {self.collection.num_entities}")
            
            # 使用智能清理
            manager = CleanupManager(collection, max_entities=1000000)
            manager.check_and_cleanup()
            ---
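To run cleanup during off-peak hours rather than at a fixed interval from process start, the scheduler can sleep until the next occurrence of a chosen time of day. The sketch below shows only that calculation. `seconds_until_next_run` is a hypothetical helper, the 03:00 default is an arbitrary example of an off-peak hour, and it uses naive local datetimes, so it ignores daylight-saving shifts.

```python
from datetime import datetime, timedelta


def seconds_until_next_run(now, hour=3, minute=0):
    """Seconds from `now` until the next occurrence of hour:minute."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # Today's slot has passed (or is exactly now), so schedule tomorrow's
        target += timedelta(days=1)
    return (target - now).total_seconds()


print(seconds_until_next_run(datetime(2024, 1, 1, 1, 30)))
print(seconds_until_next_run(datetime(2024, 1, 1, 3, 0)))
```

A cleanup loop would call `time.sleep(seconds_until_next_run(datetime.now()))` before each run instead of sleeping for a fixed number of hours.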

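4.3 Updating data

01.Update mechanism
    a.Upsert
        a.Description
            Milvus updates data through upsert (update + insert): if the primary key exists the entity is updated, otherwise it is inserted. Upsert is exposed as a single operation, so the client does not have to coordinate a delete and an insert itself. An update replaces the whole entity; updating only some fields is not supported, so every field, including the vector, must be supplied. Upsert is somewhat slower than a plain insert because an existing entity with the same primary key has to be located and replaced. It suits data that must stay current, such as a document store that is updated in real time.
        b.Code example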
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # Upsert单条数据
            data = [
                [1],  # id (已存在则更新,不存在则插入)
                ["更新后的标题"],  # title
                ["技术"],  # category
                [1700000000],  # timestamp
                [[0.9] * 128]  # embedding (新向量)
            ]
            
            collection.upsert(data)
            collection.flush()
            print("Upsert完成")
            
            # 验证更新
            results = collection.query(
                expr="id == 1",
                output_fields=["id", "title"]
            )
            print(f"更新后: {results}")
            
            # 批量Upsert
            ids = [10, 11, 12, 13, 14]  # 部分存在,部分不存在
            titles = [f"更新文档{i}" for i in ids]
            categories = ["技术"] * len(ids)
            timestamps = [1700000000] * len(ids)
            embeddings = [[np.random.random() for _ in range(128)] for _ in ids]
            
            data = [ids, titles, categories, timestamps, embeddings]
            result = collection.upsert(data)
            
            print(f"Upsert数量: {result.upsert_count}")
            collection.flush()
            
            # Upsert字典格式
            data_dict = [
                {
                    "id": 20,
                    "title": "字典格式更新",
                    "category": "新闻",
                    "timestamp": 1700000000,
                    "embedding": [0.5] * 128
                },
                {
                    "id": 21,
                    "title": "字典格式插入",
                    "category": "博客",
                    "timestamp": 1700000001,
                    "embedding": [0.6] * 128
                }
            ]
            
            collection.upsert(data_dict)
            collection.flush()
            print("字典格式Upsert完成")
            ---
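The whole-record replacement behaviour described above can be modelled with a plain dictionary keyed by primary key. The sketch below is a toy model of the semantics only, not how Milvus implements upsert, and the `upsert` function here is hypothetical. A real collection would also reject a row that omits required fields, whereas the model simply shows that omitted fields do not survive.

```python
def upsert(table, rows, pk="id"):
    """Insert or replace rows in `table`; return (inserted, updated) counts."""
    inserted = updated = 0
    for row in rows:
        key = row[pk]
        if key in table:
            updated += 1
        else:
            inserted += 1
        # The stored record is replaced wholesale, never merged field by field
        table[key] = dict(row)
    return inserted, updated


table = {1: {"id": 1, "title": "old", "category": "tech"}}
counts = upsert(table, [{"id": 1, "title": "new"}, {"id": 2, "title": "added"}])
print(counts)
print(table[1])
```

After the call, record 1 no longer has a `category`, which is why an update has to carry every field of the entity.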
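    b.Update strategies
        a.Description
            Because Milvus does not support updating individual fields, an update means querying the full entity, changing it, and upserting it back. This read-modify-write cycle has a cost and is a poor fit for high-frequency updates. Caching entities in the application layer reduces the number of queries, and when only the vector changes, keeping just the essential metadata in the collection leaves less to read back and rewrite. Update in batches where possible. Updates create new segments, so run compaction periodically.
        b.Code example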
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            collection.load()
            
            # 更新单个字段
            def update_field(collection, id, field_name, new_value):
                """更新单个字段"""
                # 查询现有数据
                results = collection.query(
                    expr=f"id == {id}",
                    output_fields=["*"]
                )
                
                if not results:
                    print(f"ID {id} 不存在")
                    return False
                
                # 修改字段
                record = results[0]
                record[field_name] = new_value
                
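                # Upsert. Column-oriented data needs one list per field, each holding
                # one value per row, so every value of this single row is wrapped in a list
                data = [[record[f.name]] for f in collection.schema.fields if not f.auto_id]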
                collection.upsert(data)
                collection.flush()
                
                print(f"已更新 ID {id} 的 {field_name}")
                return True
            
            update_field(collection, 1, "title", "新标题")
            
            # 批量更新字段
            def batch_update_field(collection, ids, field_name, new_values):
                """批量更新字段"""
                if len(ids) != len(new_values):
                    raise ValueError("ID和值的数量不匹配")
                
                # 查询现有数据
                results = collection.query(
                    expr=f"id in {ids}",
                    output_fields=["*"]
                )
                
                # 创建ID到记录的映射
                records_map = {r["id"]: r for r in results}
                
                # 准备更新数据
                updated_records = []
                for id, new_value in zip(ids, new_values):
                    if id in records_map:
                        record = records_map[id]
                        record[field_name] = new_value
                        updated_records.append(record)
                
                if not updated_records:
                    print("没有找到要更新的记录")
                    return
                
                # 转换为列式格式
                field_data = {}
                for field in collection.schema.fields:
                    if not field.auto_id:
                        field_data[field.name] = [r[field.name] for r in updated_records]
                
                data = [field_data[f.name] for f in collection.schema.fields if not f.auto_id]
                collection.upsert(data)
                collection.flush()
                
                print(f"已更新 {len(updated_records)} 条记录的 {field_name}")
            
            batch_update_field(collection, [1, 2, 3], "category", ["AI", "ML", "DL"])
            
            # 更新向量
            def update_embedding(collection, id, new_embedding):
                """更新向量"""
                results = collection.query(
                    expr=f"id == {id}",
                    output_fields=["*"]
                )
                
                if not results:
                    print(f"ID {id} 不存在")
                    return False
                
                record = results[0]
                record["embedding"] = new_embedding
                
                # Prepare column-oriented data: one single-element list per field
                data = [[record[f.name]] for f in collection.schema.fields if not f.auto_id]
                collection.upsert(data)
                collection.flush()
                
                print(f"已更新 ID {id} 的向量")
                return True
            
            new_vector = [np.random.random() for _ in range(128)]
            update_embedding(collection, 1, new_vector)
            
            # 条件批量更新
            def conditional_update(collection, expr, field_name, new_value):
                """根据条件批量更新字段"""
                # 查询符合条件的记录
                results = collection.query(
                    expr=expr,
                    output_fields=["*"],
                    limit=16384
                )
                
                if not results:
                    print("没有匹配的记录")
                    return 0
                
                # 更新字段
                for record in results:
                    record[field_name] = new_value
                
                # 转换为列式格式
                field_data = {}
                for field in collection.schema.fields:
                    if not field.auto_id:
                        field_data[field.name] = [r[field.name] for r in results]
                
                data = [field_data[f.name] for f in collection.schema.fields if not f.auto_id]
                collection.upsert(data)
                collection.flush()
                
                print(f"已更新 {len(results)} 条记录")
                return len(results)
            
            # 将所有test类别改为tech类别
            conditional_update(collection, 'category == "test"', "category", "tech")
            ---
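The read-modify-write cycle has two pure steps that are easy to get wrong: merging the changed fields into the fetched record, and converting row dictionaries into column-oriented lists. The sketch below isolates both. `apply_changes` and `rows_to_columns` are hypothetical helpers, and `field_names` stands in for the schema's field order.

```python
def apply_changes(record, changes):
    """Return a copy of `record` with `changes` applied; reject unknown fields."""
    unknown = set(changes) - set(record)
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    merged = dict(record)
    merged.update(changes)
    return merged


def rows_to_columns(records, field_names):
    """Convert row dicts to column-oriented data: one list per field, in schema order."""
    return [[r[name] for r in records] for name in field_names]


field_names = ["id", "title", "embedding"]
records = [
    {"id": 1, "title": "a", "embedding": [0.1, 0.2]},
    {"id": 2, "title": "b", "embedding": [0.3, 0.4]},
]
updated = [apply_changes(r, {"title": r["title"].upper()}) for r in records]
columns = rows_to_columns(updated, field_names)
print(columns)
```

For a single record the result still has one list per field, for example `[[1], ["A"], [[0.1, 0.2]]]`, which is the shape a column-oriented upsert expects.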

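02.Incremental updates
    a.Re-encoding vectors
        a.Description
            When a document's content changes, its vector must be regenerated and updated so that the two stay consistent. This is the most common update scenario in a vector database. Use the same encoding model as for the existing data so that all vectors share one vector space. Incremental updates suit applications with continuously changing content, such as news and social media. Process update requests in batches for efficiency. After heavy updating, the index may need to be rebuilt to keep query performance.
        b.Code example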
            ---
            from pymilvus import Collection
            import numpy as np
            import time  # needed by update_document and batch_reencode below
            
            collection = Collection("documents")
            collection.load()
            
            # 模拟向量编码器
            def encode_text(text):
                """将文本编码为向量(实际应使用真实的编码模型)"""
                # 这里用随机向量模拟
                return [np.random.random() for _ in range(128)]
            
            # 更新文档内容和向量
            def update_document(collection, doc_id, new_title, new_content):
                """更新文档内容并重新编码向量"""
                # 查询现有数据
                results = collection.query(
                    expr=f"id == {doc_id}",
                    output_fields=["*"]
                )
                
                if not results:
                    print(f"文档 {doc_id} 不存在")
                    return False
                
                # 重新编码向量
                new_embedding = encode_text(new_title + " " + new_content)
                
                # 更新记录
                record = results[0]
                record["title"] = new_title
                record["embedding"] = new_embedding
                record["timestamp"] = int(time.time())  # 更新时间戳
                
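                # Upsert. Column-oriented data needs one list per field, each holding
                # one value per row, so every value of this single row is wrapped in a list
                data = [[record[f.name]] for f in collection.schema.fields if not f.auto_id]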
                collection.upsert(data)
                collection.flush()
                
                print(f"已更新文档 {doc_id}")
                return True
            
            update_document(collection, 1, "新标题", "新内容...")
            
            # 批量重新编码
            def batch_reencode(collection, doc_ids):
                """批量重新编码向量"""
                # 查询文档
                results = collection.query(
                    expr=f"id in {doc_ids}",
                    output_fields=["*"]
                )
                
                if not results:
                    print("没有找到文档")
                    return 0
                
                # 重新编码
                updated_records = []
                for record in results:
                    # 重新编码
                    new_embedding = encode_text(record["title"])
                    record["embedding"] = new_embedding
                    record["timestamp"] = int(time.time())
                    updated_records.append(record)
                
                # 转换为列式格式
                field_data = {}
                for field in collection.schema.fields:
                    if not field.auto_id:
                        field_data[field.name] = [r[field.name] for r in updated_records]
                
                data = [field_data[f.name] for f in collection.schema.fields if not f.auto_id]
                collection.upsert(data)
                collection.flush()
                
                print(f"已重新编码 {len(updated_records)} 个文档")
                return len(updated_records)
            
            batch_reencode(collection, [1, 2, 3, 4, 5])
            
            # 增量更新队列
            import queue
            import threading
            import time
            
            class IncrementalUpdater:
                def __init__(self, collection, batch_size=100, flush_interval=5):
                    self.collection = collection
                    self.batch_size = batch_size
                    self.flush_interval = flush_interval
                    self.update_queue = queue.Queue()
                    self.running = False
                
                def start(self):
                    """启动更新线程"""
                    self.running = True
                    self.worker_thread = threading.Thread(target=self._worker, daemon=True)
                    self.worker_thread.start()
                
                def stop(self):
                    """停止更新线程"""
                    self.running = False
                    self.worker_thread.join()
                
                def submit_update(self, doc_id, title, content):
                    """提交更新请求"""
                    self.update_queue.put((doc_id, title, content))
                
                def _worker(self):
                    """Background update thread."""
                    batch = []
                    last_flush = time.time()
                    
                    # Keep running until stop() is called and the queue is drained
                    while self.running or not self.update_queue.empty():
                        try:
                            # Wait up to 1 second for the next update request
                            item = self.update_queue.get(timeout=1)
                            batch.append(item)
                        except queue.Empty:
                            pass
                        
                        # Flush when the batch is full or the flush interval has elapsed
                        if batch and (len(batch) >= self.batch_size or
                                      (time.time() - last_flush) > self.flush_interval):
                            self._flush_batch(batch)
                            batch = []
                            last_flush = time.time()
                    
                    # Flush whatever is left so stop() does not drop pending updates
                    if batch:
                        self._flush_batch(batch)
                
                def _flush_batch(self, batch):
                    """刷新批次更新"""
                    if not batch:
                        return
                    
                    doc_ids = [item[0] for item in batch]
                    
                    # 查询现有数据
                    results = self.collection.query(
                        expr=f"id in {doc_ids}",
                        output_fields=["*"]
                    )
                    
                    records_map = {r["id"]: r for r in results}
                    
                    # 更新记录
                    updated_records = []
                    for doc_id, title, content in batch:
                        if doc_id in records_map:
                            record = records_map[doc_id]
                            record["title"] = title
                            record["embedding"] = encode_text(title + " " + content)
                            record["timestamp"] = int(time.time())
                            updated_records.append(record)
                    
                    if updated_records:
                        # 转换为列式格式
                        field_data = {}
                        for field in self.collection.schema.fields:
                            if not field.auto_id:
                                field_data[field.name] = [r[field.name] for r in updated_records]
                        
                        data = [field_data[f.name] for f in self.collection.schema.fields if not f.auto_id]
                        self.collection.upsert(data)
                        self.collection.flush()
                        
                        print(f"批量更新 {len(updated_records)} 个文档")
            
            # 使用增量更新器
            updater = IncrementalUpdater(collection, batch_size=100, flush_interval=5)
            updater.start()
            
            # 提交更新请求
            for i in range(50):
                updater.submit_update(i, f"更新标题{i}", f"更新内容{i}")
            
            # 等待处理完成
            time.sleep(10)
            updater.stop()
            ---
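The batching logic of a queue-based updater can be tried out without a Milvus connection. The following is a minimal sketch, not the implementation used above: `BatchingWorker` and `flush_fn` are hypothetical names, and `flush_fn` stands in for the query-and-upsert step. It flushes when a batch is full, when the oldest pending item has waited `flush_interval` seconds, and once more on stop so that pending items are not dropped.

```python
import queue
import threading
import time


class BatchingWorker:
    """Collects submitted items and hands them to flush_fn in batches."""

    def __init__(self, flush_fn, batch_size=100, flush_interval=5.0):
        self.flush_fn = flush_fn              # hypothetical callback, e.g. an upsert wrapper
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        self._queue = queue.Queue()
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def submit(self, item):
        self._queue.put(item)

    def stop(self):
        """Signal the worker, then wait until everything queued has been flushed."""
        self._stop_event.set()
        self._thread.join()

    def _run(self):
        batch = []
        deadline = None
        # Exit only after stop() was called AND the queue has been drained
        while not (self._stop_event.is_set() and self._queue.empty()):
            try:
                item = self._queue.get(timeout=0.1)
                if not batch:
                    # The interval is measured from the first item of the batch
                    deadline = time.monotonic() + self.flush_interval
                batch.append(item)
            except queue.Empty:
                pass
            if batch and (len(batch) >= self.batch_size or time.monotonic() >= deadline):
                self.flush_fn(batch)
                batch = []
        # Final flush for a partially filled batch
        if batch:
            self.flush_fn(batch)


if __name__ == "__main__":
    worker = BatchingWorker(lambda b: print(f"flushing {len(b)} items"), batch_size=10)
    worker.start()
    for i in range(25):
        worker.submit(i)
    worker.stop()
```

Using a `threading.Event` plus a drain condition, rather than a bare boolean flag, is one way to make `stop()` safe to call right after the last `submit()`.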
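    b.Metadata updates
        a.Description
            A metadata update changes only scalar fields and leaves the vector untouched. It is simpler than a vector update because nothing is re-encoded, but the examples here update through upsert, which writes a whole entity, so the full record must still be queried first. Typical targets are category, tag, and status fields. A cache can cut the cost of those queries. Metadata usually changes more often than vectors, so batch the updates where possible, and consider keeping very frequently updated fields in external storage.
        b.Code example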
            ---
            from pymilvus import Collection
            import time
            
            collection = Collection("documents")
            collection.load()
            
            # 更新分类
            def update_category(collection, doc_ids, new_category):
                """批量更新分类"""
                results = collection.query(
                    expr=f"id in {doc_ids}",
                    output_fields=["*"]
                )
                
                if not results:
                    return 0
                
                # 更新分类
                for record in results:
                    record["category"] = new_category
                    record["timestamp"] = int(time.time())
                
                # Upsert
                field_data = {}
                for field in collection.schema.fields:
                    if not field.auto_id:
                        field_data[field.name] = [r[field.name] for r in results]
                
                data = [field_data[f.name] for f in collection.schema.fields if not f.auto_id]
                collection.upsert(data)
                collection.flush()
                
                print(f"已更新 {len(results)} 个文档的分类")
                return len(results)
            
            update_category(collection, [1, 2, 3], "AI")
            
            # 批量添加标签
            def add_tags(collection, doc_ids, new_tags):
                """批量添加标签(假设使用JSON字段存储标签)"""
                results = collection.query(
                    expr=f"id in {doc_ids}",
                    output_fields=["*"]
                )
                
                if not results:
                    print("No documents found")
                    return 0
                
                for record in results:
                    # Read the existing tags; the JSON field may be missing or null
                    metadata = record.get("metadata") or {}
                    existing_tags = metadata.get("tags", [])
                    
                    # Append the new tags, dropping duplicates while keeping order
                    metadata["tags"] = list(dict.fromkeys(existing_tags + new_tags))
                    
                    record["metadata"] = metadata
                    record["timestamp"] = int(time.time())
                
                # Upsert (column-based format)
                field_data = {}
                for field in collection.schema.fields:
                    if not field.auto_id:
                        field_data[field.name] = [r[field.name] for r in results]
                
                data = [field_data[f.name] for f in collection.schema.fields if not f.auto_id]
                collection.upsert(data)
                collection.flush()
                
                print(f"Added tags to {len(results)} documents")
                return len(results)
            
            add_tags(collection, [1, 2, 3], ["机器学习", "深度学习"])
            
            # 元数据缓存
            class MetadataCache:
                def __init__(self, collection, cache_size=1000):
                    self.collection = collection
                    self.cache = {}
                    self.cache_size = cache_size
                    self.access_order = []
                
                def get(self, doc_id):
                    """获取文档元数据"""
                    if doc_id in self.cache:
                        # 更新访问顺序
                        self.access_order.remove(doc_id)
                        self.access_order.append(doc_id)
                        return self.cache[doc_id]
                    
                    # 从数据库查询
                    results = self.collection.query(
                        expr=f"id == {doc_id}",
                        output_fields=["*"]
                    )
                    
                    if not results:
                        return None
                    
                    record = results[0]
                    
                    # 添加到缓存
                    if len(self.cache) >= self.cache_size:
                        # 移除最久未使用的
                        old_id = self.access_order.pop(0)
                        del self.cache[old_id]
                    
                    self.cache[doc_id] = record
                    self.access_order.append(doc_id)
                    
                    return record
                
                def update(self, doc_id, updates):
                    """更新文档元数据"""
                    record = self.get(doc_id)
                    if not record:
                        return False
                    
                    # 更新字段
                    for key, value in updates.items():
                        record[key] = value
                    
                    record["timestamp"] = int(time.time())
                    
                    # 更新缓存
                    self.cache[doc_id] = record
                    
                    # Upsert to the database (column-based: one single-element list per field)
                    data = [[record[f.name]] for f in self.collection.schema.fields if not f.auto_id]
                    self.collection.upsert(data)
                    
                    return True
                
                def flush(self):
                    """刷新所有缓存的更新"""
                    self.collection.flush()
            
            # 使用元数据缓存
            cache = MetadataCache(collection, cache_size=1000)
            
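            # Update metadata. A key that is not a schema field (possibly "views" here,
            # depending on the collection schema) only changes the cached copy,
            # because update() upserts schema fields only.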
            cache.update(1, {"category": "AI", "views": 1000})
            cache.update(2, {"category": "ML", "views": 500})
            
            # 刷新
            cache.flush()
            ---
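A list-based access order makes every cache hit an O(n) operation, because `list.remove` scans the list. As a hedged alternative, the sketch below keeps the same least-recently-used policy with `collections.OrderedDict`, where both the "touch" and the eviction are O(1). `LRUCache` and `loader` are hypothetical names; `loader` stands in for a database lookup such as the `collection.query` call above.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache backed by an OrderedDict."""

    def __init__(self, loader, capacity=1000):
        self.loader = loader        # hypothetical callable: key -> record or None
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key in self._items:
            # Mark as most recently used in O(1)
            self._items.move_to_end(key)
            return self._items[key]

        value = self.loader(key)
        if value is None:
            return None             # misses are not cached

        self._items[key] = value
        if len(self._items) > self.capacity:
            # Evict the least recently used entry (the oldest one)
            self._items.popitem(last=False)
        return value


if __name__ == "__main__":
    cache = LRUCache(lambda doc_id: {"id": doc_id}, capacity=2)
    for doc_id in (1, 2, 1, 3):
        print(cache.get(doc_id))
```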

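4.4 Batch operations

01.Batch insert optimization
    a.Data preprocessing
        a.Description
            Preprocessing data before a batch insert catches errors before any rows are written, which avoids partially failed inserts and reduces retries. Typical steps are validation (vector dimension, field types, null values), format conversion, and deduplication. Use an efficient library such as NumPy for large datasets. Preprocessing and insertion can also run in parallel as a pipeline.
        b.Code example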
            ---
            from pymilvus import Collection
            import numpy as np
            import pandas as pd
            
            collection = Collection("documents")
            
            # 数据验证器
            class DataValidator:
                def __init__(self, schema):
                    self.schema = schema
                    self.field_map = {f.name: f for f in schema.fields}
                
                def validate_record(self, record):
                    """验证单条记录"""
                    errors = []
                    
                    # 检查必需字段
                    for field in self.schema.fields:
                        if field.auto_id:
                            continue
                        
                        if field.name not in record:
                            errors.append(f"缺少字段: {field.name}")
                            continue
                        
                        value = record[field.name]
                        
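                        # Check the vector dimension. Compare the enum name rather than
                        # str(dtype), whose output for IntEnum differs between Python versions.
                        if field.dtype.name == "FLOAT_VECTOR":
                            expected_dim = field.params.get("dim")
                            if len(value) != expected_dim:
                                errors.append(f"wrong vector dimension: {field.name}, expected {expected_dim}, got {len(value)}")
                        
                        # Check the VARCHAR length (assuming max_length is a byte limit)
                        elif field.dtype.name == "VARCHAR":
                            max_len = field.params.get("max_length")
                            actual_len = len(str(value).encode("utf-8"))
                            if actual_len > max_len:
                                errors.append(f"string too long: {field.name}, max {max_len}, got {actual_len}")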
                    
                    return len(errors) == 0, errors
                
                def validate_batch(self, records):
                    """验证批次数据"""
                    valid_records = []
                    invalid_records = []
                    
                    for i, record in enumerate(records):
                        is_valid, errors = self.validate_record(record)
                        if is_valid:
                            valid_records.append(record)
                        else:
                            invalid_records.append((i, record, errors))
                    
                    return valid_records, invalid_records
            
            # 使用验证器
            validator = DataValidator(collection.schema)
            
            test_records = [
                {"id": 1, "title": "文档1", "category": "AI", "timestamp": 1700000000, "embedding": [0.1]*128},
                {"id": 2, "title": "文档2", "category": "ML", "timestamp": 1700000000, "embedding": [0.2]*100},  # 维度错误
                {"id": 3, "title": "x"*300, "category": "DL", "timestamp": 1700000000, "embedding": [0.3]*128}  # 标题过长
            ]
            
            valid, invalid = validator.validate_batch(test_records)
            print(f"有效记录: {len(valid)}")
            print(f"无效记录: {len(invalid)}")
            for i, record, errors in invalid:
                print(f"  记录{i}: {errors}")
            
            # 数据预处理管道
            class DataPreprocessor:
                def __init__(self, schema):
                    self.schema = schema
                
                def preprocess_batch(self, records):
                    """预处理批次数据"""
                    processed = []
                    
                    for record in records:
                        processed_record = self.preprocess_record(record)
                        if processed_record:
                            processed.append(processed_record)
                    
                    return processed
                
                def preprocess_record(self, record):
                    """预处理单条记录"""
                    processed = {}
                    
                    for field in self.schema.fields:
                        if field.auto_id:
                            continue
                        
                        if field.name not in record:
                            return None
                        
                        value = record[field.name]
                        
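                        # Truncate VARCHAR values (assuming max_length is a byte limit)
                        if field.dtype.name == "VARCHAR":
                            max_len = field.params.get("max_length")
                            encoded = str(value).encode("utf-8")[:max_len]
                            # errors="ignore" drops a multi-byte character that was cut in half
                            value = encoded.decode("utf-8", errors="ignore")
                        
                        # Normalize vectors
                        elif field.dtype.name == "FLOAT_VECTOR":
                            value = np.array(value, dtype=np.float32)
                            # L2 normalization (skipped for zero vectors)
                            norm = np.linalg.norm(value)
                            if norm > 0:
                                value = (value / norm).tolist()
                            else:
                                value = value.tolist()
                        
                        # Convert integer types
                        elif field.dtype.name in ("INT8", "INT16", "INT32", "INT64"):
                            value = int(value)
                        
                        # Convert floating-point types
                        elif field.dtype.name in ("FLOAT", "DOUBLE"):
                            value = float(value)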
                        
                        processed[field.name] = value
                    
                    return processed
            
            # 使用预处理器
            preprocessor = DataPreprocessor(collection.schema)
            
            raw_data = [
                {"id": "100", "title": "x"*300, "category": "AI", "timestamp": "1700000000", "embedding": [1.0]*128},
                {"id": "101", "title": "文档2", "category": "ML", "timestamp": "1700000001", "embedding": [2.0]*128}
            ]
            
            processed_data = preprocessor.preprocess_batch(raw_data)
            print(f"预处理完成: {len(processed_data)} 条记录")
            
            # 批量插入预处理后的数据
            if processed_data:
                # 转换为列式格式
                field_data = {}
                for field in collection.schema.fields:
                    if not field.auto_id:
                        field_data[field.name] = [r[field.name] for r in processed_data]
                
                data = [field_data[f.name] for f in collection.schema.fields if not f.auto_id]
                collection.insert(data)
                collection.flush()
            ---
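Deduplication is listed as a preprocessing step but is not covered by the validator or the preprocessor above. The following is a minimal sketch of two small helpers, under stated assumptions: `dedupe_by_key` and `truncate_utf8` are hypothetical names, duplicates are resolved by letting the later record win, and the VARCHAR limit is treated as a byte limit, which matters for multi-byte text such as Chinese.

```python
def dedupe_by_key(records, key="id"):
    """Keep one record per key; a later record replaces an earlier one.

    The output keeps the position of the first occurrence of each key.
    """
    latest = {}
    for record in records:
        latest[record[key]] = record
    return list(latest.values())


def truncate_utf8(text, max_bytes):
    """Truncate text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops the trailing bytes of a character that was cut in half
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    rows = [
        {"id": 1, "title": "first version"},
        {"id": 2, "title": "other document"},
        {"id": 1, "title": "second version"},
    ]
    print(dedupe_by_key(rows))
    print(truncate_utf8("文档标题", 7))
```

Running deduplication before validation keeps the error report free of repeated entries for the same primary key.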
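    b.Memory management
        a.Description
            Large batch inserts need careful memory management to avoid running out of memory. Process large files with generators or iterators instead of loading them whole, and read CSV, JSON, and similar files in chunks. NumPy arrays use less memory than Python lists. Release data structures as soon as they are no longer needed. The batch size can be adjusted dynamically by monitoring memory usage, and memory-mapped files can handle datasets larger than RAM.
        b.Code example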
            ---
            from pymilvus import Collection
            import numpy as np
            import pandas as pd
            import psutil
            import gc
            
            collection = Collection("documents")
            
            def get_memory_usage():
                """获取当前内存使用(MB)"""
                process = psutil.Process()
                return process.memory_info().rss / 1024 / 1024
            
            # 生成器方式读取大文件
            def read_large_csv(filename, chunk_size=10000):
                """分块读取大CSV文件"""
                for chunk in pd.read_csv(filename, chunksize=chunk_size):
                    yield chunk
            
            # 批量插入大文件
            def insert_from_large_file(collection, filename, batch_size=1000):
                """从大文件批量插入"""
                total_inserted = 0
                
                for chunk in read_large_csv(filename, chunk_size=batch_size):
                    # 转换为插入格式
                    ids = chunk["id"].tolist()
                    titles = chunk["title"].tolist()
                    categories = chunk["category"].tolist()
                    timestamps = chunk["timestamp"].tolist()
                    
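                    # Assumes the embedding column holds string-encoded lists such as "[0.1, 0.2]".
                    # ast.literal_eval parses literals only, so unlike eval it cannot run code from the file.
                    import ast
                    embeddings = chunk["embedding"].apply(ast.literal_eval).tolist()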
                    
                    data = [ids, titles, categories, timestamps, embeddings]
                    collection.insert(data)
                    
                    total_inserted += len(ids)
                    
                    # 显示进度和内存使用
                    memory_mb = get_memory_usage()
                    print(f"已插入: {total_inserted}, 内存: {memory_mb:.2f}MB")
                    
                    # 定期刷新
                    if total_inserted % 10000 == 0:
                        collection.flush()
                        gc.collect()  # 强制垃圾回收
                
                collection.flush()
                print(f"插入完成: {total_inserted} 条记录")
            
            # 使用NumPy节省内存
            def efficient_batch_insert(collection, count=100000):
                """高效批量插入"""
                batch_size = 1000
                
                for i in range(0, count, batch_size):
                    batch_count = min(batch_size, count - i)
                    
                    # 使用NumPy生成数据(更节省内存)
                    ids = np.arange(i, i + batch_count, dtype=np.int64)
                    embeddings = np.random.rand(batch_count, 128).astype(np.float32)
                    
                    # 转换为list(Milvus要求)
                    data = [
                        ids.tolist(),
                        [f"文档{j}" for j in range(i, i + batch_count)],
                        ["技术"] * batch_count,
                        [1700000000] * batch_count,
                        embeddings.tolist()
                    ]
                    
                    collection.insert(data)
                    
                    # 清理NumPy数组
                    del ids, embeddings
                    
                    if (i + batch_count) % 10000 == 0:
                        collection.flush()
                        gc.collect()
                        memory_mb = get_memory_usage()
                        print(f"进度: {i + batch_count}/{count}, 内存: {memory_mb:.2f}MB")
                
                collection.flush()
            
            efficient_batch_insert(collection, count=100000)
            
            # 自适应批次大小
            class AdaptiveBatchInserter:
                def __init__(self, collection, max_memory_mb=1024):
                    self.collection = collection
                    self.max_memory_mb = max_memory_mb
                    self.batch_size = 1000
                
                def insert_batch(self, data):
                    """插入批次并调整批次大小"""
                    memory_before = get_memory_usage()
                    
                    self.collection.insert(data)
                    
                    memory_after = get_memory_usage()
                    memory_used = memory_after - memory_before
                    
                    # 根据内存使用调整批次大小
                    if memory_after > self.max_memory_mb * 0.8:
                        # 内存使用过高,减小批次
                        self.batch_size = max(100, int(self.batch_size * 0.8))
                        print(f"减小批次大小: {self.batch_size}")
                    elif memory_used < 50 and self.batch_size < 10000:
                        # 内存使用较低,增大批次
                        self.batch_size = min(10000, int(self.batch_size * 1.2))
                        print(f"增大批次大小: {self.batch_size}")
                    
                    return self.batch_size
            
            inserter = AdaptiveBatchInserter(collection, max_memory_mb=1024)
            
            # 使用自适应插入
            total = 50000
            current = 0
            
            while current < total:
                batch_count = min(inserter.batch_size, total - current)
                
                # 生成批次数据
                data = [
                    list(range(current, current + batch_count)),
                    [f"文档{i}" for i in range(current, current + batch_count)],  # titles follow the ids
                    ["技术"] * batch_count,
                    [1700000000] * batch_count,
                    [[0.1]*128 for _ in range(batch_count)]
                ]
                
                inserter.insert_batch(data)
                current += batch_count
            
            collection.flush()
            ---
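The generator approach described above can be reduced to two small building blocks that do not depend on Milvus. The following is a sketch under stated assumptions: `iter_batches` and `rows_to_columns` are hypothetical helper names, and the column order is whatever `field_names` specifies, which in practice would have to match the collection schema.

```python
from itertools import islice


def iter_batches(iterable, batch_size):
    """Yield lists of up to batch_size items without materializing the whole input."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def rows_to_columns(rows, field_names):
    """Convert a list of row dicts into one list per field (column-based format)."""
    return [[row[name] for row in rows] for name in field_names]


if __name__ == "__main__":
    # A generator expression stands in for a large file read row by row
    source = ({"id": i, "title": f"doc{i}"} for i in range(7))
    for batch in iter_batches(source, batch_size=3):
        columns = rows_to_columns(batch, ["id", "title"])
        print(columns)  # each batch would be passed to collection.insert(columns)
```

Only one batch is held in memory at a time, so peak memory is bounded by `batch_size` rather than by the size of the input.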

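02.Batch query optimization
    a.Parallel queries
        a.Description
            Batch queries can raise throughput through parallelism, since Milvus executes multiple queries concurrently. Requests can be sent in parallel from a thread pool or a process pool. Limit the concurrency so the server is not overloaded, and tune it to the server's capacity. Parallel queries suit workloads where the latency of many independent lookups matters. Combining several lookups into one batch query additionally reduces network round trips.
        b.Code example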
            ---
            from pymilvus import Collection
            import concurrent.futures
            import time
            
            collection = Collection("documents")
            collection.load()
            
            # 单个查询函数
            def query_by_id(collection, doc_id):
                """按ID查询"""
                results = collection.query(
                    expr=f"id == {doc_id}",
                    output_fields=["id", "title", "category"]
                )
                return results
            
            # 串行查询
            def serial_query(collection, doc_ids):
                """串行查询"""
                start = time.time()
                results = []
                
                for doc_id in doc_ids:
                    result = query_by_id(collection, doc_id)
                    results.extend(result)
                
                elapsed = time.time() - start
                return results, elapsed
            
            # 并行查询
            def parallel_query(collection, doc_ids, max_workers=10):
                """并行查询"""
                start = time.time()
                results = []
                
                with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
                    futures = [executor.submit(query_by_id, collection, doc_id) for doc_id in doc_ids]
                    
                    for future in concurrent.futures.as_completed(futures):
                        result = future.result()
                        results.extend(result)
                
                elapsed = time.time() - start
                return results, elapsed
            
            # 性能对比
            test_ids = list(range(1, 101))
            
            results_serial, time_serial = serial_query(collection, test_ids)
            print(f"串行查询: {len(results_serial)} 条, 耗时: {time_serial:.2f}s")
            
            results_parallel, time_parallel = parallel_query(collection, test_ids, max_workers=10)
            print(f"并行查询: {len(results_parallel)} 条, 耗时: {time_parallel:.2f}s")
            print(f"加速比: {time_serial / time_parallel:.2f}x")
            
            # 批量IN查询
            def batch_in_query(collection, doc_ids, batch_size=100):
                """批量IN查询"""
                results = []
                
                for i in range(0, len(doc_ids), batch_size):
                    batch = doc_ids[i:i+batch_size]
                    
                    batch_results = collection.query(
                        expr=f"id in {batch}",
                        output_fields=["id", "title", "category"]
                    )
                    results.extend(batch_results)
                
                return results
            
            # 批量查询(更高效)
            results_batch = batch_in_query(collection, test_ids, batch_size=50)
            print(f"批量查询: {len(results_batch)} 条")
            
            # 混合策略:批量+并行
            def hybrid_query(collection, doc_ids, batch_size=50, max_workers=5):
                """混合查询策略"""
                # 分批
                batches = [doc_ids[i:i+batch_size] for i in range(0, len(doc_ids), batch_size)]
                
                results = []
                
                # 并行执行批次查询
                with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
                    futures = [
                        executor.submit(
                            collection.query,
                            expr=f"id in {batch}",
                            output_fields=["id", "title", "category"]
                        )
                        for batch in batches
                    ]
                    
                    for future in concurrent.futures.as_completed(futures):
                        batch_results = future.result()
                        results.extend(batch_results)
                
                return results
            
            start = time.time()
            results_hybrid = hybrid_query(collection, test_ids, batch_size=20, max_workers=5)
            time_hybrid = time.time() - start
            print(f"混合查询: {len(results_hybrid)} 条, 耗时: {time_hybrid:.2f}s")
            ---
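Collecting futures with `as_completed` returns results in completion order, not in the order of the requested IDs. When the caller needs input order, `ThreadPoolExecutor.map` is one option. The following is a minimal sketch: `fetch_in_batches` and `fetch_batch` are hypothetical names, and `fetch_batch` stands in for a wrapper around an `id in [...]` query.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_in_batches(fetch_batch, ids, batch_size=50, max_workers=4):
    """Run fetch_batch over ID batches in parallel, keeping batch order.

    fetch_batch is a hypothetical callable that takes a list of IDs
    and returns a list of records.
    """
    batches = [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the order of the input batches
        for rows in executor.map(fetch_batch, batches):
            results.extend(rows)
    return results


if __name__ == "__main__":
    def fake_fetch(batch):
        # Stand-in for a database query
        return [{"id": doc_id} for doc_id in batch]

    rows = fetch_in_batches(fake_fetch, list(range(10)), batch_size=3)
    print([row["id"] for row in rows])
```

`max_workers` bounds the number of in-flight requests, which is the concurrency limit the description above recommends tuning.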
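    b.Result aggregation
        a.Description
            After a batch query the results usually need aggregation: deduplication, sorting, pagination, and so on. Complex aggregation logic can live in the application layer. Watch memory usage and process large result sets in batches; returning results from a generator reduces memory pressure. Avoid O(n²) algorithms in aggregation code. Libraries such as Pandas simplify many of these operations.
        b.Code example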
            ---
            from pymilvus import Collection
            import pandas as pd
            from collections import defaultdict
            
            collection = Collection("documents")
            collection.load()
            
            # 批量查询并聚合
            def query_and_aggregate(collection, categories):
                """按类别查询并聚合统计"""
                results_by_category = defaultdict(list)
                
                for category in categories:
                    results = collection.query(
                        expr=f'category == "{category}"',
                        output_fields=["id", "title", "category", "timestamp"],
                        limit=1000
                    )
                    results_by_category[category].extend(results)
                
                # 统计每个类别的数量
                stats = {cat: len(results) for cat, results in results_by_category.items()}
                
                return results_by_category, stats
            
            categories = ["AI", "ML", "DL"]
            results, stats = query_and_aggregate(collection, categories)
            
            print("类别统计:")
            for cat, count in stats.items():
                print(f"  {cat}: {count} 条")
            
            # 使用Pandas聚合
            def query_to_dataframe(collection, expr, limit=10000):
                """查询结果转DataFrame"""
                results = collection.query(
                    expr=expr,
                    output_fields=["*"],
                    limit=limit
                )
                
                if not results:
                    return pd.DataFrame()
                
                df = pd.DataFrame(results)
                return df
            
            # 查询并分析
            df = query_to_dataframe(collection, "id > 0", limit=10000)
            
            if not df.empty:
                # 按类别统计
                category_counts = df["category"].value_counts()
                print("\n类别分布:")
                print(category_counts)
                
                # 时间范围
                if "timestamp" in df.columns:
                    df["datetime"] = pd.to_datetime(df["timestamp"], unit="s")
                    print(f"\n时间范围: {df['datetime'].min()} 到 {df['datetime'].max()}")
                
                # 导出结果
                df.to_csv("query_results.csv", index=False)
                print("\n结果已导出到 query_results.csv")
            
            # 分页聚合
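            def paginated_query(collection, expr, page_size=100):
                """Paginated query (generator).

                Note: Milvus caps offset + limit per query (16384 as documented
                for Milvus 2.x), so offset paging cannot walk an arbitrarily
                large result set.
                """
                offset = 0
                
                while True:
                    results = collection.query(
                        expr=expr,
                        output_fields=["*"],
                        limit=page_size,
                        offset=offset
                    )
                    
                    if not results:
                        break
                    
                    yield results
                    
                    # A short page means there is nothing left to fetch
                    if len(results) < page_size:
                        break
                    offset += page_size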
            
            # 使用分页查询
            total_count = 0
            for page in paginated_query(collection, "id > 0", page_size=1000):
                total_count += len(page)
                print(f"处理了 {len(page)} 条记录,累计: {total_count}")
            
            # 多条件聚合
            def multi_condition_aggregate(collection):
                """多条件聚合查询"""
                conditions = [
                    ('category == "AI"', "AI类别"),
                    ('category == "ML" and timestamp > 1700000000', "ML类别且时间>阈值"),
                    ('category == "DL" or category == "NLP"', "DL或NLP类别")
                ]
                
                results = {}
                
                for expr, desc in conditions:
                    query_results = collection.query(
                        expr=expr,
                        output_fields=["id", "title", "category"],
                        limit=1000
                    )
                    results[desc] = query_results
                    print(f"{desc}: {len(query_results)} 条")
                
                return results
            
            aggregated = multi_condition_aggregate(collection)
            ---
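Deduplication, sorting, and pagination are named in the description, but the examples above mostly count and export. The following is a minimal application-layer sketch that combines the three in one pass. `merge_results` is a hypothetical helper, and the field names `id` and `timestamp` are assumptions taken from the records used in this section.

```python
def merge_results(result_sets, key="id", sort_by="timestamp", page=1, page_size=10):
    """Merge several query results, drop duplicates, sort, and return one page.

    Deduplication uses a dict, so the whole merge is O(n log n) because of the
    sort, not O(n^2). The first occurrence of each key is kept.
    """
    unique = {}
    for rows in result_sets:
        for row in rows:
            unique.setdefault(row[key], row)

    # Newest first
    ordered = sorted(unique.values(), key=lambda r: r[sort_by], reverse=True)

    start = (page - 1) * page_size
    return ordered[start:start + page_size]


if __name__ == "__main__":
    ai_rows = [{"id": 1, "timestamp": 100}, {"id": 2, "timestamp": 300}]
    ml_rows = [{"id": 2, "timestamp": 300}, {"id": 3, "timestamp": 200}]
    print(merge_results([ai_rows, ml_rows], page=1, page_size=2))
```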

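5 Index system

5.1 Vector index types

01.Index categories
    a.Exact index
        a.Description
            An exact index (FLAT) uses brute-force computation and guarantees 100% recall. It suits small datasets (below the million-vector scale) and cases that demand the highest recall. It needs no training and builds quickly. Each query computes the distance to every vector, so query time grows linearly with the data size, and memory usage is proportional to the data size. FLAT serves as the baseline for other indexes and is often used in comparison tests. It fits prototyping and small applications.
        b.Code example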
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            
            # 创建Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields)
            collection = Collection("flat_index_demo", schema=schema)
            
            # 插入测试数据
            ids = list(range(10000))
            embeddings = np.random.rand(10000, 128).astype(np.float32).tolist()  # vectorized generation
            data = [ids, embeddings]
            collection.insert(data)
            collection.flush()
            
            # 创建FLAT索引
            index_params = {
                "index_type": "FLAT",
                "metric_type": "L2",
                "params": {}
            }
            
            collection.create_index(
                field_name="embedding",
                index_params=index_params
            )
            
            print("FLAT索引创建完成")
            
            # 加载并搜索
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2"},
                limit=10
            )
            
            print(f"搜索结果: {len(results[0])} 条")
            for hit in results[0]:
                print(f"  ID: {hit.id}, 距离: {hit.distance:.4f}")
            ---
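The brute-force scan that FLAT performs can be sketched without a database. The following is a minimal NumPy illustration, not Milvus's implementation; the function name `flat_search` and the random test data are assumptions made for the example.

```python
import numpy as np

def flat_search(base: np.ndarray, query: np.ndarray, k: int):
    """Exact top-k search: compute the squared L2 distance to every base vector."""
    diffs = base - query                            # shape (N, d)
    dists = np.einsum("ij,ij->i", diffs, diffs)     # squared L2 per row
    k = min(k, len(base))
    # argpartition selects the k smallest in O(N); only those k are then sorted.
    idx = np.argpartition(dists, k - 1)[:k]
    idx = idx[np.argsort(dists[idx])]
    return idx, dists[idx]

rng = np.random.default_rng(0)
base = rng.random((1000, 16), dtype=np.float32)
query = base[42] + 0.001          # a point very close to vector 42
ids, dists = flat_search(base, query, k=5)
print(ids, dists)
```

Every query touches all N vectors, which is why latency scales linearly with data size while recall stays at 100%.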
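    b.Approximate index
        a.Description
            Approximate (ANN) indexes trade a small amount of recall for much faster queries. The family includes IVF, HNSW, ANNOY, and others (which ones are available depends on the Milvus version). Building takes longer than FLAT: IVF indexes must be trained by clustering, and HNSW must construct its graph. Query cost grows sublinearly with data size, which makes these indexes suitable for large datasets. Memory use can be tuned through index parameters. With sensible parameters, recall typically falls in the 95%-99% range, which is enough for most applications. Each algorithm has its own performance profile, so choose according to the workload.
        b.Code example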
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            import time
            
            # 创建Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields)
            collection = Collection("approx_index_demo", schema=schema)
            
            # 插入大规模数据
            batch_size = 10000
            total_count = 100000
            
            for i in range(0, total_count, batch_size):
                ids = list(range(i, i + batch_size))
                embeddings = [[np.random.random() for _ in range(128)] for _ in range(batch_size)]
                data = [ids, embeddings]
                collection.insert(data)
                print(f"已插入: {i + batch_size}/{total_count}")
            
            collection.flush()
            
            # 创建IVF_FLAT索引(近似索引)
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}  # 聚类中心数量
            }
            
            print("开始构建索引...")
            start = time.time()
            collection.create_index(
                field_name="embedding",
                index_params=index_params
            )
            elapsed = time.time() - start
            print(f"索引构建完成,耗时: {elapsed:.2f}s")
            
            # 加载并搜索
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # 搜索参数(控制召回率和性能)
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}  # 搜索的聚类数量
            }
            
            start = time.time()
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10
            )
            elapsed = time.time() - start
            
            print(f"搜索完成,耗时: {elapsed*1000:.2f}ms")
            print(f"结果数量: {len(results[0])}")
            ---
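The recall figure quoted for approximate indexes is measured against exact results. A minimal sketch of recall@k follows; the helper names `recall_at_k` and `mean_recall` and the sample ID lists are illustrative assumptions, not part of pymilvus.

```python
def recall_at_k(exact_ids, approx_ids, k):
    """Fraction of the exact top-k that also appears in the approximate top-k."""
    exact = set(exact_ids[:k])
    approx = set(approx_ids[:k])
    if not exact:
        return 0.0
    return len(exact & approx) / len(exact)

def mean_recall(exact_lists, approx_lists, k):
    """Average recall@k over several queries (a single query is too noisy)."""
    recalls = [recall_at_k(e, a, k) for e, a in zip(exact_lists, approx_lists)]
    return sum(recalls) / len(recalls)

# Two queries: the first misses one of four true neighbours, the second misses none.
exact = [[1, 2, 3, 4], [10, 11, 12, 13]]
approx = [[1, 2, 4, 9], [10, 11, 12, 13]]
print(mean_recall(exact, approx, k=4))  # (0.75 + 1.0) / 2 = 0.875
```

In practice the exact lists would come from a FLAT search and the approximate lists from the index under test, using the same queries and the same `limit`.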

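02.Index Algorithms
    a.IVF family
        a.Description
            IVF (Inverted File Index) is a clustering-based index. It partitions the vector space into clusters (Voronoi cells) and, at query time, searches only the few clusters nearest to the query. IVF_FLAT stores the original vectors, IVF_SQ8 compresses them with scalar quantization, and IVF_PQ compresses them with product quantization. The nlist parameter sets the number of clusters and is usually chosen between sqrt(N) and 4*sqrt(N). The nprobe parameter sets how many clusters are searched per query: larger values raise recall but slow queries down. IVF suits medium to large datasets (millions to hundreds of millions of vectors).
        b.Code example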
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # IVF_FLAT: 精确距离计算
            ivf_flat_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {
                    "nlist": 1024  # 聚类中心数量
                }
            }
            
            # IVF_SQ8: 标量量化(节省75%内存)
            ivf_sq8_params = {
                "index_type": "IVF_SQ8",
                "metric_type": "L2",
                "params": {
                    "nlist": 1024
                }
            }
            
            # IVF_PQ: 乘积量化(节省90%+内存)
            ivf_pq_params = {
                "index_type": "IVF_PQ",
                "metric_type": "L2",
                "params": {
                    "nlist": 1024,
                    "m": 8,  # 子向量数量(必须能整除dim)
                    "nbits": 8  # 每个子向量的编码位数
                }
            }
            
            # 创建索引
            collection.create_index(
                field_name="embedding",
                index_params=ivf_flat_params
            )
            
            collection.load()
            
            # 搜索参数
            search_params_low = {"metric_type": "L2", "params": {"nprobe": 8}}  # 低召回率,高性能
            search_params_mid = {"metric_type": "L2", "params": {"nprobe": 16}}  # 平衡
            search_params_high = {"metric_type": "L2", "params": {"nprobe": 32}}  # 高召回率,低性能
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # 对比不同nprobe的性能
            import time
            
            for params in [search_params_low, search_params_mid, search_params_high]:
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=params,
                    limit=10
                )
                elapsed = time.time() - start
                
                nprobe = params["params"]["nprobe"]
                print(f"nprobe={nprobe}: 耗时 {elapsed*1000:.2f}ms")
            ---
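To make the roles of nlist and nprobe concrete, here is a toy IVF_FLAT written in NumPy. It is a sketch under simplifying assumptions (plain Lloyd k-means with a fixed iteration count, squared L2, everything in memory); the function names are hypothetical and the real index is far more optimized.

```python
import numpy as np

def sq_dists(a, b):
    """Pairwise squared L2 distances between rows of a (n, d) and b (m, d)."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

def nearest_centroid(data, centroids):
    return sq_dists(data, centroids).argmin(axis=1)

def train_centroids(data, nlist, iters=10, seed=0):
    """Plain Lloyd k-means; returns an (nlist, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), nlist, replace=False)].copy()
    for _ in range(iters):
        assign = nearest_centroid(data, centroids)
        for c in range(nlist):
            members = data[assign == c]
            if len(members):              # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    return centroids

def build_ivf(data, nlist):
    """Train centroids, then store each vector's row id in its cluster's inverted list."""
    centroids = train_centroids(data, nlist)
    assign = nearest_centroid(data, centroids)
    lists = [np.flatnonzero(assign == c) for c in range(nlist)]
    return centroids, lists

def ivf_search(data, centroids, lists, query, k, nprobe):
    """Rank centroids, then scan only the nprobe closest inverted lists."""
    order = sq_dists(query[None, :], centroids)[0].argsort()[:nprobe]
    cand = np.concatenate([lists[c] for c in order])
    d = sq_dists(query[None, :], data[cand])[0]
    top = d.argsort()[:k]
    return cand[top], d[top]

rng = np.random.default_rng(1)
data = rng.random((2000, 8))
centroids, lists = build_ivf(data, nlist=16)
query = rng.random(8)

# With nprobe == nlist every list is scanned, so the result is exact.
exact_ids, _ = ivf_search(data, centroids, lists, query, k=10, nprobe=16)
for nprobe in (1, 4, 16):
    ids, _ = ivf_search(data, centroids, lists, query, k=10, nprobe=nprobe)
    recall = len(set(ids.tolist()) & set(exact_ids.tolist())) / 10
    print(f"nprobe={nprobe:2d}: scanned lists={nprobe}, recall@10={recall:.2f}")
```

Raising nprobe scans more lists, so recall approaches 1.0 while the amount of work approaches that of a full scan.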
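    b.Graph index
        a.Description
            A graph index (HNSW) builds a multi-layer navigation graph and finds neighbors by traversing it. HNSW (Hierarchical Navigable Small World) is among the best-performing approximate indexes currently available. Query latency is generally stable and less sensitive to data distribution than clustering-based indexes. Memory use is higher because the graph is stored alongside the vectors, but queries are fast. M controls graph connectivity, efConstruction controls build quality, and ef controls search quality. HNSW suits latency-sensitive workloads; the price is a longer build time.
        b.Code example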
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # HNSW索引参数
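            hnsw_params = {
                "index_type": "HNSW",
                "metric_type": "L2",
                "params": {
                    "M": 16,  # max connections per node in the graph (commonly 4-64; larger M raises recall and memory use)
                    "efConstruction": 200  # candidate list size during construction (commonly 100-500; larger builds a better graph but takes longer)
                }
            }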
            
            print("开始构建HNSW索引...")
            start = time.time()
            collection.create_index(
                field_name="embedding",
                index_params=hnsw_params
            )
            elapsed = time.time() - start
            print(f"索引构建完成,耗时: {elapsed:.2f}s")
            
            collection.load()
            
            # Search parameters (ef must be at least as large as limit)
            search_params_fast = {"metric_type": "L2", "params": {"ef": 64}}  # fast search
            search_params_balanced = {"metric_type": "L2", "params": {"ef": 128}}  # balanced
            search_params_accurate = {"metric_type": "L2", "params": {"ef": 256}}  # high accuracy
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # 对比不同ef的性能
            for params in [search_params_fast, search_params_balanced, search_params_accurate]:
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=params,
                    limit=10
                )
                elapsed = time.time() - start
                
                ef = params["params"]["ef"]
                print(f"ef={ef}: 耗时 {elapsed*1000:.2f}ms")
            
            # HNSW vs IVF性能对比
            # 重建为IVF索引
            collection.release()
            collection.drop_index()
            
            ivf_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
            
            collection.create_index(field_name="embedding", index_params=ivf_params)
            collection.load()
            
            # IVF搜索
            start = time.time()
            results_ivf = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"nprobe": 16}},
                limit=10
            )
            time_ivf = time.time() - start
            
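            print(f"\nIVF_FLAT (nprobe=16): {time_ivf*1000:.2f}ms")
            print("Compare with the HNSW timings printed above. HNSW is often faster at similar recall but uses more memory;")
            print("single-query timings are noisy, so repeat and average before drawing conclusions.")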
            ---
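The effect of ef can be illustrated with a single-layer simplification of the HNSW search routine. This sketch makes several assumptions: the graph is a brute-force M-nearest-neighbour graph rather than HNSW's incrementally built multi-layer graph, the entry point is fixed at node 0, and the function names are hypothetical.

```python
import heapq
import numpy as np

def build_knn_graph(data, M):
    """Brute-force M-nearest-neighbour graph (a stand-in for HNSW's incremental build)."""
    diff = data[:, None, :] - data[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)        # a node is not its own neighbour
    return np.argsort(dist, axis=1)[:, :M]

def graph_search(data, graph, query, k, ef, entry=0):
    """Best-first graph traversal that keeps at most ef results, then returns the top k."""
    def dist_to_query(i):
        return float(((data[i] - query) ** 2).sum())

    d0 = dist_to_query(entry)
    visited = {entry}
    candidates = [(d0, entry)]            # min-heap: closest unexpanded node first
    best = [(-d0, entry)]                 # max-heap via negation: the ef best so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:               # closest candidate is worse than the worst kept result
            break
        for nb in graph[node]:
            nb = int(nb)
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist_to_query(nb)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)   # drop the current worst result
    ranked = sorted((-neg_d, i) for neg_d, i in best)[:k]
    return [i for _, i in ranked], [d for d, _ in ranked]

rng = np.random.default_rng(0)
data = rng.random((500, 8))
graph = build_knn_graph(data, M=8)
query = rng.random(8)
for ef in (8, 32, 128):
    ids, dists = graph_search(data, graph, query, k=5, ef=ef)
    print(f"ef={ef:3d}: ids={ids}")
```

A larger ef keeps more candidates alive during traversal, which usually improves result quality at the cost of more distance computations; this is the same trade-off the ef search parameter exposes.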

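03.Distance Metrics
    a.Euclidean distance
        a.Description
            Euclidean distance (L2) is the most commonly used vector distance metric. It measures the straight-line distance between two vectors, and a smaller distance means greater similarity. It works for most similarity workloads and accepts both normalized and unnormalized vectors. One distance computation costs O(d), where d is the vector dimension. Milvus accelerates L2 computation in hardware. L2 is a good fit for continuous features such as image and audio embeddings.
        b.Code example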
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # 创建L2索引
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",  # 欧氏距离
                "params": {"nlist": 1024}
            }
            
            collection.create_index(
                field_name="embedding",
                index_params=index_params
            )
            
            collection.load()
            
            # L2搜索
            query_vector = [[np.random.random() for _ in range(128)]]
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2"},
                limit=10
            )
            
            print("L2距离搜索结果:")
            for hit in results[0]:
                print(f"  ID: {hit.id}, L2距离: {hit.distance:.4f}")
            
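            # Verify the distance manually.
            # Assumption to check against your Milvus version: the L2 metric reports the
            # squared Euclidean distance (no square root), so compare against that value.
            def l2_squared(vec1, vec2):
                """Squared Euclidean distance between two vectors"""
                vec1 = np.array(vec1)
                vec2 = np.array(vec2)
                return np.sum((vec1 - vec2) ** 2)
            
            # Check the first hit
            first_id = results[0][0].id
            result_vec = collection.query(
                expr=f"id == {first_id}",
                output_fields=["embedding"]
            )[0]["embedding"]
            
            manual_squared = l2_squared(query_vector[0], result_vec)
            milvus_distance = results[0][0].distance
            
            print("\nVerification:")
            print(f"  Milvus distance: {milvus_distance:.4f}")
            print(f"  Manual squared L2: {manual_squared:.4f}")
            print(f"  Manual L2 (with sqrt): {np.sqrt(manual_squared):.4f}")
            print(f"  Difference vs squared L2: {abs(milvus_distance - manual_squared):.6f}")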
            ---
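Engines often skip the final square root when ranking by L2, because the square root is monotonic and does not change the order of results. Milvus's metric documentation describes its L2 value this way, but treat that as something to confirm on your version. The sketch below, with hypothetical helper names, shows that both forms rank identically.

```python
import numpy as np

def l2(a, b):
    """Euclidean distance."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

def l2_squared(a, b):
    """Squared Euclidean distance (no square root)."""
    return float(np.sum((a - b) ** 2))

# Worked example: the classic 3-4-5 triangle.
a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
print(l2(a, b), l2_squared(a, b))   # 5.0 25.0

# Ranking by either value gives the same neighbour order.
rng = np.random.default_rng(0)
query = rng.random(128)
base = rng.random((100, 128))
rank_l2 = np.argsort([l2(query, v) for v in base])
rank_sq = np.argsort([l2_squared(query, v) for v in base])
same_order = bool(np.array_equal(rank_l2, rank_sq))
print("same ranking:", same_order)
```

When comparing a reported distance against a manual calculation, check first which of the two values the engine returns.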
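    b.Inner product and cosine
        a.Description
            Inner product (IP) is the dot product of two vectors; a larger value means greater similarity. Cosine similarity (COSINE) is the cosine of the angle between two vectors and lies in [-1, 1]. For normalized vectors, IP and COSINE are equivalent. Both are common for text embeddings and recommender systems. In Milvus, COSINE normalizes vectors automatically, whereas IP is the better choice when vectors are already normalized because it avoids repeating the normalization. An inner product is also slightly cheaper to compute than L2.
        b.Code example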
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # 创建IP索引
            index_params_ip = {
                "index_type": "IVF_FLAT",
                "metric_type": "IP",  # 内积
                "params": {"nlist": 1024}
            }
            
            collection.create_index(
                field_name="embedding",
                index_params=index_params_ip
            )
            
            collection.load()
            
            # 归一化查询向量
            query_vector = np.random.random(128)
            query_vector = query_vector / np.linalg.norm(query_vector)  # L2归一化
            query_vector = [query_vector.tolist()]
            
            # IP搜索
            results_ip = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "IP"},
                limit=10
            )
            
            print("内积搜索结果:")
            for hit in results_ip[0]:
                print(f"  ID: {hit.id}, 内积: {hit.distance:.4f}")
            
            # 使用COSINE
            collection.release()
            collection.drop_index()
            
            index_params_cosine = {
                "index_type": "IVF_FLAT",
                "metric_type": "COSINE",  # 余弦相似度
                "params": {"nlist": 1024}
            }
            
            collection.create_index(
                field_name="embedding",
                index_params=index_params_cosine
            )
            
            collection.load()
            
            # COSINE搜索(自动归一化)
            query_vector_raw = [[np.random.random() for _ in range(128)]]  # 未归一化
            
            results_cosine = collection.search(
                data=query_vector_raw,
                anns_field="embedding",
                param={"metric_type": "COSINE"},
                limit=10
            )
            
            print("\n余弦相似度搜索结果:")
            for hit in results_cosine[0]:
                print(f"  ID: {hit.id}, 余弦相似度: {hit.distance:.4f}")
            
            # 手动计算余弦相似度
            def cosine_similarity(vec1, vec2):
                """计算余弦相似度"""
                vec1 = np.array(vec1)
                vec2 = np.array(vec2)
                return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
            
            # 验证
            first_id = results_cosine[0][0].id
            result_vec = collection.query(
                expr=f"id == {first_id}",
                output_fields=["embedding"]
            )[0]["embedding"]
            
            manual_cosine = cosine_similarity(query_vector_raw[0], result_vec)
            milvus_cosine = results_cosine[0][0].distance
            
            print(f"\n验证:")
            print(f"  Milvus余弦: {milvus_cosine:.4f}")
            print(f"  手动计算: {manual_cosine:.4f}")
            
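            # IP vs COSINE
            print("\nIP vs COSINE:")
            print("  Normalized vectors: IP and COSINE give the same scores and ranking")
            print("  Unnormalized vectors: COSINE normalizes automatically, IP does not")
            print("  Performance: IP can be slightly faster (no normalization step)")
            print("  Typical usage: text embeddings usually use COSINE; image embeddings can use L2 or IP")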
            ---
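The equivalence between IP and COSINE on normalized vectors, and their link to L2, can be checked numerically. This is a small NumPy sketch with an illustrative `normalize` helper; for unit vectors the squared L2 distance equals 2 - 2*cos.

```python
import numpy as np

def normalize(v):
    """Scale a vector (or each row of a matrix) to unit L2 norm."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
a = rng.normal(size=64)
b = rng.normal(size=64)

# Cosine similarity of the raw vectors
cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Inner product and squared L2 of the normalized vectors
ua, ub = normalize(a), normalize(b)
ip_unit = float(ua @ ub)
l2sq_unit = float(np.sum((ua - ub) ** 2))

print("IP of unit vectors equals cosine:", np.isclose(cos, ip_unit))
print("squared L2 of unit vectors equals 2 - 2*cos:", np.isclose(l2sq_unit, 2 - 2 * cos))
```

Because of these identities, all three metrics produce the same neighbour ranking once vectors are normalized; they differ only on unnormalized data.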

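5.2 FLAT Index

01.Basic Characteristics
    a.Exact search
        a.Description
            The FLAT index guarantees 100% recall through brute-force computation and is the only exact index type for float vectors. A search computes the distance between the query and every stored vector, then returns the top-K results. No training is needed, so index creation is almost instantaneous. Memory use equals the size of the raw vector data. Query time complexity is O(N*d), where N is the number of vectors and d the dimension. FLAT suits datasets of fewer than about one million vectors and is commonly used as the performance and recall baseline for other indexes.
        b.Code example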
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            import time
            
            # 创建测试Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields, description="FLAT索引测试")
            collection = Collection("flat_test", schema=schema)
            
            # 插入测试数据
            data_sizes = [1000, 10000, 100000]
            
            for size in data_sizes:
                # 清空collection
                collection.drop()
                collection = Collection("flat_test", schema=schema)
                
                # 插入数据
                ids = list(range(size))
                titles = [f"文档{i}" for i in range(size)]
                embeddings = [[np.random.random() for _ in range(128)] for _ in range(size)]
                
                data = [ids, titles, embeddings]
                collection.insert(data)
                collection.flush()
                
                # 创建FLAT索引
                index_params = {
                    "index_type": "FLAT",
                    "metric_type": "L2",
                    "params": {}
                }
                
                start = time.time()
                collection.create_index(field_name="embedding", index_params=index_params)
                index_time = time.time() - start
                
                collection.load()
                
                # 测试查询性能
                query_vector = [[np.random.random() for _ in range(128)]]
                
                # 预热
                collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param={"metric_type": "L2"},
                    limit=10
                )
                
                # 正式测试
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param={"metric_type": "L2"},
                    limit=10
                )
                query_time = time.time() - start
                
                print(f"\n数据量: {size:,}")
                print(f"  索引构建时间: {index_time*1000:.2f}ms")
                print(f"  查询时间: {query_time*1000:.2f}ms")
                print(f"  召回率: 100% (精确搜索)")
            ---
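    b.Use cases
        a.Description
            FLAT fits small datasets, prototyping, exact-search requirements, and recall benchmarking. As a rule of thumb, latency is comfortable below roughly 100,000 vectors. It suits applications with strict recall requirements, for example in medicine or finance. It can act as the control group when verifying the recall of an approximate index, and using it early in development lets you validate functionality quickly. It is not suited to large-scale production unless the dataset really is small.
        b.Code example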
            ---
            from pymilvus import Collection
            import numpy as np
            
            # 场景1: 小规模精确搜索
            def small_scale_exact_search():
                """小规模数据的精确搜索"""
                collection = Collection("medical_images")  # 假设医疗图像库
                
                # FLAT索引保证精确结果
                index_params = {
                    "index_type": "FLAT",
                    "metric_type": "L2",
                    "params": {}
                }
                
                collection.create_index(field_name="embedding", index_params=index_params)
                collection.load()
                
                # 查询最相似的病例
                query_vector = [[0.1] * 128]  # 患者图像向量
                
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param={"metric_type": "L2"},
                    limit=5,
                    output_fields=["id", "title"]
                )
                
                print("Top 5 most similar cases (exact search):")
                for hit in results[0]:
                    print(f"  Case ID: {hit.id}, L2 distance: {hit.distance:.4f}")
            
            # 场景2: 召回率基准测试
            def recall_benchmark():
                """使用FLAT作为召回率基准"""
                collection = Collection("documents")
                
                query_vector = [[np.random.random() for _ in range(128)]]
                
                # FLAT索引(精确结果)
                collection.release()
                collection.drop_index()
                
                flat_params = {
                    "index_type": "FLAT",
                    "metric_type": "L2",
                    "params": {}
                }
                collection.create_index(field_name="embedding", index_params=flat_params)
                collection.load()
                
                flat_results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param={"metric_type": "L2"},
                    limit=100
                )
                
                flat_ids = set([hit.id for hit in flat_results[0]])
                
                # IVF索引(近似结果)
                collection.release()
                collection.drop_index()
                
                ivf_params = {
                    "index_type": "IVF_FLAT",
                    "metric_type": "L2",
                    "params": {"nlist": 1024}
                }
                collection.create_index(field_name="embedding", index_params=ivf_params)
                collection.load()
                
                ivf_results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param={"metric_type": "L2", "params": {"nprobe": 16}},
                    limit=100
                )
                
                ivf_ids = set([hit.id for hit in ivf_results[0]])
                
                # 计算召回率
                recall = len(flat_ids & ivf_ids) / len(flat_ids)
                print(f"IVF索引召回率: {recall*100:.2f}%")
            
            # 场景3: 原型开发
            def prototype_development():
                """原型开发阶段使用FLAT索引"""
                collection = Collection("prototype_collection")
                
                # 快速创建索引,无需调参
                index_params = {
                    "index_type": "FLAT",
                    "metric_type": "L2",
                    "params": {}
                }
                
                collection.create_index(field_name="embedding", index_params=index_params)
                collection.load()
                
                print("原型开发建议:")
                print("  1. 使用FLAT索引快速验证功能")
                print("  2. 数据量控制在10万以内")
                print("  3. 功能稳定后再切换到近似索引")
                print("  4. 保留FLAT索引作为召回率基准")
            
            small_scale_exact_search()
            recall_benchmark()
            prototype_development()
            ---

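02.Performance Characteristics
    a.Time complexity
        a.Description
            Building a FLAT index involves no training and no auxiliary structure, so build time is negligible: the original vectors are simply stored. Query time complexity is O(N*d), where N is the number of vectors and d the dimension, so query time grows linearly with data size. Batched queries can take advantage of SIMD instructions, and GPU acceleration can improve throughput substantially. For a fixed data size, query time is fairly stable. Because every vector is always scanned, performance does not depend on data distribution and is predictable.
        b.Code example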
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            import time
            import matplotlib.pyplot as plt
            
            # 测试不同数据量的查询时间
            def test_query_time_scaling():
                """测试查询时间随数据量的变化"""
                data_sizes = [1000, 5000, 10000, 50000, 100000]
                query_times = []
                
                for size in data_sizes:
                    # 创建collection
                    fields = [
                        FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                        FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
                    ]
                    schema = CollectionSchema(fields=fields)
                    collection = Collection(f"flat_scale_test_{size}", schema=schema)
                    
                    # 插入数据
                    ids = list(range(size))
                    embeddings = [[np.random.random() for _ in range(128)] for _ in range(size)]
                    data = [ids, embeddings]
                    collection.insert(data)
                    collection.flush()
                    
                    # 创建索引
                    index_params = {
                        "index_type": "FLAT",
                        "metric_type": "L2",
                        "params": {}
                    }
                    collection.create_index(field_name="embedding", index_params=index_params)
                    collection.load()
                    
                    # 测试查询时间
                    query_vector = [[np.random.random() for _ in range(128)]]
                    
                    # 多次查询取平均
                    times = []
                    for _ in range(10):
                        start = time.time()
                        collection.search(
                            data=query_vector,
                            anns_field="embedding",
                            param={"metric_type": "L2"},
                            limit=10
                        )
                        times.append(time.time() - start)
                    
                    avg_time = np.mean(times) * 1000  # 转换为ms
                    query_times.append(avg_time)
                    
                    print(f"数据量: {size:6d}, 平均查询时间: {avg_time:.2f}ms")
                    
                    # 清理
                    collection.drop()
                
                # 绘制曲线
                plt.figure(figsize=(10, 6))
                plt.plot(data_sizes, query_times, marker='o')
                plt.xlabel('数据量')
                plt.ylabel('查询时间 (ms)')
                plt.title('FLAT索引查询时间随数据量的变化')
                plt.grid(True)
                plt.savefig('flat_scaling.png')
                print("\n性能曲线已保存到 flat_scaling.png")
            
            test_query_time_scaling()
            
            # 测试不同维度的影响
            def test_dimension_impact():
                """测试向量维度对查询时间的影响"""
                dimensions = [64, 128, 256, 512, 1024]
                query_times = []
                
                data_size = 10000
                
                for dim in dimensions:
                    fields = [
                        FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                        FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=dim)
                    ]
                    schema = CollectionSchema(fields=fields)
                    collection = Collection(f"flat_dim_test_{dim}", schema=schema)
                    
                    # 插入数据
                    ids = list(range(data_size))
                    embeddings = [[np.random.random() for _ in range(dim)] for _ in range(data_size)]
                    data = [ids, embeddings]
                    collection.insert(data)
                    collection.flush()
                    
                    # 创建索引
                    index_params = {
                        "index_type": "FLAT",
                        "metric_type": "L2",
                        "params": {}
                    }
                    collection.create_index(field_name="embedding", index_params=index_params)
                    collection.load()
                    
                    # 测试查询时间
                    query_vector = [[np.random.random() for _ in range(dim)]]
                    
                    times = []
                    for _ in range(10):
                        start = time.time()
                        collection.search(
                            data=query_vector,
                            anns_field="embedding",
                            param={"metric_type": "L2"},
                            limit=10
                        )
                        times.append(time.time() - start)
                    
                    avg_time = np.mean(times) * 1000
                    query_times.append(avg_time)
                    
                    print(f"维度: {dim:4d}, 平均查询时间: {avg_time:.2f}ms")
                    
                    collection.drop()
                
                print("\nExpected trend: query time grows roughly in proportion to the dimension; compare with the timings above")
            
            test_dimension_impact()
            ---
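Batched brute-force search is usually expressed as one matrix multiplication, which is where SIMD and GPU acceleration pay off. The sketch below uses the identity ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2; the function names are hypothetical and NumPy stands in for an optimized kernel.

```python
import numpy as np

def batch_sq_l2(queries, base):
    """Squared L2 distances for all (query, base) pairs; result shape (Q, N)."""
    q_norm = np.sum(queries ** 2, axis=1)[:, None]    # (Q, 1)
    x_norm = np.sum(base ** 2, axis=1)[None, :]       # (1, N)
    d = q_norm - 2.0 * (queries @ base.T) + x_norm    # one matrix multiplication does the heavy work
    return np.maximum(d, 0.0)                         # clamp tiny negatives caused by rounding

def batch_topk(queries, base, k):
    """Exact top-k for every query at once."""
    d = batch_sq_l2(queries, base)
    idx = np.argpartition(d, k - 1, axis=1)[:, :k]    # k smallest per row, unordered
    rows = np.arange(len(queries))[:, None]
    order = np.argsort(d[rows, idx], axis=1)          # sort only the k winners
    idx = idx[rows, order]
    return idx, d[rows, idx]

rng = np.random.default_rng(0)
base = rng.random((500, 32))
queries = rng.random((4, 32))
topk_ids, topk_dists = batch_topk(queries, base, k=5)
print(topk_ids.shape, topk_dists.shape)   # (4, 5) (4, 5)
```

The total work is still O(Q*N*d), but it is done in large dense operations rather than one query at a time.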
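    b.Space complexity
        a.Description
            A FLAT index over float32 vectors needs N*d*4 bytes, where N is the number of vectors and d the dimension, i.e. O(N*d) space. Nothing is compressed; the original vectors are stored in full. A 128-dimensional float32 vector takes 512 bytes, so one million vectors take about 512 MB (512,000,000 bytes, roughly 488 MiB). Memory use is predictable and unaffected by index parameters. FLAT uses considerably more memory than compressed indexes such as IVF_SQ8 and IVF_PQ, so it suits environments with plenty of RAM.
        b.Code example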
            ---
            from pymilvus import Collection, utility
            import numpy as np
            
            # 计算内存占用
            def calculate_memory_usage(num_vectors, dim):
                """计算FLAT索引的内存占用"""
                bytes_per_vector = dim * 4  # float32
                total_bytes = num_vectors * bytes_per_vector
                total_mb = total_bytes / 1024 / 1024
                total_gb = total_mb / 1024
                
                return {
                    "vectors": num_vectors,
                    "dimension": dim,
                    "bytes_per_vector": bytes_per_vector,
                    "total_mb": total_mb,
                    "total_gb": total_gb
                }
            
            # 常见规模的内存占用
            scenarios = [
                (10000, 128, "小规模应用"),
                (100000, 128, "中等规模应用"),
                (1000000, 128, "大规模应用"),
                (1000000, 768, "大模型embedding"),
                (10000000, 128, "超大规模应用")
            ]
            
            print("FLAT索引内存占用估算:\n")
            for num_vectors, dim, desc in scenarios:
                usage = calculate_memory_usage(num_vectors, dim)
                print(f"{desc}:")
                print(f"  向量数量: {usage['vectors']:,}")
                print(f"  向量维度: {usage['dimension']}")
                print(f"  单向量大小: {usage['bytes_per_vector']} 字节")
                print(f"  总内存: {usage['total_mb']:.2f} MB ({usage['total_gb']:.2f} GB)")
                print()
            
            # 实际测量内存占用
            def measure_actual_memory():
                """实际测量FLAT索引的内存占用"""
                collection = Collection("memory_test")
                
                # 插入数据
                size = 100000
                dim = 128
                
                ids = list(range(size))
                embeddings = [[np.random.random() for _ in range(dim)] for _ in range(size)]
                data = [ids, embeddings]
                collection.insert(data)
                collection.flush()
                
                # 创建索引
                index_params = {
                    "index_type": "FLAT",
                    "metric_type": "L2",
                    "params": {}
                }
                collection.create_index(field_name="embedding", index_params=index_params)
                
                # Report the entity count (the ORM Collection class exposes num_entities, not get_stats())
                print("Collection statistics:")
                print(f"  entities: {collection.num_entities}")
                
                # 理论内存占用
                theoretical_mb = calculate_memory_usage(size, dim)["total_mb"]
                print(f"\n理论内存占用: {theoretical_mb:.2f} MB")
                print("实际占用略高于理论值(包含元数据和索引结构)")
            
            measure_actual_memory()
            
            # 内存占用对比
            def compare_index_memory():
                """对比不同索引的内存占用"""
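                print("\nApproximate memory use by index type (1 million vectors, 128 dimensions):\n")
                
                # Rough estimates only: the IVF_PQ figure depends on m and nbits,
                # and the HNSW figure depends on M.
                comparisons = [
                    ("FLAT", 1.0, "512 MB", "no compression, exact search"),
                    ("IVF_FLAT", 1.0, "512 MB", "no compression, approximate search"),
                    ("IVF_SQ8", 0.25, "128 MB", "scalar quantization, saves about 75%"),
                    ("IVF_PQ", 0.05, "26 MB", "product quantization, saves about 95%"),
                    ("HNSW", 1.5, "768 MB", "graph index, extra graph structure")
                ]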
                
                for index_type, ratio, memory, description in comparisons:
                    print(f"{index_type:12s}: {memory:8s} (相对FLAT: {ratio*100:5.1f}%) - {description}")
                
                print("\n建议:")
                print("  - 内存充足: 使用FLAT或HNSW")
                print("  - 内存紧张: 使用IVF_SQ8或IVF_PQ")
                print("  - 平衡选择: 使用IVF_FLAT")
            
            compare_index_memory()
            ---
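The per-index memory figures follow from simple arithmetic on the stored representation. The estimator below is a back-of-the-envelope sketch: the function name is hypothetical, and it deliberately ignores centroids, PQ codebooks, upper HNSW layers, and metadata, so real usage will be somewhat higher.

```python
def estimate_index_bytes(index_type, n, dim, *, m=8, nbits=8, M=16):
    """Rough payload size in bytes for n float32 vectors of the given dimension."""
    raw = n * dim * 4                      # float32 = 4 bytes per dimension
    if index_type in ("FLAT", "IVF_FLAT"):
        return raw                         # vectors stored uncompressed
    if index_type == "IVF_SQ8":
        return n * dim                     # one byte per dimension
    if index_type == "IVF_PQ":
        return n * m * nbits // 8          # m codes of nbits each per vector
    if index_type == "HNSW":
        return raw + n * 2 * M * 4         # vectors plus about 2*M int32 links at the bottom layer
    raise ValueError(f"unknown index type: {index_type}")

n, dim = 1_000_000, 128
for index_type in ("FLAT", "IVF_FLAT", "IVF_SQ8", "IVF_PQ", "HNSW"):
    size = estimate_index_bytes(index_type, n, dim)
    print(f"{index_type:9s}: {size / 1e6:8.1f} MB ({size / (n * dim * 4) * 100:5.1f}% of FLAT)")
```

The IVF_PQ and HNSW results move a lot with m, nbits, and M, which is why published ratios for these indexes vary.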

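5.3 IVF Family Indexes

01.How IVF Works
    a.Clustering and partitioning
        a.Description
            IVF (Inverted File Index) uses K-means clustering to partition the vector space into Voronoi cells. Each cell is represented by a cluster center (centroid), and every vector is assigned to its nearest centroid. A query first finds the nearest few centroids and then searches only inside those clusters. The nlist parameter sets the number of clusters and is usually chosen between sqrt(N) and 4*sqrt(N), where N is the total number of vectors. Clustering requires a training step that runs K-means iterations on a sample of the data; training time grows roughly in proportion to nlist and to the amount of data.
        b.Code example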
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            import time
            
            # 创建测试Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields)
            collection = Collection("ivf_demo", schema=schema)
            
            # 插入数据
            data_size = 100000
            ids = list(range(data_size))
            embeddings = [[np.random.random() for _ in range(128)] for _ in range(data_size)]
            data = [ids, embeddings]
            collection.insert(data)
            collection.flush()
            
            # 测试不同nlist值
            nlist_values = [128, 256, 512, 1024, 2048]
            
            for nlist in nlist_values:
                # 创建IVF索引
                index_params = {
                    "index_type": "IVF_FLAT",
                    "metric_type": "L2",
                    "params": {"nlist": nlist}
                }
                
                print(f"\nnlist = {nlist}")
                
                # 测量构建时间
                start = time.time()
                collection.create_index(field_name="embedding", index_params=index_params)
                build_time = time.time() - start
                print(f"  构建时间: {build_time:.2f}s")
                
                collection.load()
                
                # 测试查询性能
                query_vector = [[np.random.random() for _ in range(128)]]
                
                # 不同nprobe值
                for nprobe in [1, 8, 16, 32]:
                    search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": nprobe}
                    }
                    
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                    query_time = time.time() - start
                    
                    print(f"  nprobe={nprobe:2d}: {query_time*1000:.2f}ms")
                
                # 清理索引
                collection.release()
                collection.drop_index()
            
            # nlist选择建议
            def recommend_nlist(num_vectors):
                """推荐nlist值"""
                sqrt_n = int(np.sqrt(num_vectors))
                
                recommendations = {
                    "conservative": sqrt_n,
                    "balanced": 2 * sqrt_n,
                    "aggressive": 4 * sqrt_n
                }
                
                return recommendations
            
            print(f"\n对于 {data_size:,} 个向量:")
            recs = recommend_nlist(data_size)
            for strategy, value in recs.items():
                print(f"  {strategy}: nlist = {value}")
            ---
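            The partitioning step described above can be sketched without a running Milvus instance. The following is a minimal NumPy illustration, not Milvus's implementation: it trains centroids with plain Lloyd's K-means and builds the inverted lists. The function names (train_centroids, assign_to_centroids, build_inverted_lists) and the toy sizes are assumptions made for this example.

```python
import numpy as np

def assign_to_centroids(vectors, centroids):
    """Index of the nearest centroid (squared L2) for every vector."""
    d2 = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def train_centroids(vectors, nlist, iters=10, seed=0):
    """Plain Lloyd's K-means; returns an (nlist, dim) array of centroids."""
    rng = np.random.default_rng(seed)
    # Start from nlist distinct data points
    centroids = vectors[rng.choice(len(vectors), size=nlist, replace=False)].copy()
    for _ in range(iters):
        assign = assign_to_centroids(vectors, centroids)
        for c in range(nlist):
            members = vectors[assign == c]
            if len(members) > 0:  # keep the old centroid if a cell is empty
                centroids[c] = members.mean(axis=0)
    return centroids

def build_inverted_lists(vectors, centroids):
    """Map each cell id to the ids of the vectors assigned to it."""
    assign = assign_to_centroids(vectors, centroids)
    return {c: np.flatnonzero(assign == c) for c in range(len(centroids))}

rng = np.random.default_rng(42)
vectors = rng.random((2000, 16), dtype=np.float32)
nlist = int(np.sqrt(len(vectors)))  # sqrt(N) rule of thumb
centroids = train_centroids(vectors, nlist)
inverted_lists = build_inverted_lists(vectors, centroids)

sizes = [len(ids) for ids in inverted_lists.values()]
print(f"nlist={nlist}, min/avg/max cell size = {min(sizes)}/{np.mean(sizes):.1f}/{max(sizes)}")
```

            The spread between the smallest and largest cell hints at why query latency can vary: a query that lands in a crowded cell scans more vectors than one that lands in a sparse cell.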
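    b.Search Strategy
        a.Description
            An IVF search has two stages: a coarse search and a fine search. The coarse stage computes the distance from the query vector to every centroid and selects the nprobe nearest clusters. The fine stage computes distances to the vectors inside the selected clusters and returns the Top-K results. nprobe therefore sets the trade-off between recall and speed: a larger nprobe raises recall but increases latency. With nprobe=nlist every cluster is scanned, so IVF_FLAT returns the same results as FLAT (with the extra cost of the coarse stage). The best nprobe depends on the data and is best determined by experiment.
        b.Code Example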
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("ivf_demo")
            
            # 创建IVF索引
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
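            # The IVF index itself is built further down, after the FLAT baseline;
            # building it here would only be dropped again.
            
            # Query vector used to compare recall and latency across nprobe values
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # Get exact results from FLAT first as the baseline.
            # Clear whatever index is currently present before building FLAT.
            collection.release()
            if collection.has_index():
                collection.drop_index()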
            
            flat_params = {
                "index_type": "FLAT",
                "metric_type": "L2",
                "params": {}
            }
            collection.create_index(field_name="embedding", index_params=flat_params)
            collection.load()
            
            flat_results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2"},
                limit=100
            )
            flat_ids = set([hit.id for hit in flat_results[0]])
            
            # 切换回IVF索引
            collection.release()
            collection.drop_index()
            collection.create_index(field_name="embedding", index_params=index_params)
            collection.load()
            
            # 测试不同nprobe
            print("nprobe性能和召回率对比:\n")
            print(f"{'nprobe':>8s} {'查询时间':>10s} {'召回率':>8s}")
            print("-" * 30)
            
            nprobe_values = [1, 2, 4, 8, 16, 32, 64, 128]
            
            for nprobe in nprobe_values:
                search_params = {
                    "metric_type": "L2",
                    "params": {"nprobe": nprobe}
                }
                
                # 测量查询时间
                times = []
                for _ in range(10):
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=100
                    )
                    times.append(time.time() - start)
                
                avg_time = np.mean(times) * 1000
                
                # 计算召回率
                ivf_ids = set([hit.id for hit in results[0]])
                recall = len(flat_ids & ivf_ids) / len(flat_ids)
                
                print(f"{nprobe:8d} {avg_time:9.2f}ms {recall*100:7.2f}%")
            
            # 自动选择nprobe
            def auto_select_nprobe(collection, query_vector, target_recall=0.95, max_nprobe=128):
                """自动选择满足目标召回率的最小nprobe"""
                # 获取精确结果
                collection.release()
                collection.drop_index()
                
                flat_params = {"index_type": "FLAT", "metric_type": "L2", "params": {}}
                collection.create_index(field_name="embedding", index_params=flat_params)
                collection.load()
                
                flat_results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param={"metric_type": "L2"},
                    limit=100
                )
                flat_ids = set([hit.id for hit in flat_results[0]])
                
                # 恢复IVF索引
                collection.release()
                collection.drop_index()
                
                ivf_params = {
                    "index_type": "IVF_FLAT",
                    "metric_type": "L2",
                    "params": {"nlist": 1024}
                }
                collection.create_index(field_name="embedding", index_params=ivf_params)
                collection.load()
                
                # 二分查找最优nprobe
                left, right = 1, max_nprobe
                best_nprobe = max_nprobe
                
                while left <= right:
                    mid = (left + right) // 2
                    
                    search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": mid}
                    }
                    
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=100
                    )
                    
                    ivf_ids = set([hit.id for hit in results[0]])
                    recall = len(flat_ids & ivf_ids) / len(flat_ids)
                    
                    if recall >= target_recall:
                        best_nprobe = mid
                        right = mid - 1
                    else:
                        left = mid + 1
                
                return best_nprobe
            
            optimal_nprobe = auto_select_nprobe(collection, query_vector, target_recall=0.95)
            print(f"\n推荐nprobe值(95%召回率): {optimal_nprobe}")
            ---
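            The two search stages can also be shown in a few lines of NumPy. This is a toy sketch under stated assumptions, not Milvus's code path: centroids are random data points rather than K-means output (to keep the example short), and build_ivf, ivf_search and brute_force are hypothetical helper names.

```python
import numpy as np

def build_ivf(vectors, nlist, seed=0):
    """Toy IVF: centroids are random samples of the data (no K-means)."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), size=nlist, replace=False)]
    d2 = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    lists = [np.flatnonzero(assign == c) for c in range(nlist)]
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, nprobe, k):
    # Coarse stage: rank all centroids, keep the nprobe closest cells.
    cell_d2 = ((centroids - query) ** 2).sum(axis=1)
    probe = np.argsort(cell_d2)[:nprobe]
    # Fine stage: exact distances, but only for vectors in the probed cells.
    cand = np.concatenate([lists[c] for c in probe])
    cand_d2 = ((vectors[cand] - query) ** 2).sum(axis=1)
    order = np.argsort(cand_d2, kind="stable")[:k]
    return cand[order]

def brute_force(query, vectors, k):
    d2 = ((vectors - query) ** 2).sum(axis=1)
    return np.argsort(d2, kind="stable")[:k]

rng = np.random.default_rng(1)
vectors = rng.random((3000, 16))
query = rng.random(16)
nlist, k = 32, 10
centroids, lists = build_ivf(vectors, nlist)
exact = set(brute_force(query, vectors, k).tolist())

recalls = {}
for nprobe in [1, 4, 16, 32]:
    got = set(ivf_search(query, vectors, centroids, lists, nprobe, k).tolist())
    recalls[nprobe] = len(got & exact) / k
    print(f"nprobe={nprobe:3d}  recall@{k}={recalls[nprobe]:.2f}")
```

            For a fixed query, a larger nprobe probes a superset of the cells probed by a smaller one, so recall cannot decrease. This is the property that the binary search in auto_select_nprobe relies on.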

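02.IVF Variants
    a.IVF_FLAT
        a.Description
            IVF_FLAT is the most basic IVF index: it stores the original vectors without compression. Distances inside the probed clusters are exact, so recall is determined by which clusters are probed (nprobe relative to nlist) rather than by any quantization error. Memory usage is about the same as FLAT (plus the centroids), while queries are much faster because only a fraction of the vectors is scanned. It suits deployments with ample memory and high recall requirements, and at the same nlist and nprobe it gives the highest recall in the IVF family. It also builds faster than the compressed variants. Unless memory is constrained, it is a reasonable first choice within the IVF family.
        b.Code Example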
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # IVF_FLAT索引配置
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {
                    "nlist": 1024  # 聚类数量
                }
            }
            
            print("开始构建IVF_FLAT索引...")
            start = time.time()
            collection.create_index(field_name="embedding", index_params=index_params)
            build_time = time.time() - start
            print(f"构建完成,耗时: {build_time:.2f}s")
            
            collection.load()
            
            # 性能测试
            query_vectors = [[np.random.random() for _ in range(128)] for _ in range(100)]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 单次查询
            start = time.time()
            results = collection.search(
                data=[query_vectors[0]],
                anns_field="embedding",
                param=search_params,
                limit=10
            )
            single_time = time.time() - start
            print(f"单次查询: {single_time*1000:.2f}ms")
            
            # 批量查询
            start = time.time()
            results = collection.search(
                data=query_vectors,
                anns_field="embedding",
                param=search_params,
                limit=10
            )
            batch_time = time.time() - start
            print(f"批量查询(100): {batch_time*1000:.2f}ms")
            print(f"平均每次: {batch_time/100*1000:.2f}ms")
            
            # 内存占用估算
            num_vectors = collection.num_entities
            dim = 128
            memory_mb = num_vectors * dim * 4 / 1024 / 1024
            print(f"\n内存占用估算: {memory_mb:.2f} MB")
            
            # 性能调优建议
            print("\nIVF_FLAT调优建议:")
            print("  1. nlist = sqrt(N) ~ 4*sqrt(N)")
            print("  2. nprobe = 8~64 (根据召回率要求)")
            print("  3. 批量查询可提升吞吐量")
            print("  4. 适合内存充足的场景")
            ---
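    b.IVF_SQ8
        a.Description
            IVF_SQ8 compresses vectors with 8-bit scalar quantization, storing each float32 component as a uint8. This cuts vector memory by about 75% at the cost of some precision. Quantization maps the values of each dimension onto the range 0-255. At query time distances are computed from the dequantized values, which adds a little computation. It suits memory-constrained deployments that can tolerate a small loss of accuracy. Recall is slightly lower than IVF_FLAT, often still above 98%, although the exact loss depends on the data and should be measured. It is a common choice for reducing the memory footprint of large datasets.
        b.Code Example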
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # IVF_SQ8索引配置
            index_params = {
                "index_type": "IVF_SQ8",
                "metric_type": "L2",
                "params": {
                    "nlist": 1024
                }
            }
            
            print("开始构建IVF_SQ8索引...")
            start = time.time()
            collection.create_index(field_name="embedding", index_params=index_params)
            build_time = time.time() - start
            print(f"构建完成,耗时: {build_time:.2f}s")
            
            collection.load()
            
            # 性能测试
            query_vector = [[np.random.random() for _ in range(128)]]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            start = time.time()
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10
            )
            query_time = time.time() - start
            print(f"查询时间: {query_time*1000:.2f}ms")
            
            # 内存节省
            num_vectors = collection.num_entities
            dim = 128
            
            flat_memory = num_vectors * dim * 4 / 1024 / 1024  # float32
            sq8_memory = num_vectors * dim * 1 / 1024 / 1024   # uint8
            savings = (1 - sq8_memory / flat_memory) * 100
            
            print(f"\n内存对比:")
            print(f"  FLAT: {flat_memory:.2f} MB")
            print(f"  SQ8:  {sq8_memory:.2f} MB")
            print(f"  节省: {savings:.1f}%")
            
            # 精度对比
            print("\nIVF_SQ8特点:")
            print("  优点: 节省75%内存,查询速度接近IVF_FLAT")
            print("  缺点: 精度略有损失(通常<2%)")
            print("  适用: 大规模数据集,内存受限场景")
            
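            # Quantization demo (simplified: the range is taken from this one vector,
            # whereas the description above quantizes each dimension over the dataset)
            def quantize_vector(vector):
                """Demonstrate the scalar quantization round trip."""
                vector = np.array(vector, dtype=np.float32)
                
                # Find the minimum and maximum
                vmin, vmax = vector.min(), vector.max()
                # Guard against a constant vector (vmax == vmin)
                span = (vmax - vmin) if vmax > vmin else 1.0
                
                # Map to 0-255, rounding to the nearest level instead of truncating
                quantized = np.round((vector - vmin) / span * 255).astype(np.uint8)
                
                # Dequantize
                dequantized = quantized.astype(np.float32) / 255 * span + vmin
                
                # Mean absolute error
                error = np.abs(vector - dequantized).mean()
                
                return quantized, dequantized, error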
            
            test_vector = [np.random.random() for _ in range(128)]
            quantized, dequantized, error = quantize_vector(test_vector)
            
            print(f"\n量化示例:")
            print(f"  原始范围: [{min(test_vector):.4f}, {max(test_vector):.4f}]")
            print(f"  量化范围: [0, 255]")
            print(f"  平均误差: {error:.6f}")
            ---
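            The description above says that each dimension is mapped onto 0-255. A per-dimension variant can be sketched as follows. This is an illustrative NumPy version under simple assumptions (min/max taken over the whole dataset, uniform levels); sq8_train, sq8_encode and sq8_decode are hypothetical names, and a real index may choose its ranges differently.

```python
import numpy as np

def sq8_train(vectors):
    """Learn the per-dimension minimum and step size from the data."""
    vmin = vectors.min(axis=0)
    vmax = vectors.max(axis=0)
    scale = (vmax - vmin) / 255.0
    scale[scale == 0] = 1.0  # constant dimension: avoid division by zero
    return vmin, scale

def sq8_encode(vectors, vmin, scale):
    """Round each component to the nearest of 256 levels in its dimension."""
    codes = np.round((vectors - vmin) / scale)
    return np.clip(codes, 0, 255).astype(np.uint8)

def sq8_decode(codes, vmin, scale):
    """Map codes back to approximate float32 values."""
    return codes.astype(np.float32) * scale + vmin

rng = np.random.default_rng(0)
data = rng.normal(size=(10000, 128)).astype(np.float32)
vmin, scale = sq8_train(data)
codes = sq8_encode(data, vmin, scale)
restored = sq8_decode(codes, vmin, scale)

max_err = np.abs(data - restored).max()
print(f"bytes: {data.nbytes} -> {codes.nbytes}")
print(f"max abs error: {max_err:.5f}, half-step bound: {(scale / 2).max():.5f}")
```

            With rounding to the nearest level, the error in each dimension is bounded by half a quantization step, so dimensions with a wide value range lose more absolute precision than narrow ones.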

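03.Parameter Tuning
    a.Choosing nlist
        a.Description
            nlist is the most important build parameter of an IVF index: it sets the number of clusters. If nlist is too small, each cluster holds too many vectors and the fine search slows down. If nlist is too large, clusters become very small, the coarse search over centroids gets more expensive, and a fixed nprobe covers a smaller share of the data, which lowers recall. A common starting range is sqrt(N) to 4*sqrt(N), where N is the total number of vectors; for 1 million vectors that gives nlist=1000-4000. Powers of two such as 1024 or 4096 are a common convention but not a requirement. The value should be tuned to the data distribution and query pattern. Build time grows roughly in proportion to nlist.
        b.Code Example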
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 测试不同nlist值的性能
            num_vectors = collection.num_entities
            sqrt_n = int(np.sqrt(num_vectors))
            
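            # Deduplicate (sqrt-based values can coincide with the fixed ones) and keep nlist >= 1
            nlist_candidates = sorted({
                max(1, sqrt_n),
                max(1, 2 * sqrt_n),
                max(1, 4 * sqrt_n),
                1024,  # common values
                2048,
                4096
            })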
            
            print(f"向量数量: {num_vectors:,}")
            print(f"sqrt(N): {sqrt_n}\n")
            
            results_summary = []
            
            for nlist in nlist_candidates:
                # 创建索引
                index_params = {
                    "index_type": "IVF_FLAT",
                    "metric_type": "L2",
                    "params": {"nlist": nlist}
                }
                
                start = time.time()
                collection.create_index(field_name="embedding", index_params=index_params)
                build_time = time.time() - start
                
                collection.load()
                
                # 测试查询性能(nprobe=16)
                query_vector = [[np.random.random() for _ in range(128)]]
                search_params = {
                    "metric_type": "L2",
                    "params": {"nprobe": 16}
                }
                
                times = []
                for _ in range(10):
                    start = time.time()
                    collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                    times.append(time.time() - start)
                
                avg_query_time = np.mean(times) * 1000
                
                results_summary.append({
                    "nlist": nlist,
                    "build_time": build_time,
                    "query_time": avg_query_time
                })
                
                print(f"nlist={nlist:5d}: 构建 {build_time:5.2f}s, 查询 {avg_query_time:6.2f}ms")
                
                # 清理
                collection.release()
                collection.drop_index()
            
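            # Pick the fastest nlist at nprobe=16. Recall is not measured here:
            # a larger nlist with the same nprobe scans fewer vectors, so it tends to be
            # faster but less accurate. Check recall before adopting the value.
            best = min(results_summary, key=lambda x: x["query_time"])
            print(f"\nFastest nlist at nprobe=16: {best['nlist']} (recall not measured)")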
            
            # nlist选择策略
            def recommend_nlist_strategy(num_vectors):
                """推荐nlist选择策略"""
                sqrt_n = int(np.sqrt(num_vectors))
                
                strategies = {
                    "快速构建": sqrt_n,
                    "平衡性能": 2 * sqrt_n,
                    "高性能": 4 * sqrt_n
                }
                
                # 限制在合理范围
                for key in strategies:
                    strategies[key] = max(64, min(65536, strategies[key]))
                    # 向上取整到2的幂次
                    strategies[key] = 2 ** int(np.ceil(np.log2(strategies[key])))
                
                return strategies
            
            strategies = recommend_nlist_strategy(num_vectors)
            print("\nnlist选择策略:")
            for strategy, value in strategies.items():
                print(f"  {strategy}: {value}")
            ---
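            The sqrt(N) rule of thumb can be motivated with a simple cost model. The sketch below assumes evenly sized clusters and counts only distance computations per query: nlist for the coarse stage plus nprobe*N/nlist for the fine stage. Minimizing that sum over nlist gives nlist = sqrt(nprobe*N). The function name and the chosen numbers are illustrative, and the model says nothing about recall.

```python
import math

def ivf_distance_computations(n, nlist, nprobe):
    """Approximate distance evaluations per query, assuming balanced cells."""
    coarse = nlist             # query vs. every centroid
    fine = nprobe * n / nlist  # query vs. vectors in the probed cells
    return coarse + fine

n = 1_000_000
nprobe = 16
for nlist in [256, 1000, 2000, 4000, 16384, 65536]:
    cost = ivf_distance_computations(n, nlist, nprobe)
    print(f"nlist={nlist:6d}  ~{cost:10.0f} distance computations "
          f"({cost / n * 100:.2f}% of brute force)")

# Setting the derivative 1 - nprobe*n/nlist**2 to zero gives the model optimum
best_nlist = math.sqrt(nprobe * n)
print(f"model optimum for nprobe={nprobe}: nlist ~ {best_nlist:.0f}")
```

            In this model both very small and very large nlist values cost more than the middle of the range, which matches the qualitative advice above. Real timings also depend on cluster imbalance and on the hardware, so the benchmark loop remains the deciding measurement.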
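    b.Tuning nprobe
        a.Description
            nprobe sets how many clusters are probed per query and is the main recall/latency knob. nprobe=1 is the fastest setting but gives the lowest recall. With nprobe=nlist every cluster is scanned, so IVF_FLAT reaches 100% recall, matching FLAT, at the highest latency. A typical range is 8-64, adjusted to the recall target. nprobe should be much smaller than nlist, usually 1%-10% of it. The best value can be found by measuring recall and latency on representative queries, or by A/B testing in production. nprobe is a search-time parameter, so different queries can use different values: a small nprobe for real-time queries and a large one for offline analysis.
        b.Code Example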
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 创建IVF索引
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
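            # The IVF index is built below, after the FLAT baseline has been collected.
            
            # Get exact results as the baseline.
            # Clear whatever index is currently present before building FLAT.
            collection.release()
            if collection.has_index():
                collection.drop_index()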
            
            flat_params = {"index_type": "FLAT", "metric_type": "L2", "params": {}}
            collection.create_index(field_name="embedding", index_params=flat_params)
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            flat_results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2"},
                limit=100
            )
            flat_ids = set([hit.id for hit in flat_results[0]])
            
            # 恢复IVF索引
            collection.release()
            collection.drop_index()
            collection.create_index(field_name="embedding", index_params=index_params)
            collection.load()
            
            # 测试不同nprobe
            print("nprobe调优分析:\n")
            print(f"{'nprobe':>8s} {'查询时间':>12s} {'召回率':>10s} {'性价比':>10s}")
            print("-" * 45)
            
            nprobe_range = [1, 2, 4, 8, 16, 32, 64, 128, 256]
            
            for nprobe in nprobe_range:
                if nprobe > index_params["params"]["nlist"]:  # nprobe cannot exceed nlist
                    continue
                
                search_params = {
                    "metric_type": "L2",
                    "params": {"nprobe": nprobe}
                }
                
                # 测量查询时间
                times = []
                for _ in range(10):
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=100
                    )
                    times.append(time.time() - start)
                
                avg_time = np.mean(times) * 1000
                
                # 计算召回率
                ivf_ids = set([hit.id for hit in results[0]])
                recall = len(flat_ids & ivf_ids) / len(flat_ids)
                
                # 性价比 = 召回率 / 查询时间
                efficiency = recall / avg_time if avg_time > 0 else 0
                
                print(f"{nprobe:8d} {avg_time:10.2f}ms {recall*100:9.2f}% {efficiency:10.4f}")
            
            # 自动推荐nprobe
            def recommend_nprobe(target_recall=0.95, max_latency_ms=10):
                """根据召回率和延迟要求推荐nprobe"""
                recommendations = []
                
                for nprobe in [1, 2, 4, 8, 16, 32, 64, 128]:
                    search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": nprobe}
                    }
                    
                    # 测试
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=100
                    )
                    query_time = (time.time() - start) * 1000
                    
                    ivf_ids = set([hit.id for hit in results[0]])
                    recall = len(flat_ids & ivf_ids) / len(flat_ids)
                    
                    if recall >= target_recall and query_time <= max_latency_ms:
                        recommendations.append({
                            "nprobe": nprobe,
                            "recall": recall,
                            "latency": query_time
                        })
                
                return recommendations
            
            print("\n推荐配置(召回率≥95%, 延迟≤10ms):")
            recs = recommend_nprobe(target_recall=0.95, max_latency_ms=10)
            
            if recs:
                best = min(recs, key=lambda x: x["nprobe"])
                print(f"  推荐nprobe: {best['nprobe']}")
                print(f"  召回率: {best['recall']*100:.2f}%")
                print(f"  延迟: {best['latency']:.2f}ms")
            else:
                print("  无满足条件的配置,建议放宽要求或增加nlist")
            ---
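            The examples above measure recall on a single random query, which makes the numbers noisy. A more stable estimate averages recall over many queries. The helper below is a small sketch of that calculation; recall_at_k is a hypothetical name and the id lists are made-up sample data.

```python
import numpy as np

def recall_at_k(approx_ids, exact_ids):
    """Mean recall over queries; each argument is a list of per-query id lists."""
    per_query = []
    for approx, exact in zip(approx_ids, exact_ids):
        exact_set = set(exact)
        per_query.append(len(exact_set & set(approx)) / len(exact_set))
    return float(np.mean(per_query)), per_query

# Three queries with k=4: exact neighbours vs. what an approximate index returned
exact = [[1, 2, 3, 4], [10, 11, 12, 13], [20, 21, 22, 23]]
approx = [[1, 2, 3, 4], [10, 11, 99, 98], [20, 21, 22, 97]]

mean_recall, per_query = recall_at_k(approx, exact)
print(f"mean recall = {mean_recall:.3f}, per query = {per_query}")
```

            With pymilvus, the per-query id lists could be collected from a batched search as [[hit.id for hit in hits] for hits in results], once for the FLAT baseline and once for the index under test.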

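5.4 HNSW Index

01.HNSW Principles
    a.Layered Graph Structure
        a.Description
            HNSW (Hierarchical Navigable Small World) builds a multi-layer navigation graph in which each layer is a small-world graph. The bottom layer contains every vector as a node, and each higher layer contains a progressively sparser subset. A query starts at the top layer and descends layer by layer, moving to the local optimum in each layer before entering the next one. Nodes are connected by edges to their near neighbors. M sets the maximum number of connections per node in each layer and affects both graph connectivity and memory usage. efConstruction sets the search width used while building and affects graph quality. HNSW has no cluster boundaries, so its query performance is generally less sensitive to how the data is distributed than partition-based indexes such as IVF.
        b.Code Example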
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            import time
            
            # 创建测试Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields)
            collection = Collection("hnsw_demo", schema=schema)
            
            # 插入数据
            data_size = 100000
            ids = list(range(data_size))
            embeddings = [[np.random.random() for _ in range(128)] for _ in range(data_size)]
            data = [ids, embeddings]
            collection.insert(data)
            collection.flush()
            
            # 测试不同M值
            m_values = [4, 8, 16, 32, 64]
            
            print("HNSW参数M的影响:\n")
            print(f"{'M':>4s} {'构建时间':>12s} {'查询时间':>12s} {'内存估算':>12s}")
            print("-" * 45)
            
            for m in m_values:
                # 创建HNSW索引
                index_params = {
                    "index_type": "HNSW",
                    "metric_type": "L2",
                    "params": {
                        "M": m,
                        "efConstruction": 200
                    }
                }
                
                # 构建时间
                start = time.time()
                collection.create_index(field_name="embedding", index_params=index_params)
                build_time = time.time() - start
                
                collection.load()
                
                # 查询时间
                query_vector = [[np.random.random() for _ in range(128)]]
                search_params = {
                    "metric_type": "L2",
                    "params": {"ef": 128}
                }
                
                times = []
                for _ in range(10):
                    start = time.time()
                    collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                    times.append(time.time() - start)
                
                avg_time = np.mean(times) * 1000
                
                # 内存估算(每个节点约M*2条边)
                memory_per_vector = 128 * 4 + m * 2 * 8  # 向量 + 边
                total_memory_mb = data_size * memory_per_vector / 1024 / 1024
                
                print(f"{m:4d} {build_time:10.2f}s {avg_time:10.2f}ms {total_memory_mb:10.2f}MB")
                
                collection.release()
                collection.drop_index()
            
            print("\nM参数选择建议:")
            print("  M=4-8:   低内存,适合大规模数据")
            print("  M=16:    平衡选择(推荐)")
            print("  M=32-64: 高精度,内存占用高")
            ---
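            How sparse the upper layers become can be illustrated with the level-assignment rule from the HNSW paper: each node draws its top layer as floor(-ln(U) * mL) with U uniform on (0, 1], and mL = 1/ln(M) is the commonly recommended setting. The sketch below only samples levels; it builds no graph, and sample_levels is a hypothetical helper name.

```python
import numpy as np

def sample_levels(n, M, seed=0):
    """Draw the top layer of each node: floor(-ln(U) * mL) with mL = 1 / ln(M)."""
    rng = np.random.default_rng(seed)
    m_l = 1.0 / np.log(M)
    u = 1.0 - rng.random(n)  # values in (0, 1], avoids log(0)
    return np.floor(-np.log(u) * m_l).astype(int)

n, M = 100_000, 16
levels = sample_levels(n, M)

# A node whose top layer is L is present on every layer from 0 to L
for layer in range(levels.max() + 1):
    count = int((levels >= layer).sum())
    print(f"layer {layer}: {count:6d} nodes ({count / n:.4%})")
```

            Under this rule the expected share of nodes that reach layer l is M**(-l), so with M=16 roughly 1 node in 16 appears on layer 1 and 1 in 256 on layer 2. That is why the upper layers add little memory compared with the bottom layer.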
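    b.Search Process
        a.Description
            An HNSW search starts at the entry node in the top layer and greedily moves to the locally closest node in that layer. It then drops to the next layer and continues from that node. In the bottom layer it performs a finer search while maintaining a candidate set. The ef parameter sets the size of that candidate set: a larger ef searches more thoroughly but more slowly. ef must be greater than or equal to limit (the number of results returned). Typical values are 64-512, adjusted to the accuracy target. Query time grows roughly logarithmically with the number of vectors, which is why HNSW performs well on large collections.
        b.Code Example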
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("hnsw_demo")
            
            # 创建HNSW索引
            index_params = {
                "index_type": "HNSW",
                "metric_type": "L2",
                "params": {
                    "M": 16,
                    "efConstruction": 200
                }
            }
            
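            # The HNSW index is built below, after the FLAT baseline has been collected.
            
            # Query vector used for all ef values
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # Get the FLAT baseline.
            # Clear whatever index is currently present before building FLAT.
            collection.release()
            if collection.has_index():
                collection.drop_index()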
            
            flat_params = {"index_type": "FLAT", "metric_type": "L2", "params": {}}
            collection.create_index(field_name="embedding", index_params=flat_params)
            collection.load()
            
            flat_results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2"},
                limit=100
            )
            flat_ids = set([hit.id for hit in flat_results[0]])
            
            # 恢复HNSW
            collection.release()
            collection.drop_index()
            collection.create_index(field_name="embedding", index_params=index_params)
            collection.load()
            
            # 测试ef参数
            print("HNSW ef参数影响:\n")
            print(f"{'ef':>6s} {'查询时间':>12s} {'召回率':>10s}")
            print("-" * 32)
            
            ef_values = [100, 128, 256, 512]  # ef must be >= limit (100 here)
            
            for ef in ef_values:
                search_params = {
                    "metric_type": "L2",
                    "params": {"ef": ef}
                }
                
                times = []
                for _ in range(10):
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=100
                    )
                    times.append(time.time() - start)
                
                avg_time = np.mean(times) * 1000
                
                hnsw_ids = set([hit.id for hit in results[0]])
                recall = len(flat_ids & hnsw_ids) / len(flat_ids)
                
                print(f"{ef:6d} {avg_time:10.2f}ms {recall*100:9.2f}%")
            
            # 搜索过程可视化(概念)
            print("\nHNSW搜索过程:")
            print("  1. 从顶层入口节点开始")
            print("  2. 在当前层贪心搜索局部最优")
            print("  3. 进入下一层,以上层最优为起点")
            print("  4. 重复直到底层")
            print("  5. 在底层维护ef大小的候选集")
            print("  6. 返回Top-K结果")
            
            # ef选择建议
            def recommend_ef(target_recall=0.95):
                """推荐ef值"""
                for ef in [100, 128, 256, 512]:  # ef must be >= limit (100 here)
                    search_params = {
                        "metric_type": "L2",
                        "params": {"ef": ef}
                    }
                    
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=100
                    )
                    
                    hnsw_ids = set([hit.id for hit in results[0]])
                    recall = len(flat_ids & hnsw_ids) / len(flat_ids)
                    
                    if recall >= target_recall:
                        return ef, recall
                
                return 512, recall
            
            recommended_ef, recall = recommend_ef(0.95)
            print(f"\n推荐ef值(召回率≥95%): {recommended_ef}")
            print(f"实际召回率: {recall*100:.2f}%")
            ---
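            The role of ef in the bottom layer can be shown with a toy best-first search. The sketch below is a simplification under stated assumptions: it uses a single layer, a brute-force nearest-neighbour graph with an added chain i <-> i+1 to guarantee connectivity (not how HNSW constructs its graph), and the hypothetical helper names build_knn_graph and search_layer.

```python
import heapq
import numpy as np

def build_knn_graph(vectors, m):
    """Symmetric m-nearest-neighbour graph plus a chain to keep it connected."""
    n = len(vectors)
    d2 = ((vectors[:, None, :] - vectors[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in np.argsort(d2[i])[:m]:
            neighbors[i].add(int(j))
            neighbors[int(j)].add(i)
        if i + 1 < n:
            neighbors[i].add(i + 1)
            neighbors[i + 1].add(i)
    return neighbors

def search_layer(query, vectors, neighbors, entry, ef, k):
    """Best-first search that keeps at most ef results, then returns the top k."""
    def dist(i):
        return float(((vectors[i] - query) ** 2).sum())

    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap: closest candidate first
    results = [(-dist(entry), entry)]    # max-heap via negated distance
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break  # the closest candidate is worse than the farthest result
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current farthest result
    ordered = sorted((-neg_d, i) for neg_d, i in results)
    return [i for _, i in ordered[:k]]

rng = np.random.default_rng(0)
vectors = rng.random((500, 8))
query = rng.random(8)
graph = build_knn_graph(vectors, m=8)
exact = np.argsort(((vectors - query) ** 2).sum(axis=1))[:10].tolist()

for ef in [10, 50, 500]:
    found = search_layer(query, vectors, graph, entry=0, ef=ef, k=10)
    recall = len(set(found) & set(exact)) / 10
    print(f"ef={ef:3d}  recall@10={recall:.2f}")
```

            Because the result set is capped at ef entries, ef below k cannot return k results, which is the reason for the ef >= limit rule. With ef equal to the number of vectors the search visits the whole connected graph and becomes exact.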

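02.Performance Optimization
    a.Build Optimization
        a.Description
            Long build time is the main drawback of HNSW. efConstruction controls build quality: larger values build more slowly but produce a better graph. Typical values are 100-500, with 200 a common default. Construction can be parallelized across CPU cores. HNSW inserts vectors into the graph one at a time and has no training step, so the index can grow incrementally; even so, inserting data in bulk and then building the index once is usually more efficient than building after many small inserts. Building also needs a large amount of memory, so make sure enough is available.
        b.Code Example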
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 测试不同efConstruction值
            ef_construction_values = [100, 200, 400]
            
            print("efConstruction参数影响:\n")
            print(f"{'efConstruction':>16s} {'构建时间':>12s} {'查询时间':>12s} {'召回率':>10s}")
            print("-" * 55)
            
            for ef_const in ef_construction_values:
                # 创建索引
                index_params = {
                    "index_type": "HNSW",
                    "metric_type": "L2",
                    "params": {
                        "M": 16,
                        "efConstruction": ef_const
                    }
                }
                
                start = time.time()
                collection.create_index(field_name="embedding", index_params=index_params)
                build_time = time.time() - start
                
                collection.load()
                
                # 测试查询性能
                query_vector = [[np.random.random() for _ in range(128)]]
                search_params = {
                    "metric_type": "L2",
                    "params": {"ef": 128}
                }
                
                times = []
                for _ in range(10):
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=100
                    )
                    times.append(time.time() - start)
                
                avg_time = np.mean(times) * 1000
                
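                # Recall needs a FLAT baseline for the same query (as in the ef test above).
                # It is not measured in this loop, so the column shows N/A instead of a made-up value.
                print(f"{ef_const:16d} {build_time:10.2f}s {avg_time:10.2f}ms {'N/A':>10s}")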
                
                collection.release()
                collection.drop_index()
            
            print("\nefConstruction guidelines:")
            print("  100-200: fast build, suitable for prototyping")
            print("  200-400: balanced choice (recommended)")
            print("  400+:    higher-quality graph, longer build time")
            
            # 批量构建策略
            def batch_build_hnsw(data_batches):
                """批量构建HNSW索引"""
                # 先插入所有数据
                for batch in data_batches:
                    collection.insert(batch)
                
                collection.flush()
                
                # 一次性构建索引
                index_params = {
                    "index_type": "HNSW",
                    "metric_type": "L2",
                    "params": {
                        "M": 16,
                        "efConstruction": 200
                    }
                }
                
                print("开始批量构建HNSW索引...")
                start = time.time()
                collection.create_index(field_name="embedding", index_params=index_params)
                build_time = time.time() - start
                
                print(f"构建完成,耗时: {build_time:.2f}s")
            
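            # Notes on incremental data
            print("\nNotes on incremental data:")
            print("  - HNSW accepts new vectors incrementally, but each insert is relatively expensive")
            print("  - Deletions are what HNSW handles poorly; heavy churn usually ends in a rebuild")
            print("  - Prefer bulk insert followed by a single index build")
            print("  - IVF indexes are cheaper to rebuild when data changes often")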
            ---
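The listing above prints a hard-coded recall value. A minimal sketch of how recall@k could be computed once the exact (FLAT) result IDs are available for the same query; `recall_at_k` and the sample ID lists are illustrative names and made-up data, not part of pymilvus.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k IDs that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    approx_top = set(approx_ids[:k])
    return len(exact_top & approx_top) / len(exact_top)


# Hypothetical IDs: in practice exact_ids would come from a FLAT search and
# approx_ids from the HNSW search under test, both for the same query vector.
exact_ids = [7, 3, 9, 1, 4]
approx_ids = [7, 9, 3, 8, 4]

recall = recall_at_k(approx_ids, exact_ids, k=5)
print(f"recall@5 = {recall:.2f}")
```

Averaging this value over many query vectors gives a more stable estimate than a single query.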
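    b.Query optimization
        a.Description
            Query performance is HNSW's main strength. Search cost grows roughly logarithmically with the number of vectors, so it scales well. Batching queries raises throughput by amortizing per-request overhead; each query still traverses the graph on its own. The ef parameter is the key query-time knob: a larger ef raises both recall and latency, and ef must be at least as large as limit (top-k). It can be set per request, so different query scenarios can use different ef values. HNSW runs well on CPUs, and independent queries parallelize across cores. Graph traversal touches memory in a largely random pattern, so the index should fit in RAM. HNSW suits low-latency, high-throughput workloads.
        b.Code example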
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            import concurrent.futures
            
            collection = Collection("documents")
            
            # 创建HNSW索引
            index_params = {
                "index_type": "HNSW",
                "metric_type": "L2",
                "params": {
                    "M": 16,
                    "efConstruction": 200
                }
            }
            collection.create_index(field_name="embedding", index_params=index_params)
            collection.load()
            
            # 单次查询性能
            def test_single_query():
                """测试单次查询性能"""
                query_vector = [[np.random.random() for _ in range(128)]]
                
                search_params = {
                    "metric_type": "L2",
                    "params": {"ef": 128}
                }
                
                times = []
                for _ in range(100):
                    start = time.time()
                    collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                    times.append(time.time() - start)
                
                avg_time = np.mean(times) * 1000
                p50 = np.percentile(times, 50) * 1000
                p95 = np.percentile(times, 95) * 1000
                p99 = np.percentile(times, 99) * 1000
                
                print("单次查询性能:")
                print(f"  平均: {avg_time:.2f}ms")
                print(f"  P50:  {p50:.2f}ms")
                print(f"  P95:  {p95:.2f}ms")
                print(f"  P99:  {p99:.2f}ms")
            
            test_single_query()
            
            # 批量查询性能
            def test_batch_query():
                """测试批量查询性能"""
                batch_sizes = [1, 10, 50, 100]
                
                print("\n批量查询性能:")
                print(f"{'批量大小':>8s} {'总时间':>10s} {'平均每次':>12s} {'QPS':>10s}")
                print("-" * 45)
                
                for batch_size in batch_sizes:
                    query_vectors = [[np.random.random() for _ in range(128)] for _ in range(batch_size)]
                    
                    search_params = {
                        "metric_type": "L2",
                        "params": {"ef": 128}
                    }
                    
                    start = time.time()
                    collection.search(
                        data=query_vectors,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                    total_time = time.time() - start
                    
                    avg_time = total_time / batch_size * 1000
                    qps = batch_size / total_time
                    
                    print(f"{batch_size:8d} {total_time*1000:9.2f}ms {avg_time:10.2f}ms {qps:9.2f}")
            
            test_batch_query()
            
            # 并发查询性能
            def test_concurrent_query():
                """测试并发查询性能"""
                def single_query():
                    query_vector = [[np.random.random() for _ in range(128)]]
                    search_params = {
                        "metric_type": "L2",
                        "params": {"ef": 128}
                    }
                    collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                
                print("\n并发查询性能:")
                print(f"{'并发数':>8s} {'总时间':>10s} {'QPS':>10s}")
                print("-" * 32)
                
                for num_workers in [1, 2, 4, 8, 16]:
                    num_queries = 100
                    
                    start = time.time()
                    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
                        futures = [executor.submit(single_query) for _ in range(num_queries)]
                        for future in concurrent.futures.as_completed(futures):
                            future.result()
                    
                    total_time = time.time() - start
                    qps = num_queries / total_time
                    
                    print(f"{num_workers:8d} {total_time:9.2f}s {qps:9.2f}")
            
            test_concurrent_query()
            
            # 动态ef调整
            class AdaptiveHNSWSearch:
                def __init__(self, collection):
                    self.collection = collection
                    self.ef_map = {
                        "fast": 64,
                        "balanced": 128,
                        "accurate": 256
                    }
                
                def search(self, query_vector, mode="balanced", limit=10):
                    """根据模式动态调整ef"""
                    ef = self.ef_map.get(mode, 128)
                    
                    search_params = {
                        "metric_type": "L2",
                        "params": {"ef": ef}
                    }
                    
                    return self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=limit
                    )
            
            adaptive_search = AdaptiveHNSWSearch(collection)
            
            # 不同模式的查询
            query_vector = [np.random.random() for _ in range(128)]
            
            print("\n自适应查询:")
            for mode in ["fast", "balanced", "accurate"]:
                start = time.time()
                results = adaptive_search.search(query_vector, mode=mode)
                elapsed = time.time() - start
                print(f"  {mode:10s}: {elapsed*1000:.2f}ms")
            ---
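The fast/balanced/accurate modes above use fixed ef values. One way to choose them from measurements is to take the smallest ef that meets both a recall target and a latency budget. This is a sketch under that assumption; `pick_ef` is a hypothetical helper, and the numbers are invented for illustration, not benchmark results.

```python
def pick_ef(measurements, min_recall, max_latency_ms):
    """Return the smallest ef that meets both targets, or None if none does.

    measurements: iterable of (ef, latency_ms, recall) tuples.
    """
    for ef, latency_ms, recall in sorted(measurements):
        if recall >= min_recall and latency_ms <= max_latency_ms:
            return ef
    return None


# Hypothetical measurements: (ef, P95 latency in ms, recall)
measured = [(64, 2.1, 0.93), (128, 3.4, 0.97), (256, 6.0, 0.99)]

choice = pick_ef(measured, min_recall=0.95, max_latency_ms=5.0)
print(f"chosen ef: {choice}")
```

Returning None rather than a fallback value makes it explicit when no tested ef satisfies both constraints.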

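03.Usage recommendations
    a.Applicable scenarios
        a.Description
            HNSW suits latency-sensitive workloads such as real-time recommendation and online search, and applications with large datasets that change infrequently. When memory is plentiful it is usually a strong choice. It is less suited to workloads with heavy deletes or updates: the graph accepts new vectors incrementally, but it is slow to build and handles removals poorly. It is a CPU-oriented index and gains little from GPU acceleration. Its advantage over exhaustive search tends to be more visible on high-dimensional vectors (512 dimensions and up). It is a reasonable default index for production when memory allows.
        b.Code example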
            ---
            from pymilvus import Collection
            
            # 场景1: 实时推荐系统
            def realtime_recommendation():
                """实时推荐场景"""
                collection = Collection("product_embeddings")
                
                # HNSW配置(低延迟)
                index_params = {
                    "index_type": "HNSW",
                    "metric_type": "IP",  # 内积,适合推荐
                    "params": {
                        "M": 16,
                        "efConstruction": 200
                    }
                }
                
                collection.create_index(field_name="embedding", index_params=index_params)
                collection.load()
                
                # 快速查询(ef=64)
                user_vector = [[0.1] * 128]
                search_params = {
                    "metric_type": "IP",
                    "params": {"ef": 64}
                }
                
                results = collection.search(
                    data=user_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=20,
                    output_fields=["id", "title"]
                )
                
                print("推荐商品:")
                for hit in results[0]:
                    print(f"  {hit.entity.get('title')}: {hit.distance:.4f}")
            
            # 场景2: 图像搜索
            def image_search():
                """图像搜索场景"""
                collection = Collection("image_vectors")
                
                # HNSW配置(高维向量)
                index_params = {
                    "index_type": "HNSW",
                    "metric_type": "L2",
                    "params": {
                        "M": 32,  # 高维向量用更大的M
                        "efConstruction": 400
                    }
                }
                
                collection.create_index(field_name="embedding", index_params=index_params)
                collection.load()
                
                # 精确查询(ef=256)
                query_image_vector = [[0.1] * 512]  # 512维
                search_params = {
                    "metric_type": "L2",
                    "params": {"ef": 256}
                }
                
                results = collection.search(
                    data=query_image_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
                
                print("相似图像:")
                for hit in results[0]:
                    print(f"  ID: {hit.id}, 距离: {hit.distance:.4f}")
            
            # 场景3: 文本语义搜索
            def semantic_search():
                """文本语义搜索"""
                collection = Collection("document_embeddings")
                
                # HNSW配置(平衡)
                index_params = {
                    "index_type": "HNSW",
                    "metric_type": "COSINE",
                    "params": {
                        "M": 16,
                        "efConstruction": 200
                    }
                }
                
                collection.create_index(field_name="embedding", index_params=index_params)
                collection.load()
                
                # 语义查询
                query_text_vector = [[0.1] * 768]  # BERT embedding
                search_params = {
                    "metric_type": "COSINE",
                    "params": {"ef": 128}
                }
                
                results = collection.search(
                    data=query_text_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10,
                    output_fields=["title", "content"]
                )
                
                print("相关文档:")
                for hit in results[0]:
                    print(f"  {hit.entity.get('title')}: {hit.distance:.4f}")
            
            realtime_recommendation()
            image_search()
            semantic_search()
            ---
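    b.Comparison summary
        a.Description
            HNSW vs IVF: HNSW queries are faster but use more memory; IVF uses less memory but queries are slower. HNSW builds slowly, IVF builds quickly. Both accept new data, but HNSW is costlier to build and rebuild, so it fits mostly static data while IVF fits data that changes often. HNSW vs FLAT: HNSW is an approximate index, FLAT is exact. HNSW is far faster than FLAT at a slightly lower recall. Rule of thumb: HNSW for low latency, IVF for low memory, FLAT for maximum recall.
        b.Code example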
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 性能对比测试
            def compare_indexes():
                """对比不同索引的性能"""
                indexes = [
                    ("FLAT", {"index_type": "FLAT", "metric_type": "L2", "params": {}}, {"metric_type": "L2"}),
                    ("IVF_FLAT", {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 1024}}, {"metric_type": "L2", "params": {"nprobe": 16}}),
                    ("HNSW", {"index_type": "HNSW", "metric_type": "L2", "params": {"M": 16, "efConstruction": 200}}, {"metric_type": "L2", "params": {"ef": 128}})
                ]
                
                print("索引性能对比:\n")
                print(f"{'索引类型':>12s} {'构建时间':>12s} {'查询时间':>12s} {'内存占用':>12s}")
                print("-" * 52)
                
                query_vector = [[np.random.random() for _ in range(128)]]
                
                for index_name, index_params, search_params in indexes:
                    # 构建索引
                    start = time.time()
                    collection.create_index(field_name="embedding", index_params=index_params)
                    build_time = time.time() - start
                    
                    collection.load()
                    
                    # 查询性能
                    times = []
                    for _ in range(10):
                        start = time.time()
                        collection.search(
                            data=query_vector,
                            anns_field="embedding",
                            param=search_params,
                            limit=10
                        )
                        times.append(time.time() - start)
                    
                    avg_time = np.mean(times) * 1000
                    
                    # 内存估算
                    num_vectors = collection.num_entities
                    dim = 128
                    
                    if index_name == "FLAT":
                        memory_mb = num_vectors * dim * 4 / 1024 / 1024
                    elif index_name == "IVF_FLAT":
                        memory_mb = num_vectors * dim * 4 / 1024 / 1024
                    else:  # HNSW
                        memory_mb = num_vectors * (dim * 4 + 16 * 2 * 8) / 1024 / 1024
                    
                    print(f"{index_name:>12s} {build_time:10.2f}s {avg_time:10.2f}ms {memory_mb:10.2f}MB")
                    
                    collection.release()
                    collection.drop_index()
                
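                print("\nSelection guidelines:")
                print("  FLAT:     fewer than 100k vectors, 100% recall required")
                print("  IVF_FLAT: 100k to 10M vectors, memory constrained")
                print("  HNSW:     more than 100k vectors, low latency required, ample memory")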
            
            compare_indexes()
            
            # 决策树
            def recommend_index(num_vectors, memory_limit_gb, latency_requirement_ms, update_frequency):
                """推荐索引类型"""
                print("\n索引推荐决策:")
                print(f"  数据量: {num_vectors:,}")
                print(f"  内存限制: {memory_limit_gb}GB")
                print(f"  延迟要求: {latency_requirement_ms}ms")
                print(f"  更新频率: {update_frequency}")
                
                if num_vectors < 100000:
                    return "FLAT"
                
                dim = 128
                hnsw_memory_gb = num_vectors * (dim * 4 + 16 * 2 * 8) / 1024 / 1024 / 1024
                
                if hnsw_memory_gb <= memory_limit_gb and latency_requirement_ms < 10:
                    if update_frequency == "low":
                        return "HNSW"
                    else:
                        return "IVF_FLAT (HNSW is costly to maintain under frequent updates)"
                else:
                    return "IVF_FLAT"
            
            recommendation = recommend_index(
                num_vectors=1000000,
                memory_limit_gb=4,
                latency_requirement_ms=5,
                update_frequency="low"
            )
            
            print(f"\n推荐索引: {recommendation}")
            ---
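The memory estimate used twice in the listing above can be factored into small helpers. The formula is the listing's own approximation: float32 vector data plus 2*M graph links per vector. The 8 bytes per link is an assumption carried over from that listing, and real implementations may store links differently. The function names are illustrative.

```python
def flat_memory_mb(num_vectors, dim):
    """Raw float32 vector storage in MiB."""
    return num_vectors * dim * 4 / 1024 / 1024


def hnsw_memory_mb(num_vectors, dim, m=16, bytes_per_link=8):
    """Approximate HNSW footprint in MiB: vectors plus 2*M links per vector.

    bytes_per_link=8 mirrors the estimate in the listing above (an assumption).
    """
    per_vector_bytes = dim * 4 + m * 2 * bytes_per_link
    return num_vectors * per_vector_bytes / 1024 / 1024


n, dim = 1_000_000, 128
print(f"FLAT: {flat_memory_mb(n, dim):.2f} MB")
print(f"HNSW: {hnsw_memory_mb(n, dim):.2f} MB")
```

Under these assumptions, one million 128-dimensional vectors with M=16 need roughly 1.5 times the memory of the raw vectors.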

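5.5 Scalar indexes

01.Scalar index types
    a.INVERTED index
        a.Description
            The INVERTED index serves equality and range queries on VARCHAR and numeric fields. It maps each value to the IDs of the entities that contain it, which speeds up scalar filtering. It works best on high-cardinality fields (many distinct values) such as user IDs or product IDs; on low-cardinality fields such as gender or category the benefit is small. It can be combined with a vector index for hybrid queries. Scalar indexes are small relative to vector indexes and quick to build. String prefix matching and numeric range queries are supported.
        b.Code example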
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            import time
            
            # 创建带标量字段的Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=50),
                FieldSchema(name="price", dtype=DataType.FLOAT),
                FieldSchema(name="timestamp", dtype=DataType.INT64),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields)
            collection = Collection("scalar_index_demo", schema=schema)
            
            # 插入测试数据
            data_size = 100000
            ids = list(range(data_size))
            titles = [f"商品{i}" for i in range(data_size)]
            categories = ["电子", "服装", "食品", "图书"] * (data_size // 4)
            prices = [np.random.uniform(10, 1000) for _ in range(data_size)]
            timestamps = [1700000000 + i for i in range(data_size)]
            embeddings = [[np.random.random() for _ in range(128)] for _ in range(data_size)]
            
            data = [ids, titles, categories, prices, timestamps, embeddings]
            collection.insert(data)
            collection.flush()
            
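            # Create scalar indexes. INVERTED is requested explicitly;
            # this assumes a Milvus version that supports it (2.4 or later).
            collection.create_index(
                field_name="category",
                index_params={"index_type": "INVERTED"},
                index_name="category_index"
            )
            
            collection.create_index(
                field_name="price",
                index_params={"index_type": "INVERTED"},
                index_name="price_index"
            )
            
            collection.create_index(
                field_name="timestamp",
                index_params={"index_type": "INVERTED"},
                index_name="timestamp_index"
            )
            
            print("Scalar indexes created")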
            
            # 创建向量索引
            vector_index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
            collection.create_index(field_name="embedding", index_params=vector_index_params)
            
            collection.load()
            
            # 测试标量过滤性能
            expr = 'category == "电子" and price > 500'
            
            start = time.time()
            results = collection.query(
                expr=expr,
                output_fields=["id", "title", "category", "price"],
                limit=100
            )
            elapsed = time.time() - start
            
            print(f"\n标量查询: {len(results)} 条结果,耗时 {elapsed*1000:.2f}ms")
            
            # 混合查询(向量+标量)
            query_vector = [[np.random.random() for _ in range(128)]]
            
            start = time.time()
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"nprobe": 16}},
                limit=10,
                expr='category == "电子" and price > 500',
                output_fields=["id", "title", "category", "price"]
            )
            elapsed = time.time() - start
            
            print(f"混合查询: {len(results[0])} 条结果,耗时 {elapsed*1000:.2f}ms")
            ---
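The value-to-ID mapping described above can be illustrated with a toy in-memory version. This is a conceptual sketch in plain Python, not how Milvus implements its INVERTED index; `build_inverted_index` and the sample rows are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(rows, field):
    """Map each distinct value of `field` to the sorted list of row IDs holding it."""
    index = defaultdict(list)
    for row in rows:
        index[row[field]].append(row["id"])
    return {value: sorted(ids) for value, ids in index.items()}


rows = [
    {"id": 0, "category": "electronics"},
    {"id": 1, "category": "books"},
    {"id": 2, "category": "electronics"},
    {"id": 3, "category": "food"},
]

category_index = build_inverted_index(rows, "category")

# An equality filter becomes a dictionary lookup instead of a scan over all rows
matches = category_index.get("electronics", [])
print(matches)
```

The ID list returned by the lookup is what a hybrid query would use to restrict the vector search to matching entities.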
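    b.AUTO_INDEX
        a.Description
            AUTO_INDEX lets Milvus choose the index type for a field from its data type, so no index type has to be named by hand. This simplifies index creation and suits cases where the best index type is not known. It performs well for most scalar fields and is a sensible default for scalar indexes. The concrete data structure it selects is an internal detail that depends on the field type and the Milvus version.
        b.Code example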
            ---
            from pymilvus import Collection
            
            collection = Collection("documents")
            
            # 使用AUTO_INDEX
            collection.create_index(
                field_name="category",
                index_params={"index_type": "AUTO_INDEX"}
            )
            
            collection.create_index(
                field_name="timestamp",
                index_params={"index_type": "AUTO_INDEX"}
            )
            
            print("AUTO_INDEX创建完成")
            
            collection.load()
            
            # 测试查询
            results = collection.query(
                expr='category == "技术" and timestamp > 1700000000',
                output_fields=["id", "title"],
                limit=100
            )
            
            print(f"查询结果: {len(results)} 条")
            
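            # Notes on AUTO_INDEX
            print("\nAUTO_INDEX notes:")
            print("  Pros: chosen automatically, no tuning needed")
            print("  Cons: less control, may not be optimal")
            print("  Use when: prototyping quickly or unsure of the best index type")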
            ---

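02.Scalar filter optimization
    a.Filter expressions
        a.Description
            Scalar filter expressions support equality, range, and logical operators. Used well, indexes speed up filtering considerably, so filter on indexed fields where possible. Complex expressions may not be able to use an index fully. Placing highly selective conditions first is a reasonable habit. The filter is evaluated before the vector search, so fewer vectors need distance computations. A smaller filtered set generally means a faster search, although with graph indexes such as HNSW a very selective filter can also cost extra traversal, so measure on real data.
        b.Code example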
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            # 测试不同过滤条件的性能
            test_cases = [
                ('category == "技术"', "单条件等值"),
                ('price > 100 and price < 500', "范围查询"),
                ('category == "技术" and price > 100', "组合条件"),
                ('category in ["技术", "新闻", "博客"]', "IN查询"),
                ('category == "技术" or category == "新闻"', "OR条件")
            ]
            
            print("过滤表达式性能测试:\n")
            
            for expr, desc in test_cases:
                start = time.time()
                results = collection.query(
                    expr=expr,
                    output_fields=["id"],
                    limit=1000
                )
                elapsed = time.time() - start
                
                print(f"{desc:15s}: {len(results):5d} 条结果, {elapsed*1000:6.2f}ms")
            
            # 混合查询优化
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # 策略1: 宽松过滤(过滤后数据多)
            expr_loose = 'category == "技术"'
            
            start = time.time()
            results_loose = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"nprobe": 16}},
                limit=10,
                expr=expr_loose
            )
            time_loose = time.time() - start
            
            # 策略2: 严格过滤(过滤后数据少)
            expr_strict = 'category == "技术" and price > 500 and timestamp > 1700000000'
            
            start = time.time()
            results_strict = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"nprobe": 16}},
                limit=10,
                expr=expr_strict
            )
            time_strict = time.time() - start
            
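            print("\nHybrid query comparison:")
            print(f"  Loose filter:  {time_loose*1000:.2f}ms")
            print(f"  Strict filter: {time_strict*1000:.2f}ms")
            print("  Note: a stricter filter usually leaves fewer vectors to compare; verify on your own data")
            
            # Expression optimization tips
            print("\nExpression optimization tips:")
            print("  1. Filter on indexed fields")
            print("  2. Put highly selective conditions first")
            print("  3. Avoid deeply nested expressions")
            print("  4. Use IN instead of chained ORs")
            print("  5. Index fields used in range queries")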
            ---
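The tip about using IN instead of chained ORs can be applied with a small string builder. This is a sketch in plain Python that produces the same expression form as the test case in the listing above. `build_in_expr` is a hypothetical helper, and using JSON quoting assumes the values need no escaping beyond what JSON provides.

```python
import json


def build_in_expr(field, values):
    """Build a filter string of the form: field in ["a", "b"]"""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    # json.dumps quotes each string; ensure_ascii=False keeps non-ASCII text readable
    return f"{field} in {json.dumps(values, ensure_ascii=False)}"


expr = build_in_expr("category", ["技术", "新闻", "博客"])
print(expr)
```

The resulting string can be passed as the `expr` argument of `collection.query` or `collection.search` in place of a hand-written OR chain.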
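    b.Index selection
        a.Description
            Not every scalar field needs an index. High-cardinality fields (many distinct values) such as IDs or email addresses are good candidates. Low-cardinality fields (few distinct values) such as gender or status benefit little. Fields that are filtered on frequently should be indexed. Every index adds memory use and insert overhead, so query performance has to be weighed against resource cost. Query analysis can identify which fields need an index.
        b.Code example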
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            collection.load()  # query() requires a loaded collection
            
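            # Estimate field cardinality from a sample
            def analyze_cardinality(collection, field_name):
                """Estimate a field's cardinality (number of distinct values) from a sample"""
                # Sample up to 16384 rows; the limit caps the result, so this is not a full scan
                results = collection.query(
                    expr="id >= 0",
                    output_fields=[field_name],
                    limit=16384
                )
                
                # Count distinct values in the sample
                unique_values = {r[field_name] for r in results}
                cardinality = len(unique_values)
                total_count = len(results)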
                
                cardinality_ratio = cardinality / total_count if total_count > 0 else 0
                
                return {
                    "field": field_name,
                    "total": total_count,
                    "unique": cardinality,
                    "ratio": cardinality_ratio
                }
            
            # 分析多个字段
            fields_to_analyze = ["category", "timestamp", "id"]
            
            print("字段基数分析:\n")
            print(f"{'字段':>12s} {'总数':>8s} {'唯一值':>8s} {'基数比':>8s} {'建议':>12s}")
            print("-" * 55)
            
            for field in fields_to_analyze:
                stats = analyze_cardinality(collection, field)
                
                # 索引建议
                if stats["ratio"] > 0.5:
                    recommendation = "建议索引"
                elif stats["ratio"] > 0.1:
                    recommendation = "可选索引"
                else:
                    recommendation = "不建议"
                
                print(f"{stats['field']:>12s} {stats['total']:>8d} {stats['unique']:>8d} {stats['ratio']:>8.2%} {recommendation:>12s}")
            
            # 索引决策树
            def should_create_index(field_name, cardinality_ratio, query_frequency):
                """决定是否创建索引"""
                if cardinality_ratio > 0.5 and query_frequency == "high":
                    return True, "高基数+高频查询"
                elif cardinality_ratio > 0.1 and query_frequency == "high":
                    return True, "中基数+高频查询"
                elif cardinality_ratio > 0.5 and query_frequency == "medium":
                    return True, "高基数+中频查询"
                else:
                    return False, "不建议索引"
            
            # 示例决策
            decisions = [
                ("user_id", 0.9, "high"),
                ("category", 0.01, "high"),
                ("timestamp", 0.8, "medium"),
                ("status", 0.001, "low")
            ]
            
            print("\n索引决策示例:")
            for field, ratio, freq in decisions:
                should_index, reason = should_create_index(field, ratio, freq)
                print(f"  {field:12s}: {'创建' if should_index else '跳过':4s} ({reason})")
            
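            # Index cost (rough estimates, not benchmarks; measure on your own data)
            print("\nIndex cost (rough estimates, not benchmarks):")
            print("  Memory: each index takes roughly 10%-50% of the raw field size")
            print("  Inserts: indexed fields insert roughly 10%-30% slower")
            print("  Queries: indexed lookups can be roughly 10x-100x faster")
            print("  Advice: index only high-cardinality fields that are queried often")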
            ---

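5.6 Index parameters

01.Parameter configuration
    a.Build parameters
        a.Description
            Build parameters determine index quality and build time, and each index type has its own. For the IVF family, nlist sets the number of clusters; for HNSW, M and efConstruction shape the graph. Build parameters cannot be changed after the index is built; changing them means dropping and rebuilding the index. Choose them from the data size and performance requirements, ideally after testing on a small sample. They affect both index size and query performance.
        b.Code example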
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            import time
            
            # 创建测试Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            schema = CollectionSchema(fields=fields)
            collection = Collection("index_params_test", schema=schema)
            
            # 插入数据
            data_size = 100000
            ids = list(range(data_size))
            embeddings = [[np.random.random() for _ in range(128)] for _ in range(data_size)]
            data = [ids, embeddings]
            collection.insert(data)
            collection.flush()
            
            # IVF_FLAT参数配置
            ivf_configs = [
                {"nlist": 512},
                {"nlist": 1024},
                {"nlist": 2048}
            ]
            
            print("IVF_FLAT构建参数测试:\n")
            print(f"{'nlist':>8s} {'构建时间':>12s} {'索引大小':>12s}")
            print("-" * 36)
            
            for params in ivf_configs:
                index_params = {
                    "index_type": "IVF_FLAT",
                    "metric_type": "L2",
                    "params": params
                }
                
                start = time.time()
                collection.create_index(field_name="embedding", index_params=index_params)
                build_time = time.time() - start
                
                # Estimate index size: raw float32 vectors plus nlist centroids
                index_size_mb = (data_size + params["nlist"]) * 128 * 4 / 1024 / 1024
                
                print(f"{params['nlist']:8d} {build_time:10.2f}s {index_size_mb:10.2f}MB")
                
                collection.drop_index()
            
            # HNSW参数配置
            hnsw_configs = [
                {"M": 8, "efConstruction": 100},
                {"M": 16, "efConstruction": 200},
                {"M": 32, "efConstruction": 400}
            ]
            
            print("\nHNSW构建参数测试:\n")
            print(f"{'M':>4s} {'efConstruction':>16s} {'构建时间':>12s}")
            print("-" * 36)
            
            for params in hnsw_configs:
                index_params = {
                    "index_type": "HNSW",
                    "metric_type": "L2",
                    "params": params
                }
                
                start = time.time()
                collection.create_index(field_name="embedding", index_params=index_params)
                build_time = time.time() - start
                
                print(f"{params['M']:4d} {params['efConstruction']:16d} {build_time:10.2f}s")
                
                collection.drop_index()
            
            # 参数推荐函数
            def recommend_build_params(num_vectors, index_type):
                """推荐构建参数"""
                if index_type == "IVF_FLAT":
                    sqrt_n = int(np.sqrt(num_vectors))
                    return {
                        "conservative": {"nlist": sqrt_n},
                        "balanced": {"nlist": 2 * sqrt_n},
                        "aggressive": {"nlist": 4 * sqrt_n}
                    }
                elif index_type == "HNSW":
                    return {
                        "fast_build": {"M": 8, "efConstruction": 100},
                        "balanced": {"M": 16, "efConstruction": 200},
                        "high_quality": {"M": 32, "efConstruction": 400}
                    }
                else:
                    return {}
            
            print(f"\n推荐参数({data_size:,}个向量):")
            
            for index_type in ["IVF_FLAT", "HNSW"]:
                print(f"\n{index_type}:")
                recs = recommend_build_params(data_size, index_type)
                for strategy, params in recs.items():
                    print(f"  {strategy:15s}: {params}")
            ---
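The sqrt-based nlist recommendation above grows without bound as the dataset grows. A minimal sketch that clamps it to a valid range follows. The 65536 upper bound is an assumption to check against the limits of your Milvus version, and `recommend_nlist` is a hypothetical helper. The default factor of 4 corresponds to the "aggressive" setting in the listing.

```python
import math

MAX_NLIST = 65536  # assumed upper bound; verify for your Milvus version


def recommend_nlist(num_vectors, factor=4):
    """Return factor * floor(sqrt(num_vectors)), clamped to [1, MAX_NLIST]."""
    if num_vectors <= 0:
        raise ValueError("num_vectors must be positive")
    nlist = factor * math.isqrt(num_vectors)
    return max(1, min(nlist, MAX_NLIST))


for n in (100_000, 1_000_000, 10**12):
    print(f"{n:>16,d} vectors -> nlist {recommend_nlist(n)}")
```

Using `math.isqrt` keeps the computation in integers, so the result is exact even for very large vector counts.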
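    b.Search parameters
        a.Description
            Search parameters control the trade-off between query speed and recall. They can be changed per request at run time, without rebuilding the index. For IVF, nprobe sets how many clusters are searched; for HNSW, ef sets the size of the candidate list. Larger values raise recall and lower throughput. Pick values to match the application, use different values for different queries if needed, and settle on final values by measuring recall and latency (for example with A/B tests).
        b.Code example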
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("index_params_test")
            
            # IVF index parameters (the index is built after the FLAT baseline below)
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
            
            # Query vector shared by the baseline and all nprobe tests
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # FLAT baseline: exact results serve as ground truth for recall
            flat_params = {"index_type": "FLAT", "metric_type": "L2", "params": {}}
            collection.create_index(field_name="embedding", index_params=flat_params)
            collection.load()
            
            flat_results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param={"metric_type": "L2"},
                limit=100
            )
            flat_ids = {hit.id for hit in flat_results[0]}
            
            # Switch to the IVF index
            collection.release()
            collection.drop_index()
            collection.create_index(field_name="embedding", index_params=index_params)
            collection.load()
            
            # 搜索参数测试
            print("IVF搜索参数测试:\n")
            print(f"{'nprobe':>8s} {'查询时间':>12s} {'召回率':>10s} {'QPS':>10s}")
            print("-" * 45)
            
            nprobe_values = [1, 4, 8, 16, 32, 64]
            
            for nprobe in nprobe_values:
                search_params = {
                    "metric_type": "L2",
                    "params": {"nprobe": nprobe}
                }
                
                # 测量性能
                times = []
                for _ in range(10):
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=100
                    )
                    times.append(time.time() - start)
                
                avg_time = np.mean(times) * 1000
                qps = 1000 / avg_time if avg_time > 0 else 0
                
                # 计算召回率
                ivf_ids = set([hit.id for hit in results[0]])
                recall = len(flat_ids & ivf_ids) / len(flat_ids)
                
                print(f"{nprobe:8d} {avg_time:10.2f}ms {recall*100:9.2f}% {qps:9.2f}")
            
            # 动态参数调整
            class DynamicSearchParams:
                def __init__(self):
                    self.params_map = {
                        "fast": {"nprobe": 4},
                        "balanced": {"nprobe": 16},
                        "accurate": {"nprobe": 64}
                    }
                
                def get_params(self, mode="balanced"):
                    """根据模式获取搜索参数"""
                    return {
                        "metric_type": "L2",
                        "params": self.params_map.get(mode, self.params_map["balanced"])
                    }
                
                def auto_adjust(self, latency_ms, target_latency_ms=10):
                    """根据延迟自动调整参数"""
                    if latency_ms > target_latency_ms * 1.5:
                        return "fast"
                    elif latency_ms < target_latency_ms * 0.5:
                        return "accurate"
                    else:
                        return "balanced"
            
            dynamic_params = DynamicSearchParams()
            
            # 自适应查询
            print("\n自适应搜索参数:")
            
            for mode in ["fast", "balanced", "accurate"]:
                params = dynamic_params.get_params(mode)
                
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=params,
                    limit=10
                )
                latency = (time.time() - start) * 1000
                
                print(f"  {mode:10s}: {latency:.2f}ms (nprobe={params['params']['nprobe']})")
            ---
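            The recall figure in the benchmark above is the overlap between the exact (FLAT) result IDs and the approximate (IVF) result IDs. A minimal, server-free sketch of that computation follows. `recall_at_k` is a hypothetical helper name, not a pymilvus function, and the ID lists are made up for illustration. Unlike the inline expression, it guards against an empty ground-truth set.

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k IDs that the approximate search also returned."""
    exact = set(exact_ids)
    if not exact:
        # No ground truth (for example an empty collection): avoid dividing by zero
        return 0.0
    return len(exact & set(approx_ids)) / len(exact)


# Hypothetical IDs: the approximate search found 3 of the 5 true neighbours
exact = [1, 2, 3, 4, 5]
approx = [1, 2, 3, 9, 10]
print(f"recall: {recall_at_k(exact, approx):.2f}")
```

            Averaging this value over many query vectors gives a steadier estimate than the single query used in the benchmark.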

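02.Parameter tuning
    a.Performance testing
        a.Description
            Parameter tuning relies on performance tests to find the best configuration. The tests should cover different data sizes and query patterns, and track build time, query latency, recall, and memory usage. Run them on real data and real queries so the parameters do not overfit a synthetic benchmark. Grid search or Bayesian optimization can be used to explore the parameter space. The metrics pull against each other, so there is no single best answer, only a trade-off that fits the requirements. A repeatable tuning procedure with supporting tooling is worth building.
        b.Code example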
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 网格搜索最优参数
            def grid_search_ivf_params(collection, query_vectors, target_recall=0.95):
                """网格搜索IVF最优参数"""
                # 参数网格
                nlist_values = [512, 1024, 2048]
                nprobe_values = [8, 16, 32, 64]
                
                # 获取FLAT基准
                collection.release()
                collection.drop_index()
                
                flat_params = {"index_type": "FLAT", "metric_type": "L2", "params": {}}
                collection.create_index(field_name="embedding", index_params=flat_params)
                collection.load()
                
                flat_results_list = []
                for qv in query_vectors:
                    results = collection.search(
                        data=[qv],
                        anns_field="embedding",
                        param={"metric_type": "L2"},
                        limit=100
                    )
                    flat_results_list.append(set([hit.id for hit in results[0]]))
                
                # 测试所有参数组合
                best_config = None
                best_score = float('inf')
                
                results_table = []
                
                for nlist in nlist_values:
                    # 构建索引
                    collection.release()
                    collection.drop_index()
                    
                    index_params = {
                        "index_type": "IVF_FLAT",
                        "metric_type": "L2",
                        "params": {"nlist": nlist}
                    }
                    
                    start = time.time()
                    collection.create_index(field_name="embedding", index_params=index_params)
                    build_time = time.time() - start
                    
                    collection.load()
                    
                    for nprobe in nprobe_values:
                        search_params = {
                            "metric_type": "L2",
                            "params": {"nprobe": nprobe}
                        }
                        
                        # 测试查询
                        total_time = 0
                        total_recall = 0
                        
                        for i, qv in enumerate(query_vectors):
                            start = time.time()
                            results = collection.search(
                                data=[qv],
                                anns_field="embedding",
                                param=search_params,
                                limit=100
                            )
                            total_time += time.time() - start
                            
                            ivf_ids = set([hit.id for hit in results[0]])
                            recall = len(flat_results_list[i] & ivf_ids) / len(flat_results_list[i])
                            total_recall += recall
                        
                        avg_time = total_time / len(query_vectors) * 1000
                        avg_recall = total_recall / len(query_vectors)
                        
                        # 评分:满足召回率要求的最快配置
                        if avg_recall >= target_recall:
                            score = avg_time
                            if score < best_score:
                                best_score = score
                                best_config = {
                                    "nlist": nlist,
                                    "nprobe": nprobe,
                                    "build_time": build_time,
                                    "query_time": avg_time,
                                    "recall": avg_recall
                                }
                        
                        results_table.append({
                            "nlist": nlist,
                            "nprobe": nprobe,
                            "build_time": build_time,
                            "query_time": avg_time,
                            "recall": avg_recall
                        })
                
                # 打印结果
                print("参数网格搜索结果:\n")
                print(f"{'nlist':>8s} {'nprobe':>8s} {'构建时间':>12s} {'查询时间':>12s} {'召回率':>10s}")
                print("-" * 55)
                
                for r in results_table:
                    print(f"{r['nlist']:8d} {r['nprobe']:8d} {r['build_time']:10.2f}s {r['query_time']:10.2f}ms {r['recall']*100:9.2f}%")
                
                if best_config:
                    print(f"\n最优配置(召回率≥{target_recall*100:.0f}%):")
                    print(f"  nlist: {best_config['nlist']}")
                    print(f"  nprobe: {best_config['nprobe']}")
                    print(f"  查询时间: {best_config['query_time']:.2f}ms")
                    print(f"  召回率: {best_config['recall']*100:.2f}%")
                
                return best_config
            
            # 生成测试查询
            test_queries = [[np.random.random() for _ in range(128)] for _ in range(10)]
            
            # 执行网格搜索
            best_config = grid_search_ivf_params(collection, test_queries, target_recall=0.95)
            ---
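            Because latency and recall pull in opposite directions, one way to read a grid-search table is to keep only the configurations that no other configuration beats on both metrics (the Pareto front) and pick from those. A minimal sketch, assuming rows shaped like the `results_table` entries above. `pareto_front` is a hypothetical helper, and the numbers are illustrative rather than measured.

```python
def pareto_front(configs):
    """Keep configs that no other config matches or beats on both latency and recall."""
    front = []
    for c in configs:
        dominated = any(
            o["query_time"] <= c["query_time"]
            and o["recall"] >= c["recall"]
            and (o["query_time"] < c["query_time"] or o["recall"] > c["recall"])
            for o in configs
        )
        if not dominated:
            front.append(c)
    # Fastest first, so the trade-off reads top to bottom
    return sorted(front, key=lambda c: c["query_time"])


# Illustrative values only (query_time in ms), not benchmark results
results_table = [
    {"nlist": 1024, "nprobe": 8, "query_time": 2.0, "recall": 0.90},
    {"nlist": 1024, "nprobe": 16, "query_time": 3.0, "recall": 0.96},
    {"nlist": 2048, "nprobe": 16, "query_time": 3.5, "recall": 0.94},
    {"nlist": 1024, "nprobe": 64, "query_time": 9.0, "recall": 0.99},
]

front = pareto_front(results_table)
for c in front:
    print(c["nlist"], c["nprobe"], c["query_time"], c["recall"])
```

            Here the nlist=2048, nprobe=16 row drops out because another row is both faster and more accurate. The remaining rows are the real choices.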
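    b.Tuning strategy
        a.Description
            Parameter tuning works best as a systematic procedure. First fix the performance targets (latency, recall, throughput). Then choose a suitable index type, determine the build parameters by testing, and finally adjust the search parameters until the targets are met. Test under realistic load, including concurrent queries. Keep monitoring performance in production and re-tune as the data and traffic change. A configuration management system for index parameters helps keep this under control.
        b.Code example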
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            # 参数调优流程
            class IndexTuner:
                def __init__(self, collection):
                    self.collection = collection
                    self.test_queries = [[np.random.random() for _ in range(128)] for _ in range(20)]
                
                def step1_select_index_type(self, num_vectors, memory_limit_gb, latency_requirement_ms):
                    """步骤1: 选择索引类型"""
                    print("步骤1: 选择索引类型\n")
                    
                    if num_vectors < 100000:
                        recommendation = "FLAT"
                        reason = "数据量小,使用精确索引"
                    else:
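                        # Rough HNSW memory estimate: raw float32 vectors plus graph links
                        # (assumes dim=128 and M=16, i.e. 2*M links of 8 bytes per vector)
                        dim = 128
                        hnsw_memory = num_vectors * (dim * 4 + 16 * 2 * 8) / 1024 / 1024 / 1024
                        
                        if hnsw_memory <= memory_limit_gb and latency_requirement_ms <= 10: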
                            recommendation = "HNSW"
                            reason = "低延迟要求,内存充足"
                        else:
                            recommendation = "IVF_FLAT"
                            reason = "平衡性能和内存"
                    
                    print(f"推荐索引: {recommendation}")
                    print(f"原因: {reason}\n")
                    
                    return recommendation
                
                def step2_tune_build_params(self, index_type):
                    """步骤2: 调优构建参数"""
                    print("步骤2: 调优构建参数\n")
                    
                    num_vectors = self.collection.num_entities
                    
                    if index_type == "IVF_FLAT":
                        sqrt_n = int(np.sqrt(num_vectors))
                        candidates = [sqrt_n, 2*sqrt_n, 4*sqrt_n]
                        
                        print(f"测试nlist值: {candidates}")
                        
                        best_nlist = 2 * sqrt_n  # 简化,实际应测试
                        build_params = {"nlist": best_nlist}
                        
                    elif index_type == "HNSW":
                        candidates = [
                            {"M": 8, "efConstruction": 100},
                            {"M": 16, "efConstruction": 200},
                            {"M": 32, "efConstruction": 400}
                        ]
                        
                        print(f"测试M和efConstruction组合")
                        
                        build_params = {"M": 16, "efConstruction": 200}  # 简化
                    
                    else:
                        build_params = {}
                    
                    print(f"选择构建参数: {build_params}\n")
                    
                    return build_params
                
                def step3_tune_search_params(self, index_type, target_recall=0.95, target_latency_ms=10):
                    """步骤3: 调优搜索参数"""
                    print("步骤3: 调优搜索参数\n")
                    print(f"目标召回率: {target_recall*100:.0f}%")
                    print(f"目标延迟: {target_latency_ms}ms\n")
                    
                    if index_type == "IVF_FLAT":
                        # Placeholder: a full implementation would binary-search nprobe in
                        # [1, 128] against measured recall; a fixed value is used here
                        best_nprobe = 16
                        
                        print(f"搜索最优nprobe...")
                        
                        search_params = {"nprobe": best_nprobe}
                        
                    elif index_type == "HNSW":
                        # 测试不同ef值
                        best_ef = 128
                        
                        print(f"搜索最优ef...")
                        
                        search_params = {"ef": best_ef}
                    
                    else:
                        search_params = {}
                    
                    print(f"选择搜索参数: {search_params}\n")
                    
                    return search_params
                
                def step4_validate(self, index_type, build_params, search_params):
                    """步骤4: 验证配置"""
                    print("步骤4: 验证配置\n")
                    
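                    # Release the collection and drop the existing index first:
                    # a field cannot hold a second index with different parameters
                    self.collection.release()
                    self.collection.drop_index()
                    
                    # Create the index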
                    index_params = {
                        "index_type": index_type,
                        "metric_type": "L2",
                        "params": build_params
                    }
                    
                    start = time.time()
                    self.collection.create_index(field_name="embedding", index_params=index_params)
                    build_time = time.time() - start
                    
                    self.collection.load()
                    
                    # 测试查询
                    full_search_params = {
                        "metric_type": "L2",
                        "params": search_params
                    }
                    
                    times = []
                    for qv in self.test_queries:
                        start = time.time()
                        self.collection.search(
                            data=[qv],
                            anns_field="embedding",
                            param=full_search_params,
                            limit=10
                        )
                        times.append(time.time() - start)
                    
                    avg_time = np.mean(times) * 1000
                    p95_time = np.percentile(times, 95) * 1000
                    
                    print(f"构建时间: {build_time:.2f}s")
                    print(f"平均查询时间: {avg_time:.2f}ms")
                    print(f"P95查询时间: {p95_time:.2f}ms")
                    
                    return {
                        "build_time": build_time,
                        "avg_latency": avg_time,
                        "p95_latency": p95_time
                    }
                
                def tune(self, num_vectors, memory_limit_gb, latency_requirement_ms, target_recall=0.95):
                    """完整调优流程"""
                    print("=" * 60)
                    print("索引参数调优流程")
                    print("=" * 60 + "\n")
                    
                    # 步骤1: 选择索引类型
                    index_type = self.step1_select_index_type(num_vectors, memory_limit_gb, latency_requirement_ms)
                    
                    # 步骤2: 调优构建参数
                    build_params = self.step2_tune_build_params(index_type)
                    
                    # 步骤3: 调优搜索参数
                    search_params = self.step3_tune_search_params(index_type, target_recall, latency_requirement_ms)
                    
                    # 步骤4: 验证配置
                    metrics = self.step4_validate(index_type, build_params, search_params)
                    
                    print("\n" + "=" * 60)
                    print("调优完成")
                    print("=" * 60)
                    
                    return {
                        "index_type": index_type,
                        "build_params": build_params,
                        "search_params": search_params,
                        "metrics": metrics
                    }
            
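            # Run the tuner against an existing collection
            # ("documents" matches the collection used in the other examples)
            collection = Collection("documents")
            tuner = IndexTuner(collection)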
            
            optimal_config = tuner.tune(
                num_vectors=100000,
                memory_limit_gb=4,
                latency_requirement_ms=10,
                target_recall=0.95
            )
            
            print(f"\n最优配置:")
            print(f"  索引类型: {optimal_config['index_type']}")
            print(f"  构建参数: {optimal_config['build_params']}")
            print(f"  搜索参数: {optimal_config['search_params']}")
            ---
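            Step 3 above leaves the nprobe search as a placeholder. One way to fill it in is a binary search for the smallest nprobe that reaches the recall target. A minimal sketch, assuming recall does not decrease as nprobe grows. `smallest_nprobe` and `measure_recall` are hypothetical names; in practice `measure_recall` would run real searches and compare them with a FLAT baseline. A synthetic function stands in for it here so the sketch runs without a server.

```python
def smallest_nprobe(measure_recall, target_recall, low=1, high=128):
    """Binary-search the smallest nprobe in [low, high] whose recall meets the target.

    Assumes recall is non-decreasing in nprobe.
    Returns None if even `high` misses the target.
    """
    if measure_recall(high) < target_recall:
        return None
    while low < high:
        mid = (low + high) // 2
        if measure_recall(mid) >= target_recall:
            high = mid       # mid is good enough; try smaller values
        else:
            low = mid + 1    # mid is too small
    return low


def synthetic_recall(nprobe):
    """Stand-in for a real measurement: recall rises linearly, then saturates at 1.0."""
    return min(1.0, nprobe / 20)


best = smallest_nprobe(synthetic_recall, target_recall=0.95)
print(f"smallest nprobe reaching the target: {best}")
```

            Each probe of `measure_recall` costs a batch of real queries, so the logarithmic number of probes is the point of using binary search over a linear sweep.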

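6 Search and Query

6.1 Similarity search

01.Basic search
    a.Vector search
        a.Description
            Vector search is the core Milvus operation: it computes the similarity between a query vector and the stored vectors and returns the Top-K results. For float vectors, three metrics are available: L2 (Euclidean distance), IP (inner product), and COSINE (cosine similarity). A search call needs anns_field (the vector field name), limit (the number of results), and the search parameters; output_fields optionally returns scalar fields with each hit. Results are ordered from most to least similar: for L2 a smaller distance means more similar, while for IP and COSINE a larger value means more similar. Several query vectors can be submitted in one call as a batch.
        b.Code example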
            ---
            from pymilvus import Collection, connections
            import numpy as np
            
            # 连接Milvus
            connections.connect(host="localhost", port="19530")
            
            # 获取Collection
            collection = Collection("documents")
            collection.load()
            
            # 单个向量搜索
            query_vector = [[np.random.random() for _ in range(128)]]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                output_fields=["id", "title", "content"]
            )
            
            print("搜索结果:")
            for hit in results[0]:
                print(f"  ID: {hit.id}")
                print(f"  标题: {hit.entity.get('title')}")
                print(f"  距离: {hit.distance:.4f}")
                print()
            
            # 批量向量搜索
            query_vectors = [[np.random.random() for _ in range(128)] for _ in range(5)]
            
            results = collection.search(
                data=query_vectors,
                anns_field="embedding",
                param=search_params,
                limit=10
            )
            
            print(f"批量搜索: {len(results)} 个查询")
            for i, hits in enumerate(results):
                print(f"\n查询 {i+1}:")
                for hit in hits[:3]:  # 只显示前3个结果
                    print(f"  ID: {hit.id}, 距离: {hit.distance:.4f}")
            
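            # Distance metrics
            # A search must use the metric the index was built with, so the index is
            # rebuilt for each metric. IVF_FLAT with nlist=1024 is an illustrative choice;
            # L2 comes last so the collection ends up with an L2 index again.
            metrics = ["IP", "COSINE", "L2"]
            
            print("\nDistance metric comparison:")
            for metric in metrics:
                collection.release()
                collection.drop_index()
                collection.create_index(
                    field_name="embedding",
                    index_params={"index_type": "IVF_FLAT", "metric_type": metric, "params": {"nlist": 1024}}
                )
                collection.load()
                
                search_params = {
                    "metric_type": metric,
                    "params": {"nprobe": 16}
                }
                
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=5
                )
                
                print(f"\n{metric}:")
                for hit in results[0]:
                    print(f"  ID: {hit.id}, distance: {hit.distance:.4f}")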
            ---
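            To make the ordering rules concrete, here is a brute-force sketch in NumPy of what a Top-K search computes for each metric. It is not how Milvus executes a search (an index avoids scanning every vector), and `brute_force_top_k` is a hypothetical helper. The L2 branch uses the squared distance, which ranks results identically to the true distance.

```python
import numpy as np


def brute_force_top_k(query, vectors, k, metric="L2"):
    """Exact top-k by scanning every vector; returns (row indices, scores), best first."""
    query = np.asarray(query, dtype=float)
    vectors = np.asarray(vectors, dtype=float)
    if metric == "L2":
        scores = np.sum((vectors - query) ** 2, axis=1)  # squared Euclidean distance
        order = np.argsort(scores)                       # smaller is more similar
    elif metric == "IP":
        scores = vectors @ query
        order = np.argsort(-scores)                      # larger is more similar
    elif metric == "COSINE":
        norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
        scores = (vectors @ query) / norms
        order = np.argsort(-scores)                      # larger is more similar
    else:
        raise ValueError(f"unsupported metric: {metric}")
    top = order[:k]
    return top.tolist(), scores[top].tolist()


vectors = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [0.9, 0.1]]
query = [1.0, 0.0]

for metric in ("L2", "IP", "COSINE"):
    ids, _ = brute_force_top_k(query, vectors, k=2, metric=metric)
    print(f"{metric:6s} top-2 rows: {ids}")
```

            In this toy data, IP ranks the long vector [3.0, 3.0] first because the inner product rewards magnitude, while L2 and COSINE both prefer the vectors that point the same way as the query.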
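    b.Distance metrics
        a.Description
            Milvus supports several distance metrics for different scenarios. L2 (Euclidean distance) suits general-purpose vector search; smaller values mean more similar. IP (inner product) suits recommendation; larger values mean more similar. COSINE (cosine similarity) suits semantic text search and is equivalent to IP once the vectors are normalized. JACCARD and HAMMING apply to binary vectors. Choosing the metric that matches how the embeddings were produced improves result quality. The metric is fixed when the index is created, and every search must use that same metric.
        b.Code example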
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # L2距离(欧氏距离)
            def l2_search(query_vector):
                """L2距离搜索,值越小越相似"""
                search_params = {
                    "metric_type": "L2",
                    "params": {"nprobe": 16}
                }
                
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
                
                print("L2距离搜索:")
                for hit in results[0]:
                    print(f"  ID: {hit.id}, L2距离: {hit.distance:.4f}")
                
                return results
            
            # IP距离(内积)
            def ip_search(query_vector):
                """内积搜索,值越大越相似"""
                search_params = {
                    "metric_type": "IP",
                    "params": {"nprobe": 16}
                }
                
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
                
                print("\nIP内积搜索:")
                for hit in results[0]:
                    print(f"  ID: {hit.id}, 内积: {hit.distance:.4f}")
                
                return results
            
            # COSINE距离(余弦相似度)
            def cosine_search(query_vector):
                """余弦相似度搜索,值越大越相似"""
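                # Normalize the query vector (optional for COSINE, which ignores vector length)
                query_array = np.asarray(query_vector, dtype=float)
                normalized_vector = (query_array / np.linalg.norm(query_array)).tolist()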
                
                search_params = {
                    "metric_type": "COSINE",
                    "params": {"nprobe": 16}
                }
                
                results = collection.search(
                    data=[normalized_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
                
                print("\nCOSINE余弦相似度搜索:")
                for hit in results[0]:
                    print(f"  ID: {hit.id}, 余弦相似度: {hit.distance:.4f}")
                
                return results
            
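            # Try each metric. A search must use the metric the index was built with,
            # so the index is rebuilt before each call (IVF_FLAT with nlist=1024 is an
            # illustrative choice); L2 runs last so the collection keeps an L2 index.
            def rebuild_index(metric):
                collection.release()
                collection.drop_index()
                collection.create_index(
                    field_name="embedding",
                    index_params={"index_type": "IVF_FLAT", "metric_type": metric, "params": {"nlist": 1024}}
                )
                collection.load()
            
            query_vector = [np.random.random() for _ in range(128)]
            
            rebuild_index("IP")
            ip_results = ip_search(query_vector)
            rebuild_index("COSINE")
            cosine_results = cosine_search(query_vector)
            rebuild_index("L2")
            l2_results = l2_search(query_vector)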
            
            # 距离度量选择建议
            print("\n距离度量选择建议:")
            print("  L2:     通用向量搜索,适合图像、音频等")
            print("  IP:     推荐系统,用户-物品匹配")
            print("  COSINE: 文本语义搜索,归一化向量")
            print("  JACCARD: 集合相似度,标签匹配")
            print("  HAMMING: 二值向量,哈希检索")
            
            # 距离转换
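            def convert_distance(distance, from_metric, to_metric):
                """Convert a distance value between metrics (valid for unit-length vectors only)."""
                if from_metric == "L2" and to_metric == "COSINE":
                    # For unit vectors, squared L2 = 2 - 2*cos, so cos = 1 - d/2.
                    # This treats `distance` as the squared L2 value, which is what
                    # Milvus is understood to report for the L2 metric.
                    return 1 - distance / 2
                elif from_metric == "IP" and to_metric == "COSINE":
                    # For unit vectors the inner product equals the cosine similarity
                    return distance
                else:
                    # No conversion defined: return the value unchanged
                    return distance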
            
            print("\n距离转换示例:")
            print(f"  L2距离 0.5 ≈ 余弦相似度 {convert_distance(0.5, 'L2', 'COSINE'):.4f}")
            ---
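            The L2-to-cosine conversion above follows from expanding the squared distance between two unit vectors: ||a - b||^2 = ||a||^2 + ||b||^2 - 2(a·b) = 2 - 2cos, so cos = 1 - ||a - b||^2 / 2. The sketch below checks that identity numerically with NumPy on two random vectors. The seed and dimension are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random 128-dimensional vectors, scaled to unit length
a = rng.random(128)
b = rng.random(128)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

cosine = float(a @ b)                      # inner product of unit vectors = cosine
squared_l2 = float(np.sum((a - b) ** 2))   # squared Euclidean distance

recovered = 1 - squared_l2 / 2
print(f"cosine similarity:        {cosine:.6f}")
print(f"1 - squared_l2 / 2:       {recovered:.6f}")
print(f"match within tolerance:   {abs(recovered - cosine) < 1e-9}")
```

            The identity holds only after normalization. For raw embeddings of varying length, L2, IP, and COSINE can rank the same candidates differently.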

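02.Search parameters
    a.The limit parameter
        a.Description
            The limit parameter sets how many results are returned, i.e. the K in Top-K. It must be greater than 0; 1-1000 is the recommended range. Query time grows with limit, though not linearly. For pagination, combine limit with the offset parameter. limit does not change recall; it only sets how many results come back. Fewer than limit results may be returned when there are not enough matches. Set limit according to what the application actually needs.
        b.Code example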
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 测试不同limit值的性能
            limit_values = [1, 10, 50, 100, 500, 1000]
            
            print("limit参数性能测试:\n")
            print(f"{'limit':>8s} {'查询时间':>12s} {'结果数':>8s}")
            print("-" * 32)
            
            for limit in limit_values:
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=limit
                )
                elapsed = time.time() - start
                
                actual_count = len(results[0])
                
                print(f"{limit:8d} {elapsed*1000:10.2f}ms {actual_count:8d}")
            
            # 分页查询
            def paginated_search(query_vector, page_size=10, page_num=1):
                """分页查询"""
                offset = (page_num - 1) * page_size
                
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=page_size,
                    offset=offset,
                    output_fields=["id", "title"]
                )
                
                return results[0]
            
            # Fetch the first three pages (paginated_search expects a flat vector, not a nested list)
            query_vector = [np.random.random() for _ in range(128)]
            
            print("\n分页查询示例:")
            for page in range(1, 4):
                results = paginated_search(query_vector, page_size=10, page_num=page)
                print(f"\n第{page}页:")
                for hit in results:
                    print(f"  ID: {hit.id}, 距离: {hit.distance:.4f}")
            
            # limit选择建议
            print("\nlimit选择建议:")
            print("  实时推荐: limit=10-20")
            print("  搜索结果: limit=20-50")
            print("  批量处理: limit=100-1000")
            print("  注意: limit过大会影响性能和内存")
            ---
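    b.The offset parameter
        a.Description
            The offset parameter skips the first N results, which is how pagination is implemented. It starts at 0, and offset=0 skips nothing. offset + limit must not exceed 16384 (a Milvus limit). Query cost grows with offset, so deep pagination with a large offset is not recommended; use a cursor or a timestamp-based filter instead. offset is applied after the results are ranked, so it does not change which candidates are retrieved.
        b.Code example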
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 测试offset性能
            offset_values = [0, 10, 50, 100, 500, 1000]
            
            print("offset参数性能测试:\n")
            print(f"{'offset':>8s} {'查询时间':>12s}")
            print("-" * 24)
            
            for offset in offset_values:
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10,
                    offset=offset
                )
                elapsed = time.time() - start
                
                print(f"{offset:8d} {elapsed*1000:10.2f}ms")
            
            # 分页实现
            class Paginator:
                def __init__(self, collection, query_vector, page_size=10):
                    self.collection = collection
                    self.query_vector = query_vector
                    self.page_size = page_size
                    self.search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": 16}
                    }
                
                def get_page(self, page_num):
                    """获取指定页"""
                    if page_num < 1:
                        raise ValueError("page_num must be >= 1")
                    
                    offset = (page_num - 1) * self.page_size
                    
                    # 检查offset限制
                    if offset + self.page_size > 16384:
                        raise ValueError("offset + limit exceeds 16384")
                    
                    results = self.collection.search(
                        data=[self.query_vector],
                        anns_field="embedding",
                        param=self.search_params,
                        limit=self.page_size,
                        offset=offset,
                        output_fields=["id", "title"]
                    )
                    
                    return results[0]
                
                def iterate_pages(self, max_pages=10):
                    """迭代多页"""
                    for page_num in range(1, max_pages + 1):
                        try:
                            results = self.get_page(page_num)
                            if len(results) == 0:
                                break
                            yield page_num, results
                        except ValueError as e:
                            print(f"停止迭代: {e}")
                            break
            
            # 使用分页器
            query_vector = [np.random.random() for _ in range(128)]
            paginator = Paginator(collection, query_vector, page_size=10)
            
            print("\n分页迭代示例:")
            for page_num, results in paginator.iterate_pages(max_pages=3):
                print(f"\n第{page_num}页: {len(results)}条结果")
                for hit in results[:3]:  # 只显示前3条
                    print(f"  ID: {hit.id}, 距离: {hit.distance:.4f}")
            
            # 深度分页替代方案
            print("\n深度分页替代方案:")
            print("  1. 使用游标(基于上次结果的最后ID)")
            print("  2. 使用时间戳范围过滤")
            print("  3. 限制最大页数(如只允许前100页)")
            print("  4. 使用Elasticsearch等专门的分页工具")
            ---
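            Conceptually, a page is a slice of the ranked result list: to serve offset N with limit K, the top N + K results have to be ranked before the first N are discarded. That is why cost grows with offset. The sketch below models this on a plain Python list, without a server. `page_window` and `MAX_WINDOW` are hypothetical names; 16384 is the offset + limit ceiling stated above.

```python
MAX_WINDOW = 16384  # ceiling on offset + limit, as stated in the description


def page_window(ranked_ids, page_num, page_size):
    """Return the IDs for a 1-based page, given IDs already sorted best-first."""
    if page_num < 1:
        raise ValueError("page_num must be >= 1")
    offset = (page_num - 1) * page_size
    if offset + page_size > MAX_WINDOW:
        raise ValueError("offset + limit exceeds 16384")
    # Everything before `offset` was ranked too, then thrown away
    return ranked_ids[offset:offset + page_size]


# 25 hypothetical result IDs, already ordered by distance
ranked = list(range(100, 125))

print(page_window(ranked, 1, 10))   # first full page
print(page_window(ranked, 3, 10))   # last, partial page
print(page_window(ranked, 4, 10))   # past the end: empty
```

            An empty page is the natural stop signal, which is what `iterate_pages` in the Paginator above relies on.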

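6.2 Range queries

01.Range search
    a.Distance range
        a.Description
            Range search returns the vectors whose distance to the query falls inside a given range, rather than a fixed Top-K. For the L2 metric, radius sets the maximum distance, so vectors closer than radius are returned; the optional range_filter sets the minimum distance, giving a distance interval. For similarity metrics such as IP the roles are reversed: radius is the lower bound and range_filter the upper bound. Range search suits cases that need every sufficiently similar item, such as finding all products similar to a given one. The number of results is not fixed: it may be zero or very large (still capped by limit), so choose radius carefully. The cost of a range search grows with the number of results returned.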
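            The interval semantics can be illustrated with a brute-force NumPy sketch for the L2 case. It assumes the bounds are applied as range_filter <= distance < radius on the squared L2 distance, which is my understanding of the Milvus behaviour and should be checked against the documentation for the version in use. `range_search_l2` is a hypothetical helper, not a pymilvus call.

```python
import numpy as np


def range_search_l2(query, vectors, radius, range_filter=0.0, limit=1000):
    """Brute-force L2 range search: keep rows with range_filter <= distance < radius."""
    query = np.asarray(query, dtype=float)
    vectors = np.asarray(vectors, dtype=float)
    dist = np.sum((vectors - query) ** 2, axis=1)           # squared L2 distance
    keep = np.flatnonzero((dist >= range_filter) & (dist < radius))
    keep = keep[np.argsort(dist[keep])][:limit]             # nearest first, capped by limit
    return [(int(i), float(dist[i])) for i in keep]


vectors = [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [2.0, 0.0]]
query = [0.0, 0.0]

# Only an upper bound: rows 0, 1 and 2 are inside radius 1.5
print(range_search_l2(query, vectors, radius=1.5))

# Upper and lower bound: row 0 (distance 0) is now excluded
print(range_search_l2(query, vectors, radius=1.5, range_filter=0.2))
```

            Unlike Top-K, the result length depends entirely on the data: widening radius can only add rows, up to the limit cap.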
        b.Code example
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # 基本范围搜索
            search_params = {
                "metric_type": "L2",
                "params": {
                    "nprobe": 16,
                    "radius": 0.5  # 最大距离
                }
            }
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=1000,  # 最大返回数量
                output_fields=["id", "title"]
            )
            
            print(f"范围搜索结果: {len(results[0])} 条")
            for hit in results[0][:10]:  # 只显示前10条
                print(f"  ID: {hit.id}, 距离: {hit.distance:.4f}")
            
            # 距离区间搜索
            search_params_range = {
                "metric_type": "L2",
                "params": {
                    "nprobe": 16,
                    "radius": 1.0,        # 最大距离
                    "range_filter": 0.3   # 最小距离
                }
            }
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params_range,
                limit=1000,
                output_fields=["id", "title"]
            )
            
            print(f"\nDistance interval [0.3, 1.0) results: {len(results[0])}")
            
            # 不同距离范围对比
            radius_values = [0.3, 0.5, 1.0, 2.0]
            
            print("\n不同距离范围对比:")
            print(f"{'radius':>8s} {'结果数':>8s}")
            print("-" * 20)
            
            for radius in radius_values:
                search_params = {
                    "metric_type": "L2",
                    "params": {
                        "nprobe": 16,
                        "radius": radius
                    }
                }
                
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10000
                )
                
                print(f"{radius:8.1f} {len(results[0]):8d}")
            
            # 范围搜索应用场景
            def find_similar_products(product_vector, max_distance=0.5):
                """查找所有相似商品"""
                search_params = {
                    "metric_type": "L2",
                    "params": {
                        "nprobe": 16,
                        "radius": max_distance
                    }
                }
                
                results = collection.search(
                    data=[product_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=1000,
                    output_fields=["id", "title", "price"]
                )
                
                return results[0]
            
            product_vector = [np.random.random() for _ in range(128)]
            similar_products = find_similar_products(product_vector, max_distance=0.5)
            
            print(f"\n相似商品查找: {len(similar_products)} 个商品")
            ---
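            The interval semantics described for the L2 metric can be checked without a Milvus server. The following is a minimal brute-force sketch, not Milvus's implementation: `range_search_l2` is a hypothetical helper, and it assumes that L2 distances are compared without the final square root (the Milvus metric documentation describes L2 this way, but verify it against your version before picking a `radius`).

```python
import numpy as np

def range_search_l2(vectors, query, radius, range_filter=0.0, limit=None):
    """Brute-force range search: keep rows with range_filter <= d < radius."""
    diffs = vectors - query
    # Squared Euclidean distance (assumption: no final square root is applied).
    dists = np.einsum("ij,ij->i", diffs, diffs)
    keep = np.flatnonzero((dists >= range_filter) & (dists < radius))
    # Sort survivors from nearest to farthest, like a search result list.
    order = keep[np.argsort(dists[keep], kind="stable")]
    if limit is not None:
        order = order[:limit]
    return [(int(i), float(dists[i])) for i in order]

vectors = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [2.0, 0.0]])
query = np.array([0.0, 0.0])

inside = range_search_l2(vectors, query, radius=1.0)
ring = range_search_l2(vectors, query, radius=1.0, range_filter=0.2)
print(inside)  # rows 0 and 1; row 2 sits exactly on the radius and is excluded
print(ring)    # row 1 only; row 0 is closer than range_filter
```

            The upper bound is exclusive and the lower bound inclusive, which is why a vector lying exactly on `radius` is not returned.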
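    b.Range filtering
        a.Description
            Range filtering combines range conditions on scalar fields with a vector range search, so one query can constrain both the distance and the scalar values. The `expr` parameter carries the scalar condition and supports numeric ranges, date ranges stored as integer timestamps, and similar comparisons. The scalar filter is applied before the vector range search, which shrinks the candidate set. This suits compound queries such as finding similar products within a price band. Creating a scalar index on the fields used in range conditions is recommended, especially for large collections.
        b.Code example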
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("products")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # 价格范围 + 向量范围
            search_params = {
                "metric_type": "L2",
                "params": {
                    "nprobe": 16,
                    "radius": 0.8
                }
            }
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=1000,
                expr='price >= 100 and price <= 500',
                output_fields=["id", "title", "price"]
            )
            
            print(f"价格范围 [100, 500] + 向量范围: {len(results[0])} 条结果")
            for hit in results[0][:5]:
                print(f"  {hit.entity.get('title')}: ¥{hit.entity.get('price'):.2f}, 距离: {hit.distance:.4f}")
            
            # 时间范围 + 向量范围
            import time
            current_time = int(time.time())
            one_week_ago = current_time - 7 * 24 * 3600
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=1000,
                expr=f'timestamp >= {one_week_ago} and timestamp <= {current_time}',
                output_fields=["id", "title", "timestamp"]
            )
            
            print(f"\n最近7天 + 向量范围: {len(results[0])} 条结果")
            
            # 多条件范围过滤
            complex_expr = '''
                category == "电子产品" and 
                price >= 100 and price <= 1000 and
                rating >= 4.0 and
                stock > 0
            '''
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=1000,
                expr=complex_expr,
                output_fields=["id", "title", "price", "rating"]
            )
            
            print(f"\n多条件范围过滤: {len(results[0])} 条结果")
            
            # 范围查询优化
            def optimized_range_search(query_vector, price_min, price_max, max_distance):
                """优化的范围查询"""
                # 策略1: 先用严格的标量过滤减少候选集
                expr = f'price >= {price_min} and price <= {price_max}'
                
                # 策略2: 使用合理的radius避免返回过多结果
                search_params = {
                    "metric_type": "L2",
                    "params": {
                        "nprobe": 16,
                        "radius": max_distance
                    }
                }
                
                # 策略3: 设置合理的limit上限
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=500,  # 限制最大返回数
                    expr=expr,
                    output_fields=["id", "title", "price"]
                )
                
                return results[0]
            
            results = optimized_range_search(
                query_vector=[np.random.random() for _ in range(128)],
                price_min=200,
                price_max=800,
                max_distance=0.6
            )
            
            print(f"\n优化范围查询: {len(results)} 条结果")
            ---
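            The examples above assemble `expr` strings by hand. A small helper can keep range conditions consistent when bounds are optional. This is only a sketch: `range_clause` and `build_expr` are hypothetical names, not part of pymilvus, and the helper assumes trusted field names and numeric bounds (string values would need quoting and escaping).

```python
def range_clause(field, low=None, high=None):
    """Return 'field >= low and field <= high', omitting an open bound."""
    parts = []
    if low is not None:
        parts.append(f"{field} >= {low}")
    if high is not None:
        parts.append(f"{field} <= {high}")
    if not parts:
        raise ValueError(f"no bound given for {field}")
    return " and ".join(parts)

def build_expr(*clauses):
    """Join clauses with 'and', wrapping each one in parentheses."""
    return " and ".join(f"({clause})" for clause in clauses)

expr = build_expr(
    range_clause("price", 100, 500),
    range_clause("rating", low=4.0),
    'category == "电子产品"',
)
print(expr)
```

            The resulting string can be passed as the `expr` argument of `collection.search`, exactly like the hand-written expressions.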

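02.Range query optimization
    a.Performance optimization
        a.Description
            Range query cost is closely tied to the number of results returned. Set `radius` conservatively so the result set stays small, and add scalar filters to shrink the candidate set. Index the scalar fields used in range conditions to speed up filtering. For large result sets, consider paging or streaming the results. Monitor query latency and tune the parameters accordingly. A range query is usually slower than a Top-K query with a small `limit`, so weigh completeness against latency.
        b.Code example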
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            # 性能对比: Top-K vs 范围查询
            print("性能对比: Top-K vs 范围查询\n")
            
            # Top-K查询
            search_params_topk = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            start = time.time()
            results_topk = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params_topk,
                limit=100
            )
            time_topk = time.time() - start
            
            print(f"Top-K查询 (limit=100):")
            print(f"  查询时间: {time_topk*1000:.2f}ms")
            print(f"  结果数: {len(results_topk[0])}")
            
            # 范围查询
            search_params_range = {
                "metric_type": "L2",
                "params": {
                    "nprobe": 16,
                    "radius": 1.0
                }
            }
            
            start = time.time()
            results_range = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params_range,
                limit=10000
            )
            time_range = time.time() - start
            
            print(f"\n范围查询 (radius=1.0):")
            print(f"  查询时间: {time_range*1000:.2f}ms")
            print(f"  结果数: {len(results_range[0])}")
            print(f"  性能比: {time_range/time_topk:.2f}x")
            
            # 优化策略1: 使用标量过滤
            start = time.time()
            results_filtered = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params_range,
                limit=10000,
                expr='id % 10 == 0'  # 过滤90%数据
            )
            time_filtered = time.time() - start
            
            print(f"\n范围查询 + 标量过滤:")
            print(f"  查询时间: {time_filtered*1000:.2f}ms")
            print(f"  结果数: {len(results_filtered[0])}")
            print(f"  加速比: {time_range/time_filtered:.2f}x")
            
            # 优化策略2: 调整radius
            radius_values = [0.3, 0.5, 0.8, 1.0, 1.5]
            
            print("\n不同radius的性能:")
            print(f"{'radius':>8s} {'查询时间':>12s} {'结果数':>8s}")
            print("-" * 32)
            
            for radius in radius_values:
                search_params = {
                    "metric_type": "L2",
                    "params": {
                        "nprobe": 16,
                        "radius": radius
                    }
                }
                
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10000
                )
                elapsed = time.time() - start
                
                print(f"{radius:8.1f} {elapsed*1000:10.2f}ms {len(results[0]):8d}")
            
            # 优化策略3: 分批处理
            def batch_range_search(query_vector, radius, batch_size=1000):
                """分批处理范围查询结果"""
                search_params = {
                    "metric_type": "L2",
                    "params": {
                        "nprobe": 16,
                        "radius": radius
                    }
                }
                
                offset = 0
                all_results = []
                
                while True:
                    results = collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=batch_size,
                        offset=offset
                    )
                    
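                    hits = results[0]
                    all_results.extend(hits)
                    offset += batch_size
                    
                    # A short page means the range is exhausted; the cap also helps keep
                    # offset + limit within Milvus's 16384 limit
                    if len(hits) < batch_size or offset >= 10000:
                        break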
                
                return all_results
            
            print("\n分批处理范围查询:")
            query_vec = [np.random.random() for _ in range(128)]
            batch_results = batch_range_search(query_vec, radius=0.8, batch_size=500)
            print(f"  总结果数: {len(batch_results)}")
            ---
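            The comparisons above time each query once with `time.time()`, so a single slow call (a cold cache, a network hiccup) can distort the ratio. A hedged alternative is to warm up first and report the median of several runs. `time_call` is a hypothetical helper, and the lambda stands in for a real `collection.search(...)` call.

```python
import statistics
import time

def time_call(fn, repeats=5, warmup=1):
    """Return (median_ms, last_result) for fn() over several runs."""
    for _ in range(warmup):
        fn()  # warm caches and connections; the result is discarded
    samples = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples), result

# Stand-in workload: counts how often it was invoked.
calls = []
median_ms, value = time_call(lambda: calls.append(1) or len(calls), repeats=5, warmup=2)
print(f"{median_ms:.4f} ms, result={value}, calls={len(calls)}")
```

            In the benchmark above, each `start = time.time()` / `time.time() - start` pair could be replaced by one `time_call(lambda: collection.search(...))`.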
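    b.Usage recommendations
        a.Description
            Range queries fit scenarios that need every sufficiently similar result. They are a poor fit for latency-critical real-time queries. Calibrate `radius` on a small dataset first, and monitor the number of returned results to avoid overload. When only the best few matches matter, prefer a Top-K query. Range queries work better when combined with scalar filters. In every case there is a trade-off between recall and performance.
        b.Code example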
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            collection.load()  # search requires the collection to be loaded
            
            # 场景1: 查找所有相似文档
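            def find_all_similar_docs(query_vector, similarity_threshold=0.7):
                """Find all similar documents (suited to offline analysis); with L2, similarity_threshold is a maximum distance, so smaller is stricter"""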
                search_params = {
                    "metric_type": "L2",
                    "params": {
                        "nprobe": 32,  # 更高的nprobe提升召回
                        "radius": similarity_threshold
                    }
                }
                
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=5000,
                    output_fields=["id", "title"]
                )
                
                print(f"找到 {len(results[0])} 个相似文档")
                return results[0]
            
            # 场景2: 去重检测
            def detect_duplicates(query_vector, duplicate_threshold=0.1):
                """检测重复文档(距离很小)"""
                search_params = {
                    "metric_type": "L2",
                    "params": {
                        "nprobe": 16,
                        "radius": duplicate_threshold
                    }
                }
                
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=100
                )
                
                duplicates = [hit for hit in results[0] if hit.distance < duplicate_threshold]
                
                print(f"检测到 {len(duplicates)} 个可能的重复")
                return duplicates
            
            # 场景3: 聚类分析
            def cluster_analysis(center_vector, cluster_radius=0.5):
                """基于中心点的聚类分析"""
                search_params = {
                    "metric_type": "L2",
                    "params": {
                        "nprobe": 16,
                        "radius": cluster_radius
                    }
                }
                
                results = collection.search(
                    data=[center_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=10000
                )
                
                cluster_members = results[0]
                
                # 统计聚类信息
                distances = [hit.distance for hit in cluster_members]
                avg_distance = sum(distances) / len(distances) if distances else 0
                
                print(f"聚类成员数: {len(cluster_members)}")
                print(f"平均距离: {avg_distance:.4f}")
                
                return cluster_members
            
            # 决策树: Top-K vs 范围查询
            def choose_search_method(scenario):
                """根据场景选择搜索方法"""
                recommendations = {
                    "实时推荐": "Top-K (limit=10-20)",
                    "搜索结果": "Top-K (limit=20-50)",
                    "相似内容": "范围查询 (radius=0.5-0.8)",
                    "去重检测": "范围查询 (radius=0.1-0.3)",
                    "聚类分析": "范围查询 (radius=0.5-1.0)",
                    "批量处理": "范围查询 + 分批"
                }
                
                return recommendations.get(scenario, "Top-K (默认)")
            
            print("\n搜索方法选择建议:")
            scenarios = ["实时推荐", "搜索结果", "相似内容", "去重检测", "聚类分析", "批量处理"]
            
            for scenario in scenarios:
                method = choose_search_method(scenario)
                print(f"  {scenario:12s}: {method}")
            
            # 使用示例
            query_vector = [np.random.random() for _ in range(128)]
            
            print("\n实际应用示例:")
            similar_docs = find_all_similar_docs(query_vector, similarity_threshold=0.7)
            duplicates = detect_duplicates(query_vector, duplicate_threshold=0.1)
            cluster = cluster_analysis(query_vector, cluster_radius=0.5)
            ---
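            One way to follow the advice of testing `radius` on a small dataset is to derive it from Top-K distances: measure how far the k-th nearest neighbor lies for a handful of sample queries and take a percentile. The sketch below uses brute-force NumPy on random data. `kth_neighbor_distances` and `suggest_radius` are hypothetical helpers, and squared L2 is an assumption about how distances are compared.

```python
import numpy as np

def kth_neighbor_distances(vectors, queries, k):
    """Squared L2 distance from each query to its k-th nearest vector."""
    # Pairwise squared distances, shape (n_queries, n_vectors).
    diffs = queries[:, None, :] - vectors[None, :, :]
    dists = np.einsum("qnd,qnd->qn", diffs, diffs)
    # np.partition places the k-th smallest value at index k - 1.
    return np.partition(dists, k - 1, axis=1)[:, k - 1]

def suggest_radius(vectors, queries, k, percentile=50):
    """Radius below which roughly `percentile`% of queries have k neighbors."""
    kth = kth_neighbor_distances(vectors, queries, k)
    return float(np.percentile(kth, percentile))

rng = np.random.default_rng(0)
sample = rng.random((500, 16))   # small sample of stored vectors
probes = rng.random((20, 16))    # representative query vectors
radius = suggest_radius(sample, probes, k=10)

# Check how many hits each probe gets inside the suggested radius.
hits = [int((np.sum((sample - q) ** 2, axis=1) < radius).sum()) for q in probes]
print(f"radius={radius:.4f}, hits per query: min={min(hits)}, max={max(hits)}")
```

            A higher percentile yields a larger radius and more hits per query, so the value should be rechecked on the full collection, where neighbors are denser than in a sample.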

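6.3 Hybrid Retrieval

01.Vector + scalar hybrid
    a.Basic hybrid query
        a.Description
            Hybrid retrieval combines vector similarity search with scalar field filtering for more precise queries. The `expr` parameter specifies the scalar condition, which is applied before the vector search. A selective filter can greatly reduce the number of distance computations, although evaluating the filter has its own cost, so a speed-up is not guaranteed. Equality, range, and logical conditions are all supported. It suits scenarios that must satisfy both semantic similarity and business constraints. Scalar indexes on the filtered fields are recommended for best performance.
        b.Code example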
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("products")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 纯向量搜索(基准)
            start = time.time()
            results_vector_only = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10
            )
            time_vector_only = time.time() - start
            
            print(f"纯向量搜索:")
            print(f"  查询时间: {time_vector_only*1000:.2f}ms")
            print(f"  结果数: {len(results_vector_only[0])}")
            
            # 混合查询: 向量 + 类别过滤
            start = time.time()
            results_hybrid = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr='category == "电子产品"',
                output_fields=["id", "title", "category", "price"]
            )
            time_hybrid = time.time() - start
            
            print(f"\n混合查询(向量 + 类别):")
            print(f"  查询时间: {time_hybrid*1000:.2f}ms")
            print(f"  结果数: {len(results_hybrid[0])}")
            
            for hit in results_hybrid[0][:5]:
                print(f"  {hit.entity.get('title')}: {hit.entity.get('category')}, ¥{hit.entity.get('price'):.2f}")
            
            # 混合查询: 向量 + 价格范围
            results_price = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr='price >= 100 and price <= 500',
                output_fields=["id", "title", "price"]
            )
            
            print(f"\n混合查询(向量 + 价格范围):")
            print(f"  结果数: {len(results_price[0])}")
            for hit in results_price[0][:5]:
                print(f"  {hit.entity.get('title')}: ¥{hit.entity.get('price'):.2f}, 距离: {hit.distance:.4f}")
            
            # 混合查询: 向量 + 多条件
            complex_expr = '''
                category == "电子产品" and
                price >= 100 and price <= 1000 and
                rating >= 4.0 and
                stock > 0
            '''
            
            results_complex = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr=complex_expr,
                output_fields=["id", "title", "category", "price", "rating", "stock"]
            )
            
            print(f"\n混合查询(向量 + 多条件):")
            print(f"  结果数: {len(results_complex[0])}")
            
            # 性能对比
            print(f"\n性能对比:")
            print(f"  纯向量: {time_vector_only*1000:.2f}ms")
            print(f"  混合查询: {time_hybrid*1000:.2f}ms")
            print(f"  性能比: {time_hybrid/time_vector_only:.2f}x")
            print(f"  说明: 混合查询通过标量过滤减少向量计算,可能更快")
            ---
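            The filter-then-search order can be illustrated without a server. The sketch below is a brute-force stand-in, not Milvus's execution plan: `filtered_top_k` is a hypothetical helper that applies a Python predicate to scalar rows and then ranks only the survivors by squared L2 distance.

```python
import numpy as np

def filtered_top_k(vectors, scalars, query, predicate, k):
    """Apply a scalar predicate first, then rank only the survivors by L2."""
    mask = np.array([predicate(row) for row in scalars], dtype=bool)
    candidates = np.flatnonzero(mask)
    if candidates.size == 0:
        return []
    diffs = vectors[candidates] - query
    dists = np.einsum("ij,ij->i", diffs, diffs)
    order = np.argsort(dists, kind="stable")[:k]
    return [(int(candidates[i]), float(dists[i])) for i in order]

vectors = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [3.0, 0.0]])
scalars = [
    {"category": "book", "price": 50},
    {"category": "电子产品", "price": 300},
    {"category": "电子产品", "price": 900},
    {"category": "电子产品", "price": 200},
]
hits = filtered_top_k(
    vectors, scalars, np.array([0.0, 0.0]),
    predicate=lambda r: r["category"] == "电子产品" and 100 <= r["price"] <= 500,
    k=2,
)
print(hits)
```

            Rows 0 and 2 are nearer to the query than row 3, but they fail the filter, so row 3 is returned instead. This is also why an overly strict filter can leave fewer than `limit` results.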
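    b.Filtering strategy
        a.Description
            The filtering strategy affects both the performance and the results of a hybrid query. A highly selective filter (one that removes most of the data) can speed up the query noticeably, while a weakly selective filter helps little and mainly adds overhead. Putting the most selective condition first is a reasonable habit, but as the ordering test in the example notes, Milvus optimizes the expression itself, so the order usually matters little. Complex expressions may not make full use of indexes, so prefer simple AND combinations. The filtered candidate set must stay large enough to fill `limit`, otherwise the query returns few or no results. Balance filter strictness against result count.
        b.Code example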
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("products")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
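            # Filters ordered from least to most selective
            # (pass rates are illustrative assumptions about the data distribution)
            filters = [
                ('id >= 0', "no filtering (~100% pass)"),
                ('price > 500', "low selectivity (~50% pass)"),
                ('category == "电子产品"', "medium selectivity (~25% pass)"),
                ('category == "电子产品" and price > 500', "high selectivity (~10% pass)"),
                ('category == "电子产品" and price > 800 and rating >= 4.5', "very high selectivity (~2% pass)")
            ]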
            
            print("不同选择性过滤条件的性能:\n")
            print(f"{'过滤条件':>50s} {'查询时间':>12s} {'结果数':>8s}")
            print("-" * 75)
            
            for expr, desc in filters:
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10,
                    expr=expr,
                    output_fields=["id"]
                )
                elapsed = time.time() - start
                
                print(f"{desc:>50s} {elapsed*1000:10.2f}ms {len(results[0]):8d}")
            
            # 过滤顺序优化
            print("\n过滤顺序优化:")
            
            # 策略1: 低选择性在前
            expr1 = 'category == "电子产品" and price > 800'
            
            start = time.time()
            results1 = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr=expr1
            )
            time1 = time.time() - start
            
            print(f"  低选择性在前: {time1*1000:.2f}ms")
            
            # 策略2: 高选择性在前
            expr2 = 'price > 800 and category == "电子产品"'
            
            start = time.time()
            results2 = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr=expr2
            )
            time2 = time.time() - start
            
            print(f"  高选择性在前: {time2*1000:.2f}ms")
            print(f"  说明: Milvus会自动优化,顺序影响不大")
            
            # 过滤策略决策树
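            def recommend_filter_strategy(data_size, filter_selectivity):
                """Recommend a strategy; filter_selectivity is the fraction of rows that pass the filter (smaller = more selective)"""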
                if filter_selectivity < 0.1:
                    return "极高选择性,优先使用标量查询"
                elif filter_selectivity < 0.3:
                    return "高选择性,混合查询效果好"
                elif filter_selectivity < 0.7:
                    return "中等选择性,混合查询有一定效果"
                else:
                    return "低选择性,考虑纯向量搜索"
            
            print("\n过滤策略建议:")
            selectivities = [0.05, 0.2, 0.5, 0.8]
            
            for sel in selectivities:
                strategy = recommend_filter_strategy(1000000, sel)
                print(f"  选择性 {sel*100:4.1f}%: {strategy}")
            ---
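            The decision function above takes the pass rate as an input. One hedged way to obtain that number is to estimate it from a sample of rows before running the hybrid query, and to check that enough candidates remain to fill `limit`. `retained_fraction` and `enough_candidates` are hypothetical helpers, and the safety factor of 10 is an arbitrary illustration.

```python
def retained_fraction(rows, predicate):
    """Fraction of sampled rows that pass the filter (1.0 = nothing removed)."""
    rows = list(rows)
    if not rows:
        raise ValueError("sample is empty")
    return sum(1 for row in rows if predicate(row)) / len(rows)

def enough_candidates(fraction, total_rows, limit, safety=10):
    """True if the filter is expected to leave at least safety * limit rows."""
    return fraction * total_rows >= safety * limit

sample = [{"price": p} for p in range(0, 1000, 10)]  # 100 synthetic rows
frac = retained_fraction(sample, lambda r: r["price"] > 800)

print(frac)
print(enough_candidates(frac, total_rows=1_000_000, limit=10))
print(enough_candidates(frac, total_rows=500, limit=10))
```

            The estimate can be passed to `recommend_filter_strategy` as `filter_selectivity`. When `enough_candidates` is false, loosening the filter or lowering `limit` avoids near-empty results.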

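02.Multi-vector hybrid
    a.Multi-field search
        a.Description
            Multi-vector hybrid search queries several vector fields within one Collection (multiple vector fields per Collection require Milvus 2.4 or later). Each vector field can have its own index and search parameters. It suits multimodal scenarios such as combined text-and-image search. Different vector fields can be given different weights, and the per-field results must be merged; the merge strategy determines the final ranking. Milvus 2.4+ can also search several vector fields in a single request through `hybrid_search`. The example below instead issues one search per field and fuses the results on the client.
        b.Code example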
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            
            # 创建多向量字段Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="text_embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
                FieldSchema(name="image_embedding", dtype=DataType.FLOAT_VECTOR, dim=512)
            ]
            schema = CollectionSchema(fields=fields, description="多模态搜索")
            collection = Collection("multimodal_search", schema=schema)
            
            # 插入数据
            data_size = 10000
            ids = list(range(data_size))
            titles = [f"文档{i}" for i in range(data_size)]
            text_embeddings = [[np.random.random() for _ in range(768)] for _ in range(data_size)]
            image_embeddings = [[np.random.random() for _ in range(512)] for _ in range(data_size)]
            
            data = [ids, titles, text_embeddings, image_embeddings]
            collection.insert(data)
            collection.flush()
            
            # 为每个向量字段创建索引
            text_index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "COSINE",
                "params": {"nlist": 128}
            }
            
            image_index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 128}
            }
            
            collection.create_index(field_name="text_embedding", index_params=text_index_params)
            collection.create_index(field_name="image_embedding", index_params=image_index_params)
            
            collection.load()
            
            # 文本向量搜索
            text_query = [[np.random.random() for _ in range(768)]]
            
            text_results = collection.search(
                data=text_query,
                anns_field="text_embedding",
                param={"metric_type": "COSINE", "params": {"nprobe": 16}},
                limit=10,
                output_fields=["id", "title"]
            )
            
            print("文本向量搜索结果:")
            for hit in text_results[0][:5]:
                print(f"  {hit.entity.get('title')}: {hit.distance:.4f}")
            
            # 图像向量搜索
            image_query = [[np.random.random() for _ in range(512)]]
            
            image_results = collection.search(
                data=image_query,
                anns_field="image_embedding",
                param={"metric_type": "L2", "params": {"nprobe": 16}},
                limit=10,
                output_fields=["id", "title"]
            )
            
            print("\n图像向量搜索结果:")
            for hit in image_results[0][:5]:
                print(f"  {hit.entity.get('title')}: {hit.distance:.4f}")
            
            # 多向量融合搜索
            def multimodal_search(text_vector, image_vector, text_weight=0.6, image_weight=0.4):
                """多模态融合搜索"""
                # 分别搜索
                text_results = collection.search(
                    data=[text_vector],
                    anns_field="text_embedding",
                    param={"metric_type": "COSINE", "params": {"nprobe": 16}},
                    limit=50,
                    output_fields=["id", "title"]
                )
                
                image_results = collection.search(
                    data=[image_vector],
                    anns_field="image_embedding",
                    param={"metric_type": "L2", "params": {"nprobe": 16}},
                    limit=50,
                    output_fields=["id", "title"]
                )
                
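                # Map both result sets to [0, 1] scores where larger means more similar
                text_scores = {}
                for hit in text_results[0]:
                    # With the COSINE metric Milvus returns cosine similarity in [-1, 1]
                    text_scores[hit.id] = (hit.distance + 1) / 2
                
                image_scores = {}
                # Fall back to 1.0 when there are no hits or every distance is 0
                max_image_dist = max((hit.distance for hit in image_results[0]), default=0.0) or 1.0
                for hit in image_results[0]:
                    # L2 is a distance (smaller is better): scale by the largest distance and invert
                    image_scores[hit.id] = 1 - (hit.distance / max_image_dist)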
                
                # 融合分数
                all_ids = set(text_scores.keys()) | set(image_scores.keys())
                fused_scores = {}
                
                for doc_id in all_ids:
                    text_score = text_scores.get(doc_id, 0)
                    image_score = image_scores.get(doc_id, 0)
                    fused_scores[doc_id] = text_weight * text_score + image_weight * image_score
                
                # 排序
                sorted_results = sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
                
                return sorted_results[:10]
            
            # 执行多模态搜索
            text_vec = [np.random.random() for _ in range(768)]
            image_vec = [np.random.random() for _ in range(512)]
            
            fused_results = multimodal_search(text_vec, image_vec, text_weight=0.6, image_weight=0.4)
            
            print("\n多模态融合搜索结果:")
            for doc_id, score in fused_results:
                print(f"  文档ID: {doc_id}, 融合分数: {score:.4f}")
            ---
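    b.Result fusion
        a.Description
            Multi-vector search has to merge the results from different vector fields. Common fusion strategies include weighted averaging, RRF (Reciprocal Rank Fusion), and taking the maximum. The weights set the relative importance of each modality. Scores from different metrics must first be normalized and oriented the same way, because L2 is a distance (smaller is better) while COSINE and IP are similarities (larger is better). Rank-based methods such as RRF use result positions instead of raw scores. Adjust the strategy to the business scenario and determine the best weights experimentally.
        b.Code example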
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("multimodal_search")
            collection.load()
            
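            def normalized_scores(results):
                """Map one result list to {id: score} in [0, 1], higher = better"""
                hits = list(results[0])
                if not hits:
                    return {}
                # Milvus returns hits best-first, so this works for both L2
                # (ascending distance) and COSINE/IP (descending similarity)
                best, worst = hits[0].distance, hits[-1].distance
                span = best - worst
                return {hit.id: (hit.distance - worst) / span if span else 1.0 for hit in hits}
            
            # Fusion strategy 1: weighted average
            def weighted_average_fusion(results_list, weights):
                """Weighted sum of normalized scores"""
                all_scores = {}
                
                for results, weight in zip(results_list, weights):
                    for doc_id, score in normalized_scores(results).items():
                        all_scores[doc_id] = all_scores.get(doc_id, 0) + weight * score
                
                sorted_results = sorted(all_scores.items(), key=lambda x: x[1], reverse=True)
                return sorted_results[:10]
            
            # Fusion strategy 2: RRF (Reciprocal Rank Fusion)
            def rrf_fusion(results_list, k=60):
                """RRF fusion: uses rank positions only, so it is insensitive to score scales"""
                rrf_scores = {}
                
                for results in results_list:
                    for rank, hit in enumerate(results[0]):
                        if hit.id not in rrf_scores:
                            rrf_scores[hit.id] = 0
                        rrf_scores[hit.id] += 1 / (k + rank + 1)
                
                sorted_results = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)
                return sorted_results[:10]
            
            # Fusion strategy 3: maximum
            def max_fusion(results_list):
                """Take each document's highest normalized score"""
                max_scores = {}
                
                for results in results_list:
                    for doc_id, score in normalized_scores(results).items():
                        max_scores[doc_id] = max(max_scores.get(doc_id, 0.0), score)
                
                sorted_results = sorted(max_scores.items(), key=lambda x: x[1], reverse=True)
                return sorted_results[:10]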
            
            # 测试不同融合策略
            text_query = [[np.random.random() for _ in range(768)]]
            image_query = [[np.random.random() for _ in range(512)]]
            
            text_results = collection.search(
                data=text_query,
                anns_field="text_embedding",
                param={"metric_type": "COSINE", "params": {"nprobe": 16}},
                limit=50
            )
            
            image_results = collection.search(
                data=image_query,
                anns_field="image_embedding",
                param={"metric_type": "L2", "params": {"nprobe": 16}},
                limit=50
            )
            
            results_list = [text_results, image_results]
            
            print("不同融合策略对比:\n")
            
            # 加权平均
            wa_results = weighted_average_fusion(results_list, weights=[0.6, 0.4])
            print("加权平均融合 (0.6:0.4):")
            for doc_id, score in wa_results[:5]:
                print(f"  文档ID: {doc_id}, 分数: {score:.4f}")
            
            # RRF
            rrf_results = rrf_fusion(results_list, k=60)
            print("\nRRF融合:")
            for doc_id, score in rrf_results[:5]:
                print(f"  文档ID: {doc_id}, RRF分数: {score:.4f}")
            
            # 最大值
            max_results = max_fusion(results_list)
            print("\n最大值融合:")
            for doc_id, score in max_results[:5]:
                print(f"  文档ID: {doc_id}, 最大分数: {score:.4f}")
            
            # 自适应权重
            class AdaptiveFusion:
                def __init__(self):
                    self.history = []
                
                def fuse(self, results_list, initial_weights=(0.5, 0.5)):
                    """自适应权重融合"""
                    # 计算每个模态的结果质量
                    qualities = []
                    for results in results_list:
                        if len(results[0]) > 0:
                            # Heuristic: score quality from the spread of the distances
                            distances = [hit.distance for hit in results[0]]
                            quality = 1 / (np.std(distances) + 0.01)  # assumes a tighter spread means higher quality
                        else:
                            quality = 0
                        qualities.append(quality)
                    
                    # 归一化权重
                    total_quality = sum(qualities)
                    if total_quality > 0:
                        adaptive_weights = [q / total_quality for q in qualities]
                    else:
                        adaptive_weights = initial_weights
                    
                    print(f"自适应权重: {adaptive_weights}")
                    
                    # 加权融合
                    return weighted_average_fusion(results_list, adaptive_weights)
            
            adaptive_fusion = AdaptiveFusion()
            adaptive_results = adaptive_fusion.fuse(results_list)
            
            print("\n自适应权重融合:")
            for doc_id, score in adaptive_results[:5]:
                print(f"  文档ID: {doc_id}, 分数: {score:.4f}")
            ---
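The test code above fuses a COSINE search with an L2 search. Assuming COSINE scores are similarities (larger is better) while L2 values are distances (smaller is better), the raw `distance` values of the two lists are not directly comparable. The following is a minimal sketch of one way to handle this: min-max normalize each list onto a common "higher is better" scale before taking a weighted sum. It works on plain `(doc_id, score)` tuples rather than pymilvus hit objects, and `normalize_scores` and `weighted_fusion` are hypothetical helper names, not part of any library.

```python
def normalize_scores(hits, higher_is_better):
    """Min-max scale scores to [0, 1] so that 1.0 is always the best hit."""
    if not hits:
        return {}
    scores = [score for _, score in hits]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    normalized = {}
    for doc_id, score in hits:
        if span == 0:
            normalized[doc_id] = 1.0  # all hits are equally good
            continue
        unit = (score - lo) / span
        # Flip distance-style metrics so that larger always means better
        normalized[doc_id] = unit if higher_is_better else 1.0 - unit
    return normalized


def weighted_fusion(normalized_lists, weights, top_k=10):
    """Weighted sum of normalized scores; a missing document contributes 0."""
    fused = {}
    for scores, weight in zip(normalized_lists, weights):
        for doc_id, score in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * score
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Illustrative data only
text_hits = [(1, 0.92), (2, 0.80), (3, 0.41)]   # COSINE: higher is better
image_hits = [(2, 0.10), (3, 0.35), (4, 0.90)]  # L2: lower is better

fused = weighted_fusion(
    [normalize_scores(text_hits, True), normalize_scores(image_hits, False)],
    weights=[0.6, 0.4],
)
for doc_id, score in fused:
    print(f"doc {doc_id}: {score:.4f}")
```

Rank-based methods such as RRF avoid this problem altogether because they ignore the raw score values.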

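6.4 Scalar Filtering

01.Filter expressions
    a.Expression syntax
        a.Description
            Milvus filter expressions support comparison operators (==, !=, >, >=, <, <=), logical operators (and, or, not), and membership operators (in, not in). They can be applied to integer, floating-point, string, and boolean fields. Parentheses override the default precedence, and string comparison is case-sensitive. Arithmetic expressions and function calls are also supported. Milvus parses and optimizes each expression and uses indexes where it can; complex expressions can still slow a query down.
        b.Code example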
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("products")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 比较运算符
            expressions = [
                ('price == 99.99', "等于"),
                ('price != 99.99', "不等于"),
                ('price > 100', "大于"),
                ('price >= 100', "大于等于"),
                ('price < 500', "小于"),
                ('price <= 500', "小于等于")
            ]
            
            print("比较运算符示例:\n")
            
            for expr, desc in expressions:
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=5,
                    expr=expr,
                    output_fields=["id", "title", "price"]
                )
                
                print(f"{desc:10s} ({expr:20s}): {len(results[0])} 条结果")
            
            # 逻辑运算符
            logical_expressions = [
                ('price > 100 and price < 500', "AND运算"),
                ('category == "电子" or category == "图书"', "OR运算"),
                ('not (price > 1000)', "NOT运算"),
                ('(price > 100 and price < 500) or category == "特价"', "组合运算")
            ]
            
            print("\n逻辑运算符示例:\n")
            
            for expr, desc in logical_expressions:
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=5,
                    expr=expr
                )
                
                print(f"{desc:10s}: {len(results[0])} 条结果")
            
            # 成员运算符
            member_expressions = [
                ('category in ["电子", "图书", "服装"]', "IN运算"),
                ('category not in ["食品", "玩具"]', "NOT IN运算"),
                ('id in [1, 2, 3, 4, 5]', "ID列表")
            ]
            
            print("\n成员运算符示例:\n")
            
            for expr, desc in member_expressions:
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=5,
                    expr=expr,
                    output_fields=["id", "category"]
                )
                
                print(f"{desc:15s}: {len(results[0])} 条结果")
            
            # 字符串匹配
            string_expressions = [
                ('title like "手机%"', "前缀匹配"),
                ('title like "%Pro"', "后缀匹配"),
                ('title like "%iPhone%"', "包含匹配")
            ]
            
            print("\n字符串匹配示例:\n")
            
            for expr, desc in string_expressions:
                try:
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=5,
                        expr=expr,
                        output_fields=["id", "title"]
                    )
                    
                    print(f"{desc:10s}: {len(results[0])} 条结果")
                except Exception as e:
                    print(f"{desc:10s}: 不支持或错误 - {str(e)}")
            
            # 复杂表达式
            complex_expr = '''
                (category == "电子" and price >= 1000 and price <= 5000) or
                (category == "图书" and price >= 50 and rating >= 4.5) or
                (category == "服装" and discount > 0.5)
            '''
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr=complex_expr,
                output_fields=["id", "title", "category", "price"]
            )
            
            print(f"\n复杂表达式: {len(results[0])} 条结果")
            ---
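Filter expressions are plain strings, so assembling them by hand from application data invites quoting mistakes. Below is a minimal sketch of a small builder. It assumes that string literals are written in double quotes with JSON-style backslash escaping and that boolean literals are `true` / `false`; check both assumptions against the Milvus version in use. `quote_value`, `in_expr`, and `and_all` are hypothetical helpers, not part of pymilvus.

```python
import json


def quote_value(value):
    """Render a Python value as a literal for a filter expression."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Assumption: JSON-style escaping is accepted by the expression parser
        return json.dumps(value, ensure_ascii=False)
    raise TypeError(f"unsupported filter value: {value!r}")


def in_expr(field, values):
    """Build 'field in [v1, v2, ...]'."""
    if not values:
        raise ValueError("value list must not be empty")
    return f"{field} in [{', '.join(quote_value(v) for v in values)}]"


def and_all(*clauses):
    """Join clauses with 'and', parenthesizing each one to keep precedence explicit."""
    return " and ".join(f"({clause})" for clause in clauses)


expr = and_all(
    in_expr("category", ["电子", "图书"]),
    "price >= 100",
    "price <= 500",
)
print(expr)
```

The resulting string can be passed as the `expr` argument of `collection.search`, as in the examples above.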
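    b.Expression optimization
        a.Description
            Well-chosen expressions can noticeably improve query performance. Filter on indexed fields where possible, and put highly selective conditions first. Prefer positive conditions to NOT, and use IN instead of a chain of OR conditions on the same field. Avoid function calls inside expressions and simplify deeply nested ones. Measure each expression's latency before and after a change, and keep monitoring filter performance so that regressions are caught early.
        b.Code example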
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("products")
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 优化前: 使用多个OR
            expr_before = '''
                category == "电子" or 
                category == "图书" or 
                category == "服装" or 
                category == "食品"
            '''
            
            start = time.time()
            results_before = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr=expr_before
            )
            time_before = time.time() - start
            
            print("优化前(多个OR):")
            print(f"  查询时间: {time_before*1000:.2f}ms")
            print(f"  结果数: {len(results_before[0])}")
            
            # 优化后: 使用IN
            expr_after = 'category in ["电子", "图书", "服装", "食品"]'
            
            start = time.time()
            results_after = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr=expr_after
            )
            time_after = time.time() - start
            
            print(f"\n优化后(IN运算):")
            print(f"  查询时间: {time_after*1000:.2f}ms")
            print(f"  结果数: {len(results_after[0])}")
            print(f"  加速比: {time_before/time_after:.2f}x")
            
            # 优化: 避免NOT
            expr_not = 'not (price > 1000)'
            expr_positive = 'price <= 1000'
            
            start = time.time()
            results_not = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr=expr_not
            )
            time_not = time.time() - start
            
            start = time.time()
            results_positive = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr=expr_positive
            )
            time_positive = time.time() - start
            
            print(f"\nNOT运算对比:")
            print(f"  NOT运算: {time_not*1000:.2f}ms")
            print(f"  正向条件: {time_positive*1000:.2f}ms")
            print(f"  加速比: {time_not/time_positive:.2f}x")
            
            # 表达式简化
            class ExpressionOptimizer:
                @staticmethod
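                def optimize(expr):
                    """Return optimization suggestions for a filter expression."""
                    import re
                    optimizations = []
                    
                    # Many OR conditions
                    if expr.count(' or ') >= 3:
                        optimizations.append("Suggestion: use IN instead of multiple ORs")
                    
                    # NOT operator (the membership operator "not in" is not flagged)
                    if re.search(r'\bnot\b(?!\s+in\b)', expr, flags=re.IGNORECASE):
                        optimizations.append("Suggestion: avoid NOT; use a positive condition")
                    
                    # Deep nesting
                    if expr.count('(') > 3:
                        optimizations.append("Suggestion: simplify nested expressions")
                    
                    # Function calls: an identifier directly followed by "(", excluding
                    # the keywords and/or/not/in so that plain grouping is not flagged
                    if re.search(r'\b(?!(?:and|or|not|in)\b)[A-Za-z_]\w*\s*\(', expr):
                        optimizations.append("Warning: may contain a function call, which can affect performance")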
                    
                    return optimizations
                
                @staticmethod
                def analyze(expr, collection):
                    """分析表达式性能"""
                    query_vector = [[np.random.random() for _ in range(128)]]
                    search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": 16}
                    }
                    
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10,
                        expr=expr
                    )
                    elapsed = time.time() - start
                    
                    return {
                        "query_time": elapsed * 1000,
                        "result_count": len(results[0])
                    }
            
            optimizer = ExpressionOptimizer()
            
            # 分析复杂表达式
            complex_expr = '''
                not (category == "电子" or category == "图书") and
                (price > 100 or discount > 0.5) and
                rating >= 4.0
            '''
            
            print(f"\n表达式优化建议:")
            suggestions = optimizer.optimize(complex_expr)
            for suggestion in suggestions:
                print(f"  {suggestion}")
            
            metrics = optimizer.analyze(complex_expr, collection)
            print(f"\n性能分析:")
            print(f"  查询时间: {metrics['query_time']:.2f}ms")
            print(f"  结果数: {metrics['result_count']}")
            ---
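The "use IN instead of many ORs" advice can be automated for the simplest case, where the whole expression is a chain of equality tests on one field. The sketch below is deliberately conservative: it returns the input unchanged whenever the expression does not match that narrow pattern. `or_chain_to_in` is a hypothetical helper, and the regular expression only recognizes double-quoted strings and plain numbers.

```python
import re

# One clause of the form: field == "string"  or  field == 123 / 1.5
_EQ_CLAUSE = re.compile(r'^\s*(\w+)\s*==\s*("(?:[^"\\]|\\.)*"|-?\d+(?:\.\d+)?)\s*$')


def or_chain_to_in(expr):
    """Rewrite 'f == a or f == b ...' as 'f in [a, b, ...]' when that is safe."""
    parts = re.split(r'\s+or\s+', expr.strip())
    if len(parts) < 2:
        return expr
    field, values = None, []
    for part in parts:
        match = _EQ_CLAUSE.match(part)
        # Give up on anything that is not a plain equality on the same field
        if match is None or (field is not None and match.group(1) != field):
            return expr
        field = match.group(1)
        values.append(match.group(2))
    return f"{field} in [{', '.join(values)}]"


before = 'category == "电子" or\n    category == "图书" or category == "服装"'
print(or_chain_to_in(before))
```

Because any clause that fails the pattern aborts the rewrite, expressions that mix `and`, ranges, or several fields pass through untouched.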

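02.Filter performance
    a.Index usage
        a.Description
            Filter performance depends heavily on scalar indexes. Creating an index on frequently filtered fields can improve performance significantly, and the index type affects how efficient filtering is, so choose one that suits the field. Combined conditions may not be able to use an index fully. Filtering is applied before the vector search, which reduces the number of vector comparisons. Monitor how indexes are used, tune the index configuration, and review slow queries regularly to improve their filter conditions.
        b.Code example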
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("products")
            
            # 测试有索引 vs 无索引
            print("索引对过滤性能的影响:\n")
            
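            # Scenario 1: no scalar index
            collection.release()
            # has_index/drop_index take the index name as a keyword argument;
            # a positional argument would be interpreted as the timeout
            if collection.has_index(index_name="category_index"):
                collection.drop_index(index_name="category_index")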
            
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            start = time.time()
            results_no_index = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr='category == "电子"'
            )
            time_no_index = time.time() - start
            
            print(f"无索引:")
            print(f"  查询时间: {time_no_index*1000:.2f}ms")
            
            # 场景2: 有索引
            collection.release()
            collection.create_index(
                field_name="category",
                index_name="category_index"
            )
            collection.load()
            
            start = time.time()
            results_with_index = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                expr='category == "电子"'
            )
            time_with_index = time.time() - start
            
            print(f"\n有索引:")
            print(f"  查询时间: {time_with_index*1000:.2f}ms")
            print(f"  加速比: {time_no_index/time_with_index:.2f}x")
            
            # 组合条件的索引利用
            print("\n组合条件索引利用:")
            
            # 为price字段创建索引
            collection.release()
            collection.create_index(
                field_name="price",
                index_name="price_index"
            )
            collection.load()
            
            # 测试不同组合
            test_cases = [
                ('category == "电子"', "单字段(有索引)"),
                ('price > 100', "单字段(有索引)"),
                ('category == "电子" and price > 100', "两字段AND(都有索引)"),
                ('category == "电子" or price > 100', "两字段OR(都有索引)"),
                ('category == "电子" and rating > 4.0', "混合(一个有索引)")
            ]
            
            print(f"\n{'表达式':>45s} {'查询时间':>12s}")
            print("-" * 60)
            
            for expr, desc in test_cases:
                start = time.time()
                try:
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10,
                        expr=expr
                    )
                    elapsed = time.time() - start
                    print(f"{desc:>45s} {elapsed*1000:10.2f}ms")
                except Exception as e:
                    print(f"{desc:>45s} 错误: {str(e)}")
            
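            # Index selection guidelines
            print("\nIndex selection guidelines:")
            print("  1. Create indexes on frequently queried fields")
            print("  2. High-cardinality fields (many distinct values) benefit most from an index")
            print("  3. Low-cardinality fields (such as gender) benefit less")
            print("  4. For combined queries, consider several single-field indexes")
            print("  5. Monitor index usage and drop unused indexes")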
            ---
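The comparison above times a single call per scenario, which makes it sensitive to cache warm-up and network jitter. The following is a minimal sketch of a timing helper that discards a few warm-up runs and reports the median of repeated runs. It accepts any zero-argument callable, so with pymilvus one could pass something like `lambda: collection.search(...)`; here a stand-in workload is used so the sketch runs without a server. `time_call` is a hypothetical helper name.

```python
import statistics
import time


def time_call(fn, warmup=2, repeat=10):
    """Return latency statistics in milliseconds for calling fn()."""
    for _ in range(warmup):
        fn()  # warm-up runs are executed but not measured
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
        "runs": repeat,
    }


# Stand-in workload that also records how often it was called
calls = []
stats = time_call(lambda: calls.append(sum(range(1000))), warmup=2, repeat=5)
print(stats)
```

Comparing medians rather than single measurements gives a steadier picture of whether an index actually helps.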
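    b.Performance monitoring
        a.Description
            Monitoring filter performance helps reveal bottlenecks and optimization opportunities. Track metrics such as query latency, filter selectivity, and index hit rate. Analyze slow queries to identify problems, and review filter expressions regularly to simplify complex ones. Use profiling tools to locate bottlenecks, establish a performance baseline and monitor against it continuously, and set alert thresholds so that anomalies are noticed promptly.
        b.Code example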
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            from collections import defaultdict
            
            collection = Collection("products")
            collection.load()
            
            # 性能监控类
            class FilterPerformanceMonitor:
                def __init__(self):
                    self.query_log = []
                    self.stats = defaultdict(list)
                
                def log_query(self, expr, query_time, result_count):
                    """记录查询"""
                    self.query_log.append({
                        "expr": expr,
                        "time": query_time,
                        "count": result_count,
                        "timestamp": time.time()
                    })
                    
                    self.stats[expr].append(query_time)
                
                def get_slow_queries(self, threshold_ms=100):
                    """获取慢查询"""
                    slow_queries = [
                        q for q in self.query_log 
                        if q["time"] > threshold_ms
                    ]
                    return slow_queries
                
                def get_stats(self):
                    """获取统计信息"""
                    stats_summary = {}
                    
                    for expr, times in self.stats.items():
                        stats_summary[expr] = {
                            "count": len(times),
                            "avg_time": np.mean(times),
                            "p95_time": np.percentile(times, 95),
                            "max_time": max(times)
                        }
                    
                    return stats_summary
                
                def recommend_optimizations(self):
                    """推荐优化建议"""
                    recommendations = []
                    
                    slow_queries = self.get_slow_queries(threshold_ms=50)
                    if slow_queries:
                        recommendations.append(
                            f"发现 {len(slow_queries)} 个慢查询(>50ms),建议优化"
                        )
                    
                    stats = self.get_stats()
                    for expr, stat in stats.items():
                        if stat["avg_time"] > 30:
                            recommendations.append(
                                f"表达式 '{expr[:50]}...' 平均耗时 {stat['avg_time']:.2f}ms,建议优化"
                            )
                    
                    return recommendations
            
            # 使用监控器
            monitor = FilterPerformanceMonitor()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 模拟多次查询
            test_expressions = [
                'category == "电子"',
                'price > 100 and price < 500',
                'category in ["电子", "图书", "服装"]',
                'rating >= 4.0 and stock > 0'
            ]
            
            print("执行测试查询...\n")
            
            for _ in range(10):
                for expr in test_expressions:
                    start = time.time()
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10,
                        expr=expr
                    )
                    elapsed = (time.time() - start) * 1000
                    
                    monitor.log_query(expr, elapsed, len(results[0]))
            
            # 分析结果
            print("性能统计:\n")
            print(f"{'表达式':>50s} {'查询次数':>10s} {'平均时间':>12s} {'P95时间':>12s}")
            print("-" * 90)
            
            stats = monitor.get_stats()
            for expr, stat in stats.items():
                print(f"{expr:>50s} {stat['count']:>10d} {stat['avg_time']:>10.2f}ms {stat['p95_time']:>10.2f}ms")
            
            # 慢查询分析
            slow_queries = monitor.get_slow_queries(threshold_ms=20)
            if slow_queries:
                print(f"\n慢查询 (>20ms): {len(slow_queries)} 个")
                for q in slow_queries[:5]:
                    print(f"  {q['expr'][:50]}: {q['time']:.2f}ms")
            
            # 优化建议
            print("\n优化建议:")
            recommendations = monitor.recommend_optimizations()
            for rec in recommendations:
                print(f"  - {rec}")
            
            # 性能报告
            print("\n性能报告:")
            print(f"  总查询数: {len(monitor.query_log)}")
            print(f"  平均延迟: {np.mean([q['time'] for q in monitor.query_log]):.2f}ms")
            print(f"  P95延迟: {np.percentile([q['time'] for q in monitor.query_log], 95):.2f}ms")
            print(f"  P99延迟: {np.percentile([q['time'] for q in monitor.query_log], 99):.2f}ms")
            ---
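The "set alert thresholds" advice can be sketched as a fixed-size sliding window whose P95 latency is compared with a threshold on every new sample. This is a minimal illustration that assumes the caller feeds in latencies in milliseconds; `LatencyAlert` is a hypothetical class, and the percentile uses the simple nearest-rank definition rather than interpolation.

```python
import math
from collections import deque


class LatencyAlert:
    """Sliding-window P95 check over the most recent latency samples."""

    def __init__(self, window=100, p95_threshold_ms=50.0, min_samples=20):
        self.samples = deque(maxlen=window)  # old samples fall out automatically
        self.p95_threshold_ms = p95_threshold_ms
        self.min_samples = min_samples

    def p95(self):
        ordered = sorted(self.samples)
        rank = math.ceil(0.95 * len(ordered))  # nearest-rank percentile
        return ordered[rank - 1]

    def record(self, latency_ms):
        """Add a sample; return True when the window's P95 exceeds the threshold."""
        self.samples.append(latency_ms)
        if len(self.samples) < self.min_samples:
            return False  # not enough data to judge yet
        return self.p95() > self.p95_threshold_ms


alert = LatencyAlert(window=20, p95_threshold_ms=50.0, min_samples=10)
fired = [alert.record(ms) for ms in [10.0] * 15 + [80.0] * 5]
print(fired)
```

With a small window a single outlier is enough to move the P95, so the window size and `min_samples` should be chosen with the expected query rate in mind.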

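6.5 Batch Queries

01.Batch search
    a.Batch submission
        a.Description
            Batch search submits several query vectors in one request to raise throughput. Milvus processes the queries in a batch in parallel and shares the cost of index access among them. Batch size affects performance; 10 to 100 queries per batch is a reasonable starting point, and overly large batches can cause memory pressure and higher latency. A batch search returns a list in which each element holds the results for one query. It suits offline workloads such as batch recommendation and batch similarity computation, and it can significantly reduce network round trips.
        b.Code example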
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 单次查询性能
            print("单次查询 vs 批量查询性能对比:\n")
            
            num_queries = 100
            query_vectors = [[np.random.random() for _ in range(128)] for _ in range(num_queries)]
            
            # 方式1: 逐个查询
            start = time.time()
            results_sequential = []
            for query_vector in query_vectors:
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
                results_sequential.append(results[0])
            time_sequential = time.time() - start
            
            print(f"逐个查询 ({num_queries}次):")
            print(f"  总时间: {time_sequential:.2f}s")
            print(f"  平均每次: {time_sequential/num_queries*1000:.2f}ms")
            print(f"  QPS: {num_queries/time_sequential:.2f}")
            
            # 方式2: 批量查询
            start = time.time()
            results_batch = collection.search(
                data=query_vectors,
                anns_field="embedding",
                param=search_params,
                limit=10
            )
            time_batch = time.time() - start
            
            print(f"\n批量查询 ({num_queries}次):")
            print(f"  总时间: {time_batch:.2f}s")
            print(f"  平均每次: {time_batch/num_queries*1000:.2f}ms")
            print(f"  QPS: {num_queries/time_batch:.2f}")
            print(f"  加速比: {time_sequential/time_batch:.2f}x")
            
            # 不同批量大小的性能
            batch_sizes = [1, 10, 50, 100, 200]
            
            print("\n不同批量大小的性能:\n")
            print(f"{'批量大小':>10s} {'总时间':>10s} {'平均每次':>12s} {'QPS':>10s}")
            print("-" * 48)
            
            for batch_size in batch_sizes:
                test_vectors = [[np.random.random() for _ in range(128)] for _ in range(batch_size)]
                
                start = time.time()
                results = collection.search(
                    data=test_vectors,
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
                elapsed = time.time() - start
                
                avg_time = elapsed / batch_size * 1000
                qps = batch_size / elapsed
                
                print(f"{batch_size:10d} {elapsed:9.3f}s {avg_time:10.2f}ms {qps:9.2f}")
            
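            # Batch query best practices
            print("\nBatch query best practices:")
            print("  1. Batch size: 10-100, depending on latency requirements")
            print("  2. Offline processing: use larger batches (100-500)")
            print("  3. Real-time workloads: use small batches (10-50)")
            print("  4. Watch memory: avoid batches large enough to cause OOM")
            print("  5. Concurrency control: limit the number of simultaneous batch queries")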
            ---
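When there are more queries than fit comfortably in one request, they can be split into fixed-size batches and the per-batch results concatenated in the original order. The sketch below injects the search function so that it runs without a Milvus server; with pymilvus, `search_fn` could wrap `collection.search(data=batch, ...)`. `chunked` and `search_in_batches` are hypothetical helper names.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def search_in_batches(query_vectors, search_fn, batch_size=100):
    """Run search_fn on each batch and return the results in query order."""
    results = []
    for batch in chunked(query_vectors, batch_size):
        results.extend(search_fn(batch))  # one result entry per query vector
    return results


batch_sizes_seen = []


def recording_search(batch):
    # Stand-in for a real search: records the batch size and
    # returns one dummy "result" per query vector.
    batch_sizes_seen.append(len(batch))
    return [sum(vector) for vector in batch]


queries = [[float(i), 1.0] for i in range(250)]
results = search_in_batches(queries, recording_search, batch_size=100)
print(batch_sizes_seen, len(results))
```

Because batches are processed sequentially and appended in order, `results[i]` always corresponds to `queries[i]`.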
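    b.Concurrent queries
        a.Description
            Concurrent queries raise throughput by using multiple threads or processes. Milvus accepts concurrent queries from multiple clients, which lets it make full use of server resources. Tune the concurrency level to the number of CPU cores on the server: too much concurrency causes resource contention and lower performance, so latency has to be traded off against throughput. This approach suits high-throughput workloads such as batch recommendation systems. Using a connection pool to manage concurrent connections is recommended.
        b.Code example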
            ---
            from pymilvus import Collection, connections
            import numpy as np
            import time
            import concurrent.futures
            from threading import Lock
            
            # 连接Milvus
            connections.connect(host="localhost", port="19530")
            
            collection = Collection("documents")
            collection.load()
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 单线程查询
            def single_thread_queries(num_queries=100):
                """单线程查询"""
                query_vectors = [[np.random.random() for _ in range(128)] for _ in range(num_queries)]
                
                start = time.time()
                for query_vector in query_vectors:
                    collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                elapsed = time.time() - start
                
                return elapsed, num_queries
            
            # 多线程查询
            def multi_thread_queries(num_queries=100, num_workers=4):
                """多线程查询"""
                query_vectors = [[np.random.random() for _ in range(128)] for _ in range(num_queries)]
                
                def query_worker(query_vector):
                    """单个查询任务"""
                    return collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                
                start = time.time()
                with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
                    futures = [executor.submit(query_worker, qv) for qv in query_vectors]
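                    # as_completed yields futures in completion order, not submission order,
                    # so these results do not line up with query_vectors
                    results = [future.result() for future in concurrent.futures.as_completed(futures)]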
                elapsed = time.time() - start
                
                return elapsed, num_queries
            
            # 性能对比
            print("并发查询性能测试:\n")
            
            num_queries = 100
            
            # 单线程
            time_single, count_single = single_thread_queries(num_queries)
            qps_single = count_single / time_single
            
            print(f"单线程:")
            print(f"  总时间: {time_single:.2f}s")
            print(f"  QPS: {qps_single:.2f}")
            
            # 不同并发数
            worker_counts = [2, 4, 8, 16]
            
            print(f"\n不同并发数性能:\n")
            print(f"{'并发数':>8s} {'总时间':>10s} {'QPS':>10s} {'加速比':>10s}")
            print("-" * 42)
            
            for num_workers in worker_counts:
                time_multi, count_multi = multi_thread_queries(num_queries, num_workers)
                qps_multi = count_multi / time_multi
                speedup = time_single / time_multi
                
                print(f"{num_workers:8d} {time_multi:9.2f}s {qps_multi:9.2f} {speedup:9.2f}x")
            
            # 并发控制器
            class ConcurrentQueryController:
                def __init__(self, collection, max_workers=8):
                    self.collection = collection
                    self.max_workers = max_workers
                    self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
                    self.lock = Lock()
                    self.query_count = 0
                
                def query(self, query_vector, search_params, limit=10):
                    """提交查询任务"""
                    def _query():
                        with self.lock:
                            self.query_count += 1
                        
                        return self.collection.search(
                            data=[query_vector],
                            anns_field="embedding",
                            param=search_params,
                            limit=limit
                        )
                    
                    return self.executor.submit(_query)
                
                def batch_query(self, query_vectors, search_params, limit=10):
                    """批量提交查询"""
                    futures = [self.query(qv, search_params, limit) for qv in query_vectors]
                    return futures
                
                def wait_all(self, futures):
                    """等待所有查询完成"""
                    results = []
                    for future in concurrent.futures.as_completed(futures):
                        results.append(future.result())
                    return results
                
                def get_stats(self):
                    """获取统计信息"""
                    return {
                        "total_queries": self.query_count,
                        "max_workers": self.max_workers
                    }
                
                def shutdown(self):
                    """关闭执行器"""
                    self.executor.shutdown(wait=True)
            
            # 使用并发控制器
            controller = ConcurrentQueryController(collection, max_workers=8)
            
            query_vectors = [[np.random.random() for _ in range(128)] for _ in range(50)]
            
            print("\n使用并发控制器:")
            start = time.time()
            futures = controller.batch_query(query_vectors, search_params, limit=10)
            results = controller.wait_all(futures)
            elapsed = time.time() - start
            
            stats = controller.get_stats()
            print(f"  查询数: {stats['total_queries']}")
            print(f"  并发数: {stats['max_workers']}")
            print(f"  总时间: {elapsed:.2f}s")
            print(f"  QPS: {stats['total_queries']/elapsed:.2f}")
            
            controller.shutdown()
            
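            # Concurrency tuning guidelines
            print("\nConcurrency tuning guidelines:")
            print("  1. Concurrency = CPU cores × 2 as a starting point")
            print("  2. Use a connection pool to avoid repeatedly opening connections")
            print("  3. Monitor resource usage to avoid overload")
            print("  4. Use low concurrency for real-time workloads and high concurrency for batch jobs")
            print("  5. Combine batch queries with concurrency to maximize throughput")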
            ---
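The examples above collect futures with `as_completed`, which yields them in completion order. When each result has to be matched back to its query, `ThreadPoolExecutor.map` is a simple alternative because it returns results in submission order. The sketch uses a stand-in `fake_search` function with artificial delays instead of a real Milvus call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_search(query_id):
    # Stand-in for collection.search: later queries finish sooner,
    # so completion order is the reverse of submission order.
    time.sleep(0.01 * (5 - query_id))
    return f"result-{query_id}"


query_ids = list(range(5))
with ThreadPoolExecutor(max_workers=5) as pool:
    # map() preserves input order regardless of which task finishes first
    ordered = list(pool.map(fake_search, query_ids))

print(ordered)
```

The trade-off is that `map` only hands back a result once all earlier ones are ready, so `as_completed` remains the better fit when results should be processed as soon as they arrive.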

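02.Batch optimization
    a.Memory management
        a.Description
            Batch queries need careful memory management to avoid out-of-memory (OOM) errors. Both the query vectors and the results occupy memory, so an oversized batch can exhaust it. Limit the batch size according to the available memory. Stream the work by loading and processing data batch by batch, and use generators so that not all data is loaded at once. Monitor memory usage, release objects that are no longer needed, and set a sensible limit so that queries do not return more results than necessary.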
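A rough client-side memory budget can be derived from payload sizes alone. The sketch below assumes float32 query vectors (4 bytes per dimension) and roughly 16 bytes per returned hit (an 8-byte ID plus a distance, with padding). Both figures are assumptions, and the estimate ignores Python object overhead, which can be several times larger, so treat the result as a lower bound on memory and an upper bound on batch size. The function names are hypothetical.

```python
def estimate_payload_bytes(batch_size, dim, limit, bytes_per_float=4, bytes_per_hit=16):
    """Raw payload of one batch: query vectors plus returned hits."""
    query_bytes = batch_size * dim * bytes_per_float
    result_bytes = batch_size * limit * bytes_per_hit
    return query_bytes + result_bytes


def max_batch_size(budget_bytes, dim, limit, bytes_per_float=4, bytes_per_hit=16):
    """Largest batch whose raw payload fits in the budget (never less than 1)."""
    per_query = dim * bytes_per_float + limit * bytes_per_hit
    return max(1, budget_bytes // per_query)


per_batch = estimate_payload_bytes(batch_size=100, dim=128, limit=10)
print(f"100 queries, dim=128, limit=10: {per_batch} bytes")
print(f"batch size for a 64 MiB budget: {max_batch_size(64 * 1024 * 1024, 128, 10)}")
```

In practice it is safer to measure real memory use with a tool such as `psutil` and scale the batch size down from such an estimate.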
        b.Code example
            ---
            from pymilvus import Collection
            import numpy as np
            import psutil
            import gc
            
            collection = Collection("documents")
            collection.load()
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 内存监控
            def get_memory_usage():
                """获取当前内存使用"""
                process = psutil.Process()
                memory_info = process.memory_info()
                return memory_info.rss / 1024 / 1024  # MB
            
            # 批量查询内存分析
            print("批量查询内存使用分析:\n")
            
            batch_sizes = [10, 50, 100, 500, 1000]
            
            print(f"{'批量大小':>10s} {'查询前':>12s} {'查询后':>12s} {'增长':>12s}")
            print("-" * 50)
            
            for batch_size in batch_sizes:
                # 清理内存
                gc.collect()
                
                mem_before = get_memory_usage()
                
                # 生成查询向量
                query_vectors = [[np.random.random() for _ in range(128)] for _ in range(batch_size)]
                
                # 执行查询
                results = collection.search(
                    data=query_vectors,
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
                
                mem_after = get_memory_usage()
                mem_increase = mem_after - mem_before
                
                print(f"{batch_size:10d} {mem_before:10.2f}MB {mem_after:10.2f}MB {mem_increase:10.2f}MB")
                
                # 清理结果
                del query_vectors
                del results
                gc.collect()
            
            # 流式批量查询
            def streaming_batch_query(total_queries, batch_size=100):
                """流式批量查询,避免内存溢出"""
                num_batches = (total_queries + batch_size - 1) // batch_size
                
                for batch_idx in range(num_batches):
                    start_idx = batch_idx * batch_size
                    end_idx = min(start_idx + batch_size, total_queries)
                    current_batch_size = end_idx - start_idx
                    
                    # 生成当前批次的查询向量
                    query_vectors = [[np.random.random() for _ in range(128)] for _ in range(current_batch_size)]
                    
                    # 执行查询
                    results = collection.search(
                        data=query_vectors,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                    
                    # Hand the batch to the caller; the cleanup below runs when the generator resumes
                    yield batch_idx, results
                    
                    # 清理内存
                    del query_vectors
                    del results
                    gc.collect()
            
            print("\n流式批量查询:")
            
            mem_start = get_memory_usage()
            print(f"开始内存: {mem_start:.2f}MB")
            
            total_queries = 1000
            batch_size = 100
            
            for batch_idx, results in streaming_batch_query(total_queries, batch_size):
                mem_current = get_memory_usage()
                print(f"  批次 {batch_idx+1}: {len(results)} 个结果, 内存: {mem_current:.2f}MB")
            
            mem_end = get_memory_usage()
            print(f"结束内存: {mem_end:.2f}MB")
            print(f"内存增长: {mem_end - mem_start:.2f}MB")
            
            # 自适应批量大小
            class AdaptiveBatchQuery:
                def __init__(self, collection, max_memory_mb=1000):
                    self.collection = collection
                    self.max_memory_mb = max_memory_mb
                    self.batch_size = 100
                
                def estimate_batch_size(self, vector_dim=128, limit=10):
                    """估算合适的批量大小"""
                    # 估算单个查询的内存占用
                    query_memory = vector_dim * 4 / 1024 / 1024  # MB
                    result_memory = limit * (vector_dim * 4 + 100) / 1024 / 1024  # MB
                    per_query_memory = query_memory + result_memory
                    
                    # 计算批量大小
                    available_memory = self.max_memory_mb * 0.8  # 留20%余量
                    estimated_batch_size = int(available_memory / per_query_memory)
                    
                    return max(10, min(estimated_batch_size, 1000))
                
                def query(self, query_vectors, search_params, limit=10):
                    """自适应批量查询"""
                    # 动态调整批量大小
                    optimal_batch_size = self.estimate_batch_size(limit=limit)
                    
                    print(f"自适应批量大小: {optimal_batch_size}")
                    
                    all_results = []
                    num_queries = len(query_vectors)
                    
                    for i in range(0, num_queries, optimal_batch_size):
                        batch = query_vectors[i:i+optimal_batch_size]
                        
                        results = self.collection.search(
                            data=batch,
                            anns_field="embedding",
                            param=search_params,
                            limit=limit
                        )
                        
                        all_results.extend(results)
                        
                        # 检查内存
                        current_memory = get_memory_usage()
                        if current_memory > self.max_memory_mb:
                            print(f"警告: 内存使用 {current_memory:.2f}MB 超过限制")
                            gc.collect()
                    
                    return all_results
            
            adaptive_query = AdaptiveBatchQuery(collection, max_memory_mb=500)
            
            query_vectors = [[np.random.random() for _ in range(128)] for _ in range(500)]
            results = adaptive_query.query(query_vectors, search_params, limit=10)
            
            print(f"\n自适应查询完成: {len(results)} 个结果")
            ---
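The sizing and streaming advice above can be sketched without a Milvus connection. The names `estimate_batch_size` and `iter_batches`, and the `bytes_per_hit` default, are hypothetical and chosen for illustration only. The estimate assumes float32 query vectors and a fixed cost per hit, so treat it as a rough starting point rather than a measured value.

```python
from typing import Iterator, List, Sequence


def estimate_batch_size(budget_mb: float, dim: int, limit: int,
                        bytes_per_hit: int = 16,
                        min_batch: int = 10, max_batch: int = 1000) -> int:
    """Rough batch size for a client-side memory budget.

    Assumes 4 bytes per vector dimension (float32) and a hypothetical
    fixed cost per returned hit (id + distance). Real overhead is higher.
    """
    per_query_bytes = dim * 4 + limit * bytes_per_hit
    usable_bytes = budget_mb * 1024 * 1024 * 0.8  # keep 20% headroom
    return max(min_batch, min(int(usable_bytes // per_query_bytes), max_batch))


def iter_batches(vectors: Sequence[Sequence[float]],
                 batch_size: int) -> Iterator[List[Sequence[float]]]:
    """Yield consecutive slices so only one batch is materialized at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(vectors), batch_size):
        yield list(vectors[start:start + batch_size])


# Usage: size the batches for a small budget, then stream over the queries
queries = [[0.0] * 128 for _ in range(250)]
size = estimate_batch_size(budget_mb=0.1, dim=128, limit=10)
for batch in iter_batches(queries, size):
    print(f"batch of {len(batch)} queries")
```

Each yielded batch can be passed as `data` to a search call and dropped before the next one is built, which keeps peak memory close to one batch.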
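    b.Performance tuning
        a.Description
            Tuning batch query performance means weighing several factors together. Batch size, concurrency, and search parameters all affect performance, so the best configuration should be found by experiment. Monitor metrics such as QPS, latency, and memory. Use profiling tools to locate bottlenecks. Consider caching to cut down on repeated queries. Optimize network transfer, for example with compression. Establish a performance baseline and keep optimizing against it.
        b.Code example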
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            # 性能调优实验
            class BatchQueryTuner:
                def __init__(self, collection):
                    self.collection = collection
                    self.results = []
                
                def tune_batch_size(self, query_vectors, search_params):
                    """调优批量大小"""
                    batch_sizes = [10, 20, 50, 100, 200]
                    
                    print("批量大小调优:\n")
                    print(f"{'批量大小':>10s} {'总时间':>10s} {'QPS':>10s} {'平均延迟':>12s}")
                    print("-" * 48)
                    
                    best_qps = 0
                    best_batch_size = 10
                    
                    for batch_size in batch_sizes:
                        # 使用前N个查询
                        test_vectors = query_vectors[:min(batch_size * 10, len(query_vectors))]
                        
                        start = time.time()
                        for i in range(0, len(test_vectors), batch_size):
                            batch = test_vectors[i:i+batch_size]
                            self.collection.search(
                                data=batch,
                                anns_field="embedding",
                                param=search_params,
                                limit=10
                            )
                        elapsed = time.time() - start
                        
                        qps = len(test_vectors) / elapsed
                        avg_latency = elapsed / len(test_vectors) * 1000
                        
                        print(f"{batch_size:10d} {elapsed:9.2f}s {qps:9.2f} {avg_latency:10.2f}ms")
                        
                        if qps > best_qps:
                            best_qps = qps
                            best_batch_size = batch_size
                    
                    print(f"\n最优批量大小: {best_batch_size} (QPS: {best_qps:.2f})")
                    return best_batch_size
                
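                def tune_search_params(self, query_vectors, batch_size):
                    """Tune search parameters (throughput only)."""
                    # Note: a larger nprobe scans more clusters, so QPS alone tends to favor
                    # the smallest value; compare recall as well before settling on one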
                    nprobe_values = [8, 16, 32, 64]
                    
                    print("\n搜索参数调优:\n")
                    print(f"{'nprobe':>8s} {'总时间':>10s} {'QPS':>10s}")
                    print("-" * 32)
                    
                    best_qps = 0
                    best_nprobe = 16
                    
                    for nprobe in nprobe_values:
                        search_params = {
                            "metric_type": "L2",
                            "params": {"nprobe": nprobe}
                        }
                        
                        test_vectors = query_vectors[:min(batch_size * 10, len(query_vectors))]
                        
                        start = time.time()
                        for i in range(0, len(test_vectors), batch_size):
                            batch = test_vectors[i:i+batch_size]
                            self.collection.search(
                                data=batch,
                                anns_field="embedding",
                                param=search_params,
                                limit=10
                            )
                        elapsed = time.time() - start
                        
                        qps = len(test_vectors) / elapsed
                        
                        print(f"{nprobe:8d} {elapsed:9.2f}s {qps:9.2f}")
                        
                        if qps > best_qps:
                            best_qps = qps
                            best_nprobe = nprobe
                    
                    print(f"\n最优nprobe: {best_nprobe} (QPS: {best_qps:.2f})")
                    return best_nprobe
                
                def full_tune(self, num_queries=1000):
                    """完整调优流程"""
                    print("=" * 60)
                    print("批量查询性能调优")
                    print("=" * 60 + "\n")
                    
                    # 生成测试查询
                    query_vectors = [[np.random.random() for _ in range(128)] for _ in range(num_queries)]
                    
                    # 调优批量大小
                    optimal_batch_size = self.tune_batch_size(
                        query_vectors,
                        {"metric_type": "L2", "params": {"nprobe": 16}}
                    )
                    
                    # 调优搜索参数
                    optimal_nprobe = self.tune_search_params(query_vectors, optimal_batch_size)
                    
                    # 最终配置
                    print("\n" + "=" * 60)
                    print("最优配置")
                    print("=" * 60)
                    print(f"  批量大小: {optimal_batch_size}")
                    print(f"  nprobe: {optimal_nprobe}")
                    
                    # 验证性能
                    optimal_search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": optimal_nprobe}
                    }
                    
                    start = time.time()
                    for i in range(0, len(query_vectors), optimal_batch_size):
                        batch = query_vectors[i:i+optimal_batch_size]
                        self.collection.search(
                            data=batch,
                            anns_field="embedding",
                            param=optimal_search_params,
                            limit=10
                        )
                    elapsed = time.time() - start
                    
                    final_qps = len(query_vectors) / elapsed
                    final_latency = elapsed / len(query_vectors) * 1000
                    
                    print(f"\n最终性能:")
                    print(f"  QPS: {final_qps:.2f}")
                    print(f"  平均延迟: {final_latency:.2f}ms")
                    print(f"  总时间: {elapsed:.2f}s")
            
            # 执行调优
            tuner = BatchQueryTuner(collection)
            tuner.full_tune(num_queries=500)
            ---
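The caching suggestion above can be sketched as a small LRU cache placed in front of whatever function performs the search. `SearchCache` and `fake_search` are hypothetical names. Keys are rounded vectors, which assumes the workload can tolerate near-identical queries sharing one cached result.

```python
from collections import OrderedDict
from typing import Callable, List, Sequence, Tuple


class SearchCache:
    """LRU cache keyed by a rounded copy of the query vector."""

    def __init__(self, search_fn: Callable[[Sequence[float]], List],
                 capacity: int = 1024, decimals: int = 6):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._search_fn = search_fn
        self.capacity = capacity
        self.decimals = decimals
        self._entries: "OrderedDict[Tuple[float, ...], List]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _key(self, vector: Sequence[float]) -> Tuple[float, ...]:
        return tuple(round(float(x), self.decimals) for x in vector)

    def search(self, vector: Sequence[float]) -> List:
        key = self._key(vector)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        result = self._search_fn(vector)
        self._entries[key] = result
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return result


# Usage with a stand-in search function (no database involved)
def fake_search(vector: Sequence[float]) -> List:
    return [sum(vector)]


cache = SearchCache(fake_search, capacity=2)
cache.search([0.1, 0.2])
cache.search([0.1, 0.2])
print(f"hits={cache.hits}, misses={cache.misses}")
```

A cache like this only helps when identical queries recur, and cached results go stale after inserts or deletes, so an invalidation or expiry policy would be needed in practice.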

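7 Advanced Features

7.1 Partition Management

01.Partition concepts
    a.Role of partitions
        a.Description
            A partition is a logical group within a Collection, used to organize and manage data. Partitions can improve query performance because a search can be restricted to the relevant partitions instead of covering the whole Collection. They suit data that is split by time, category, region, or a similar dimension. A Collection can hold multiple partitions and always has a default partition named _default. Data in different partitions is isolated and does not interfere with other partitions. Partitions can be loaded, released, and dropped independently. Used well, partitions noticeably improve query efficiency and resource usage.
        b.Code example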
            ---
            from pymilvus import Collection, Partition
            import numpy as np
            
            collection = Collection("documents")
            
            # 创建分区
            partition_2024 = collection.create_partition("year_2024")
            partition_2023 = collection.create_partition("year_2023")
            partition_2022 = collection.create_partition("year_2022")
            
            print("已创建分区:")
            for partition in collection.partitions:
                print(f"  - {partition.name}")
            
            # 向不同分区插入数据
            data_2024 = [
                [i for i in range(1000, 2000)],  # ids
                [f"文档2024_{i}" for i in range(1000)],  # titles
                [[np.random.random() for _ in range(128)] for _ in range(1000)]  # embeddings
            ]
            
            partition_2024.insert(data_2024)
            
            data_2023 = [
                [i for i in range(2000, 3000)],
                [f"文档2023_{i}" for i in range(1000)],
                [[np.random.random() for _ in range(128)] for _ in range(1000)]
            ]
            
            partition_2023.insert(data_2023)
            
            collection.flush()
            
            print(f"\n分区数据量:")
            print(f"  year_2024: {partition_2024.num_entities} 条")
            print(f"  year_2023: {partition_2023.num_entities} 条")
            print(f"  总计: {collection.num_entities} 条")
            
            # 分区搜索
            collection.load()
            
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 搜索特定分区
            results_2024 = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                partition_names=["year_2024"],
                output_fields=["id", "title"]
            )
            
            print(f"\n搜索year_2024分区:")
            for hit in results_2024[0][:5]:
                print(f"  {hit.entity.get('title')}: {hit.distance:.4f}")
            
            # 搜索多个分区
            results_multi = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                partition_names=["year_2024", "year_2023"],
                output_fields=["id", "title"]
            )
            
            print(f"\n搜索多个分区:")
            for hit in results_multi[0][:5]:
                print(f"  {hit.entity.get('title')}: {hit.distance:.4f}")
            
            # 搜索所有分区(不指定partition_names)
            results_all = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                output_fields=["id", "title"]
            )
            
            print(f"\n搜索所有分区:")
            for hit in results_all[0][:5]:
                print(f"  {hit.entity.get('title')}: {hit.distance:.4f}")
            ---
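Restricting a search to the relevant partitions requires turning a query's scope into a list of partition names. The sketch below assumes the `year_YYYY` naming used in the example above. `partitions_for_years` is a hypothetical helper, and the set of existing names would normally come from the collection's partition list.

```python
from typing import List, Set


def partitions_for_years(start_year: int, end_year: int,
                         existing: Set[str], prefix: str = "year_") -> List[str]:
    """Return the existing partition names covering an inclusive year range."""
    if start_year > end_year:
        raise ValueError("start_year must not exceed end_year")
    wanted = [f"{prefix}{year}" for year in range(start_year, end_year + 1)]
    # Skip years that have no partition rather than naming a missing one
    return [name for name in wanted if name in existing]


# Usage: the partition names would normally be read from the collection
existing_names = {"_default", "year_2022", "year_2023", "year_2024"}
targets = partitions_for_years(2023, 2025, existing_names)
print(targets)
```

The resulting list is meant to be passed as `partition_names` to a search. An empty list should be handled by the caller instead of being passed through, since omitting partition names searches every partition.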
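    b.Partitioning strategies
        a.Description
            The partitioning strategy affects both performance and maintainability. Common strategies are time-based partitioning (by day, month, or year), category-based partitioning (product type, document type), and hash-based partitioning (even distribution). Time-based partitioning suits time-series data and makes aging out and archiving straightforward. Category-based partitioning suits multi-tenant or multi-type data. Hash-based partitioning suits spreading load evenly. Avoid creating too many partitions; 10 to 100 is the recommended range. Choose the strategy that fits the workload.
        b.Code example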
            ---
            from pymilvus import Collection, Partition  # Partition is used by the insert helpers below
            import numpy as np
            import hashlib
            from datetime import datetime, timedelta
            
            # 策略1: 按时间分区
            class TimeBasedPartitioning:
                def __init__(self, collection):
                    self.collection = collection
                
                def create_monthly_partitions(self, start_date, num_months):
                    """创建按月分区"""
                    partitions = []
                    current_date = start_date
                    
                    for i in range(num_months):
                        partition_name = current_date.strftime("month_%Y_%m")
                        
                        if not self.collection.has_partition(partition_name):
                            partition = self.collection.create_partition(partition_name)
                            partitions.append(partition)
                            print(f"创建分区: {partition_name}")
                        
                        # 下一个月
                        if current_date.month == 12:
                            current_date = datetime(current_date.year + 1, 1, 1)
                        else:
                            current_date = datetime(current_date.year, current_date.month + 1, 1)
                    
                    return partitions
                
                def get_partition_by_date(self, date):
                    """根据日期获取分区名"""
                    return date.strftime("month_%Y_%m")
                
                def insert_with_date(self, data, date):
                    """插入数据到对应日期的分区"""
                    partition_name = self.get_partition_by_date(date)
                    
                    if not self.collection.has_partition(partition_name):
                        self.collection.create_partition(partition_name)
                    
                    partition = Partition(self.collection, partition_name)
                    partition.insert(data)
                    
                    print(f"数据插入到分区: {partition_name}")
            
            collection = Collection("time_series_docs")
            time_partitioner = TimeBasedPartitioning(collection)
            
            # 创建最近6个月的分区
            start_date = datetime(2024, 1, 1)
            time_partitioner.create_monthly_partitions(start_date, 6)
            
            # 策略2: 按类别分区
            class CategoryBasedPartitioning:
                def __init__(self, collection):
                    self.collection = collection
                    self.categories = {}
                
                def create_category_partitions(self, categories):
                    """Create one partition per category."""
                    for category in categories:
                        partition_name = f"cat_{category.lower().replace(' ', '_')}"
                        
                        if not self.collection.has_partition(partition_name):
                            self.collection.create_partition(partition_name)
                            print(f"Created partition: {partition_name}")
                        
                        # Record the mapping even when the partition already exists,
                        # so insert_by_category also works on reruns
                        self.categories[category] = partition_name
                
                def insert_by_category(self, data, category):
                    """插入数据到对应类别的分区"""
                    if category not in self.categories:
                        raise ValueError(f"未知类别: {category}")
                    
                    partition_name = self.categories[category]
                    partition = Partition(self.collection, partition_name)
                    partition.insert(data)
                    
                    print(f"数据插入到分区: {partition_name}")
            
            category_partitioner = CategoryBasedPartitioning(collection)
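            # ASCII keys: Milvus partition names accept only letters, digits, and underscores
            categories = ["electronics", "books", "clothing", "food"]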
            category_partitioner.create_category_partitions(categories)
            
            # 策略3: 按哈希分区
            class HashBasedPartitioning:
                def __init__(self, collection, num_partitions=10):
                    self.collection = collection
                    self.num_partitions = num_partitions
                    self.create_hash_partitions()
                
                def create_hash_partitions(self):
                    """创建哈希分区"""
                    for i in range(self.num_partitions):
                        partition_name = f"hash_{i:03d}"
                        
                        if not self.collection.has_partition(partition_name):
                            self.collection.create_partition(partition_name)
                            print(f"创建分区: {partition_name}")
                
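                def get_partition_by_id(self, doc_id):
                    """Map an ID to a partition name."""
                    # Built-in hash() is randomized per process for strings, so use a
                    # stable digest to keep the assignment identical across runs
                    digest = hashlib.md5(str(doc_id).encode("utf-8")).hexdigest()
                    partition_idx = int(digest, 16) % self.num_partitions
                    return f"hash_{partition_idx:03d}"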
                
                def insert_by_hash(self, data):
                    """根据哈希分配数据到分区"""
                    # 假设data[0]是ID列表
                    ids = data[0]
                    
                    # 按分区分组数据
                    partition_data = {}
                    for i, doc_id in enumerate(ids):
                        partition_name = self.get_partition_by_id(doc_id)
                        
                        if partition_name not in partition_data:
                            partition_data[partition_name] = [[] for _ in range(len(data))]
                        
                        for j, field_data in enumerate(data):
                            partition_data[partition_name][j].append(field_data[i])
                    
                    # 插入到各分区
                    for partition_name, pdata in partition_data.items():
                        partition = Partition(self.collection, partition_name)
                        partition.insert(pdata)
                        print(f"插入 {len(pdata[0])} 条数据到 {partition_name}")
            
            hash_partitioner = HashBasedPartitioning(collection, num_partitions=10)
            
            # 分区策略选择
            print("\n分区策略选择建议:")
            print("  时间分区: 适合日志、时序数据,便于归档")
            print("  类别分区: 适合多租户、多类型数据")
            print("  哈希分区: 适合均匀分布,负载均衡")
            print("  混合分区: 先按类别再按时间,多级分区")
            ---
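Hash-based partitioning only spreads load evenly if the ID-to-partition mapping is stable across processes. The sketch below uses an MD5 digest for that purpose and counts how 10,000 sequential IDs land in 10 partitions. `stable_partition_index` and `partition_name_for` are hypothetical helper names, and MD5 is used only as a deterministic hash, not for security.

```python
import hashlib
from collections import Counter


def stable_partition_index(doc_id: object, num_partitions: int) -> int:
    """Deterministic partition index derived from an MD5 digest of the ID."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    digest = hashlib.md5(str(doc_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_name_for(doc_id: object, num_partitions: int = 10) -> str:
    """Partition name in the hash_NNN format used above."""
    return f"hash_{stable_partition_index(doc_id, num_partitions):03d}"


# Usage: check how evenly sequential IDs are distributed
counts = Counter(partition_name_for(i) for i in range(10_000))
for name in sorted(counts):
    print(f"{name}: {counts[name]}")
```

Changing `num_partitions` later remaps most IDs, so the partition count should be fixed before data is inserted.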

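02.Partition operations
    a.Load and release
        a.Description
            Partitions can be loaded and released independently, which saves memory. Load only the partitions that need to be queried and keep the rest released. Loading a partition brings its index and part of its data into memory. Releasing a partition frees that memory while the data stays in storage. Partitions can be loaded and released dynamically as query patterns change: keep hot-data partitions loaded and load cold-data partitions on demand. Managing load state deliberately keeps memory usage under control.
        b.Code example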
            ---
            from pymilvus import Collection, Partition
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 创建多个分区
            partitions = []
            for year in [2022, 2023, 2024]:
                partition_name = f"year_{year}"
                if not collection.has_partition(partition_name):
                    partition = collection.create_partition(partition_name)
                    partitions.append(partition)
                    
                    # 插入数据
                    data = [
                        [i for i in range(year*1000, year*1000+1000)],
                        [f"文档{year}_{i}" for i in range(1000)],
                        [[np.random.random() for _ in range(128)] for _ in range(1000)]
                    ]
                    partition.insert(data)
            
            collection.flush()
            
            # 加载特定分区
            print("加载特定分区:\n")
            
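            partition_2024 = Partition(collection, "year_2024")
            
            # Assumption: pymilvus 2.2+. Load state is read through utility.load_state,
            # because Partition does not document an is_loaded attribute
            from pymilvus import utility
            
            def partition_state(name):
                """Return the load state of one partition of this collection."""
                return utility.load_state(collection.name, partition_names=[name])
            
            print(f"State before load: {partition_state('year_2024')}")
            
            partition_2024.load()
            print(f"State after load: {partition_state('year_2024')}")
            
            # Query the loaded partition
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=5,
                partition_names=["year_2024"]
            )
            
            print(f"\nSearch in year_2024: {len(results[0])} results")
            
            # Release the partition
            partition_2024.release()
            print(f"\nState after release: {partition_state('year_2024')}")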
            
            # 动态加载管理
            class PartitionLoadManager:
                def __init__(self, collection):
                    self.collection = collection
                    self.loaded_partitions = set()
                
                def load_partition(self, partition_name):
                    """加载分区"""
                    if partition_name in self.loaded_partitions:
                        print(f"分区 {partition_name} 已加载")
                        return
                    
                    partition = Partition(self.collection, partition_name)
                    
                    start = time.time()
                    partition.load()
                    elapsed = time.time() - start
                    
                    self.loaded_partitions.add(partition_name)
                    print(f"加载分区 {partition_name}: {elapsed:.2f}s")
                
                def release_partition(self, partition_name):
                    """释放分区"""
                    if partition_name not in self.loaded_partitions:
                        print(f"分区 {partition_name} 未加载")
                        return
                    
                    partition = Partition(self.collection, partition_name)
                    partition.release()
                    
                    self.loaded_partitions.remove(partition_name)
                    print(f"释放分区 {partition_name}")
                
                def load_partitions(self, partition_names):
                    """批量加载分区"""
                    for name in partition_names:
                        self.load_partition(name)
                
                def release_all(self):
                    """释放所有分区"""
                    for name in list(self.loaded_partitions):
                        self.release_partition(name)
                
                def get_loaded_partitions(self):
                    """获取已加载分区列表"""
                    return list(self.loaded_partitions)
            
            # 使用加载管理器
            load_manager = PartitionLoadManager(collection)
            
            print("\n动态加载管理:")
            
            # 加载热数据分区
            load_manager.load_partitions(["year_2024", "year_2023"])
            
            print(f"已加载分区: {load_manager.get_loaded_partitions()}")
            
            # 查询热数据
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=5,
                partition_names=["year_2024", "year_2023"]
            )
            
            print(f"查询热数据: {len(results[0])} 条结果")
            
            # 切换到冷数据
            load_manager.release_partition("year_2023")
            load_manager.load_partition("year_2022")
            
            print(f"切换后已加载分区: {load_manager.get_loaded_partitions()}")
            
            # 释放所有
            load_manager.release_all()
            print(f"释放后已加载分区: {load_manager.get_loaded_partitions()}")
            
            # 内存优化建议
            print("\n内存优化建议:")
            print("  1. 只加载近期数据分区(如最近3个月)")
            print("  2. 历史数据按需加载,查询后释放")
            print("  3. 监控内存使用,避免加载过多分区")
            print("  4. 使用LRU策略自动管理分区加载")
            print("  5. 考虑分区大小,避免单个分区过大")
            ---
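The LRU suggestion in the tips above can be sketched as a small policy object that decides which partition to release when a new one has to be loaded. `LruPartitionLoader` is a hypothetical class. The load and release actions are injected as callables, so the policy can be tried out without a database; in real use they would wrap the partition's load and release calls.

```python
from collections import OrderedDict
from typing import Callable, List


class LruPartitionLoader:
    """Keep at most max_loaded partitions loaded, evicting the least recently used."""

    def __init__(self, load_fn: Callable[[str], None],
                 release_fn: Callable[[str], None], max_loaded: int = 2):
        if max_loaded < 1:
            raise ValueError("max_loaded must be at least 1")
        self._load_fn = load_fn
        self._release_fn = release_fn
        self.max_loaded = max_loaded
        self._loaded: "OrderedDict[str, bool]" = OrderedDict()

    def ensure_loaded(self, name: str) -> None:
        """Load the partition if needed and mark it as most recently used."""
        if name in self._loaded:
            self._loaded.move_to_end(name)
            return
        while len(self._loaded) >= self.max_loaded:
            victim, _ = self._loaded.popitem(last=False)
            self._release_fn(victim)
        self._load_fn(name)
        self._loaded[name] = True

    @property
    def loaded(self) -> List[str]:
        """Loaded partition names, least recently used first."""
        return list(self._loaded)


# Usage with stand-in callables that only record what would happen
events = []
loader = LruPartitionLoader(
    load_fn=lambda name: events.append(("load", name)),
    release_fn=lambda name: events.append(("release", name)),
    max_loaded=2,
)
for name in ["year_2024", "year_2023", "year_2024", "year_2022"]:
    loader.ensure_loaded(name)
print(events)
print(loader.loaded)
```

Calling `ensure_loaded` before every partition-scoped search keeps hot partitions resident, while cold partitions are loaded on demand and released once they fall out of use.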
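    b.Dropping partitions
        a.Description
            Dropping a partition permanently deletes the partition and all of its data. The partition must be released before it is dropped. The operation is irreversible, so handle it with care. It is useful for clearing out expired data, such as old time-based partitions, and it frees storage space. Back up important data before dropping. Dropping a partition does not affect the data or queries of other partitions.
        b.Code example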
            ---
            from pymilvus import Collection, Partition
            import numpy as np
            
            collection = Collection("documents")
            
            # 创建测试分区
            test_partition = collection.create_partition("test_partition")
            
            # 插入测试数据
            data = [
                [i for i in range(10000, 11000)],
                [f"测试文档_{i}" for i in range(1000)],
                [[np.random.random() for _ in range(128)] for _ in range(1000)]
            ]
            test_partition.insert(data)
            collection.flush()
            
            print(f"创建测试分区: test_partition")
            print(f"数据量: {test_partition.num_entities} 条")
            
            # 列出所有分区
            print(f"\n当前分区:")
            for partition in collection.partitions:
                print(f"  - {partition.name}: {partition.num_entities} 条")
            
            # 删除分区
            print(f"\n删除test_partition分区...")
            
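            # Release first if loaded (assumes pymilvus 2.2+ for utility.load_state)
            from pymilvus import utility
            from pymilvus.client.types import LoadState
            
            if utility.load_state(collection.name, partition_names=["test_partition"]) == LoadState.Loaded:
                test_partition.release()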
            
            # 删除分区
            collection.drop_partition("test_partition")
            
            print(f"删除完成")
            
            # 验证删除
            print(f"\n删除后分区:")
            for partition in collection.partitions:
                print(f"  - {partition.name}: {partition.num_entities} 条")
            
            # Bulk cleanup of old partitions
            # Assumption: pymilvus 2.2+, where utility.load_state reports the load state
            from datetime import datetime, timedelta
            from pymilvus import utility
            from pymilvus.client.types import LoadState
            
            class PartitionCleaner:
                def __init__(self, collection):
                    self.collection = collection
                
                def _release_if_loaded(self, partition_name):
                    """Release the partition if it is currently loaded."""
                    state = utility.load_state(self.collection.name, partition_names=[partition_name])
                    if state == LoadState.Loaded:
                        Partition(self.collection, partition_name).release()
                
                def _drop(self, partition_name):
                    """Release (if needed) and drop a partition."""
                    self._release_if_loaded(partition_name)
                    self.collection.drop_partition(partition_name)
                
                def delete_old_time_partitions(self, keep_months=3):
                    """Drop old time partitions, keeping roughly the last N months."""
                    # Approximation: one month is treated as 30 days
                    cutoff_date = datetime.now() - timedelta(days=keep_months*30)
                    deleted_partitions = []
                    
                    for partition in self.collection.partitions:
                        # Only names of the form month_YYYY_MM are considered,
                        # which also skips the _default partition
                        if not partition.name.startswith("month_"):
                            continue
                        
                        try:
                            _, year, month = partition.name.split("_")
                            partition_date = datetime(int(year), int(month), 1)
                        except ValueError as e:
                            print(f"Could not parse partition name: {partition.name}, {e}")
                            continue
                        
                        if partition_date < cutoff_date:
                            self._drop(partition.name)
                            deleted_partitions.append(partition.name)
                            print(f"Dropped old partition: {partition.name}")
                    
                    return deleted_partitions
                
                def delete_empty_partitions(self):
                    """Drop partitions that contain no entities."""
                    deleted_partitions = []
                    
                    for partition in self.collection.partitions:
                        if partition.name == "_default":
                            continue
                        
                        if partition.num_entities == 0:
                            self._drop(partition.name)
                            deleted_partitions.append(partition.name)
                            print(f"Dropped empty partition: {partition.name}")
                    
                    return deleted_partitions
                
                def safe_delete_partition(self, partition_name, backup_path=None):
                    """Drop a partition, with an optional backup step first."""
                    if backup_path:
                        print(f"Backing up partition {partition_name} to {backup_path}")
                        # Placeholder: the actual backup logic belongs here,
                        # for example exporting the data to a file
                    
                    self._drop(partition_name)
                    print(f"Dropped partition: {partition_name}")
            
            cleaner = PartitionCleaner(collection)
            
            # 删除旧分区
            print("\n清理旧分区(保留最近3个月):")
            deleted = cleaner.delete_old_time_partitions(keep_months=3)
            print(f"删除了 {len(deleted)} 个旧分区")
            
            # 删除空分区
            print("\n清理空分区:")
            deleted = cleaner.delete_empty_partitions()
            print(f"删除了 {len(deleted)} 个空分区")
            
            # 删除注意事项
            print("\n删除分区注意事项:")
            print("  1. 删除操作不可逆,务必谨慎")
            print("  2. 删除前建议备份重要数据")
            print("  3. 先释放分区再删除")
            print("  4. 不能删除_default分区")
            print("  5. 定期清理过期分区释放存储")
            ---
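Retention by month can also be computed on calendar months instead of a 30-day approximation. The sketch below only selects which `month_YYYY_MM` names have expired and leaves the actual drop to the caller. `expired_partitions` and its helpers are hypothetical names, and the current month is assumed to count as one of the kept months.

```python
from datetime import date
from typing import Iterable, List, Optional, Tuple


def month_index(year: int, month: int) -> int:
    """Number of months since year 0, used to compare months arithmetically."""
    return year * 12 + (month - 1)


def parse_month_partition(name: str) -> Optional[Tuple[int, int]]:
    """Parse month_YYYY_MM into (year, month); return None for other names."""
    parts = name.split("_")
    if len(parts) != 3 or parts[0] != "month":
        return None
    try:
        year, month = int(parts[1]), int(parts[2])
    except ValueError:
        return None
    if not 1 <= month <= 12:
        return None
    return year, month


def expired_partitions(names: Iterable[str], today: date,
                       keep_months: int = 3) -> List[str]:
    """Names older than the last keep_months calendar months (current month included)."""
    if keep_months < 1:
        raise ValueError("keep_months must be at least 1")
    oldest_kept = month_index(today.year, today.month) - (keep_months - 1)
    expired = []
    for name in names:
        parsed = parse_month_partition(name)
        if parsed is not None and month_index(*parsed) < oldest_kept:
            expired.append(name)
    return expired


# Usage: names that do not match the pattern are never selected
names = ["_default", "month_2024_01", "month_2024_03", "month_2024_05", "year_2023"]
print(expired_partitions(names, today=date(2024, 5, 15), keep_months=3))
```

Keeping selection separate from deletion makes it possible to log or review the list before anything irreversible runs.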

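7.2 Replica Configuration

01.Replica mechanism
    a.Role of replicas
        a.Description
            Replicas provide redundancy and high availability, and they raise query throughput. Each replica holds a complete in-memory copy of the collection's data and indexes, so several replicas can serve search requests in parallel and increase QPS. Replicas stay consistent with one another and pick up new writes automatically. The replica count is chosen when the collection is loaded; changing it generally means releasing the collection and loading it again with a new replica_number, which is what the examples in this section do. Replicas suit read-heavy workloads such as search and recommendation. Every additional replica consumes another full share of memory, and the replica count cannot exceed the number of QueryNodes.
        b.Code example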
            ---
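            from pymilvus import Collection, utility
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            # Inspect the current replica layout
            print("Current replica configuration:")
            replicas = collection.get_replicas()
            print(f"  Replica count: {len(replicas.groups)}")
            
            for i, replica in enumerate(replicas.groups):
                print(f"\n  Replica {i+1}:")
                print(f"    Replica ID: {replica.id}")
                print(f"    Shards: {len(replica.shards)}")
                print(f"    Resource group: {replica.resource_group}")
            
            # The replica count is fixed at load time, so release the collection
            # before loading it again with 3 replicas (this needs at least 3 QueryNodes)
            print("\nReloading with 3 replicas...")
            collection.release()
            collection.load(replica_number=3)
            
            replicas = collection.get_replicas()
            print(f"Replica count after reload: {len(replicas.groups)}")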
            
            # Measure how the replica count affects query performance.
            # Note: the loops below send one query at a time from a single client, so they
            # mostly measure latency; extra replicas pay off mainly under concurrent load.
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # Single replica
            collection.release()
            collection.load(replica_number=1)
            
            start = time.time()
            for _ in range(100):
                collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
            time_single = time.time() - start
            
            qps_single = 100 / time_single
            print("\nSingle-replica performance:")
            print(f"  Total time: {time_single:.2f}s")
            print(f"  QPS: {qps_single:.2f}")
            
            # Three replicas
            collection.release()
            collection.load(replica_number=3)
            
            start = time.time()
            for _ in range(100):
                collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
            time_multi = time.time() - start
            
            qps_multi = 100 / time_multi
            print("\nThree-replica performance:")
            print(f"  Total time: {time_multi:.2f}s")
            print(f"  QPS: {qps_multi:.2f}")
            print(f"  Speedup: {qps_multi/qps_single:.2f}x")
            
            # Replica sizing guidelines
            print("\nReplica sizing guidelines:")
            print("  1. Read-heavy workloads: 2-3 replicas")
            print("  2. High availability: at least 2 replicas")
            print("  3. High throughput: 3-5 replicas")
            print("  4. Limited resources: 1 replica")
            print("  5. Replica count <= number of QueryNodes")
            ---
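            The memory cost and the "replica count <= QueryNode count" rule described above can be checked with simple arithmetic before loading anything. The sketch below is a planning aid under simplifying assumptions: the function name and parameters are hypothetical, and it assumes QueryNodes are divided evenly among replicas with each replica's data spread evenly over its nodes, which a real cluster only approximates.

```python
def plan_replicas(replica_number, query_nodes, loaded_size_gb, node_memory_gb):
    """Validate a replica plan and estimate its memory footprint.

    loaded_size_gb is the in-memory size of ONE full copy of the collection.
    """
    if replica_number < 1:
        raise ValueError("replica_number must be at least 1")
    if replica_number > query_nodes:
        raise ValueError("replica_number cannot exceed the number of QueryNodes")

    # Every replica is a full copy, so total memory grows linearly
    total_gb = replica_number * loaded_size_gb
    # Simplification: nodes are split evenly across replicas
    nodes_per_replica = query_nodes // replica_number
    per_node_gb = loaded_size_gb / nodes_per_replica
    return {
        "total_memory_gb": total_gb,
        "nodes_per_replica": nodes_per_replica,
        "per_node_gb": per_node_gb,
        "fits": per_node_gb <= node_memory_gb,
    }


plan = plan_replicas(replica_number=3, query_nodes=6,
                     loaded_size_gb=20.0, node_memory_gb=16.0)
print(plan)
```

            A plan whose `fits` flag is false is a hint to add QueryNodes or reduce the replica count before calling `collection.load(replica_number=...)`.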
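    b.Replica management
        a.Description
            Replica management covers creating, resizing, and monitoring replicas. In the workflow shown here, resizing means releasing the collection and loading it again with a new replica_number, so searches against the collection fail during the switch; schedule the change for a quiet period. The replica count drives both memory use and query performance. Monitor replica status to confirm that every replica is serving. When a replica fails, requests are routed to the remaining replicas automatically. Different collections can use different replica counts, and a sensible choice balances performance against cost.
        b.Code example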
            ---
            from pymilvus import Collection, utility
            from pymilvus.client.types import LoadState
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # Replica management helper
            class ReplicaManager:
                def __init__(self, collection):
                    self.collection = collection
                
                def is_loaded(self):
                    """Return True when the collection is fully loaded.
                    
                    Collection has no is_loaded attribute, so ask the server
                    through utility.load_state instead.
                    """
                    return utility.load_state(self.collection.name) == LoadState.Loaded
                
                def get_replica_info(self):
                    """Collect replica information"""
                    if not self.is_loaded():
                        return {"loaded": False}
                    
                    replicas = self.collection.get_replicas()
                    
                    info = {
                        "loaded": True,
                        "replica_count": len(replicas.groups),
                        "replicas": []
                    }
                    
                    for replica in replicas.groups:
                        replica_info = {
                            "id": replica.id,
                            "shard_count": len(replica.shards),
                            "resource_group": replica.resource_group
                        }
                        info["replicas"].append(replica_info)
                    
                    return info
                
                def set_replica_number(self, replica_number):
                    """Set the replica count"""
                    print(f"Setting replica count to {replica_number}...")
                    
                    # Release and reload; the collection is unavailable in between.
                    # load() blocks until loading completes, so no polling loop is needed.
                    self.collection.release()
                    self.collection.load(replica_number=replica_number)
                    
                    info = self.get_replica_info()
                    print(f"Current replica count: {info['replica_count']}")
                    
                    return info
                
                def scale_replicas(self, target_replica_number):
                    """扩缩容副本"""
                    current_info = self.get_replica_info()
                    
                    if not current_info["loaded"]:
                        print("Collection未加载,直接加载指定副本数")
                        return self.set_replica_number(target_replica_number)
                    
                    current_count = current_info["replica_count"]
                    
                    if current_count == target_replica_number:
                        print(f"副本数量已经是 {target_replica_number}")
                        return current_info
                    
                    if current_count < target_replica_number:
                        print(f"扩容: {current_count} -> {target_replica_number}")
                    else:
                        print(f"缩容: {current_count} -> {target_replica_number}")
                    
                    return self.set_replica_number(target_replica_number)
                
                def monitor_replicas(self):
                    """监控副本状态"""
                    info = self.get_replica_info()
                    
                    if not info["loaded"]:
                        print("Collection未加载")
                        return
                    
                    print(f"\n副本监控:")
                    print(f"  副本总数: {info['replica_count']}")
                    
                    for i, replica in enumerate(info["replicas"]):
                        print(f"\n  副本 {i+1}:")
                        print(f"    ID: {replica['id']}")
                        print(f"    分片数: {replica['shard_count']}")
                        print(f"    资源组: {replica['resource_group']}")
                
                def benchmark_replicas(self, num_queries=100):
                    """测试不同副本数的性能"""
                    replica_numbers = [1, 2, 3]
                    results = []
                    
                    query_vector = [[np.random.random() for _ in range(128)]]
                    search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": 16}
                    }
                    
                    print(f"\n副本性能测试 ({num_queries} 次查询):\n")
                    print(f"{'副本数':>8s} {'总时间':>10s} {'QPS':>10s} {'平均延迟':>12s}")
                    print("-" * 45)
                    
                    for replica_num in replica_numbers:
                        self.set_replica_number(replica_num)
                        
                        start = time.time()
                        for _ in range(num_queries):
                            self.collection.search(
                                data=query_vector,
                                anns_field="embedding",
                                param=search_params,
                                limit=10
                            )
                        elapsed = time.time() - start
                        
                        qps = num_queries / elapsed
                        avg_latency = elapsed / num_queries * 1000
                        
                        results.append({
                            "replica_number": replica_num,
                            "total_time": elapsed,
                            "qps": qps,
                            "avg_latency": avg_latency
                        })
                        
                        print(f"{replica_num:8d} {elapsed:9.2f}s {qps:9.2f} {avg_latency:10.2f}ms")
                    
                    return results
            
            # 使用副本管理器
            manager = ReplicaManager(collection)
            
            # 获取当前副本信息
            info = manager.get_replica_info()
            print(f"当前副本信息: {info}")
            
            # 设置副本数量
            manager.set_replica_number(2)
            
            # 监控副本
            manager.monitor_replicas()
            
            # 扩容副本
            manager.scale_replicas(3)
            
            # 性能测试
            results = manager.benchmark_replicas(num_queries=50)
            
            # 找到最优配置
            best_result = max(results, key=lambda x: x["qps"])
            print(f"\n最优配置:")
            print(f"  副本数: {best_result['replica_number']}")
            print(f"  QPS: {best_result['qps']:.2f}")
            ---
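            One caveat about the benchmark above: it sends queries one at a time, so it mostly measures per-query latency, while the benefit of extra replicas shows up when many clients query at once. A minimal sketch of a concurrent measurement follows. `measure_qps` and `fake_search` are hypothetical names; `fake_search` is a stand-in that sleeps for about 5 ms, and in real use you would pass a function that calls `collection.search(...)`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_qps(search_fn, num_queries, concurrency):
    """Run search_fn num_queries times on a thread pool and report throughput."""
    def timed_call(_):
        start = time.perf_counter()
        search_fn()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(num_queries)))
    wall = time.perf_counter() - wall_start

    return {
        "queries": len(latencies),
        "qps": num_queries / wall,
        "avg_latency_ms": sum(latencies) / len(latencies) * 1000,
    }


def fake_search():
    # Stand-in for collection.search(...): pretend each query takes about 5 ms
    time.sleep(0.005)


serial = measure_qps(fake_search, num_queries=40, concurrency=1)
parallel = measure_qps(fake_search, num_queries=40, concurrency=8)
print(f"serial QPS:   {serial['qps']:.1f}")
print(f"parallel QPS: {parallel['qps']:.1f}")
```

            Comparing replica counts at a fixed, realistic concurrency level gives a fairer picture than a single-threaded loop, because average latency can stay flat while throughput rises.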

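02.High-availability configuration
    a.Failover
        a.Description
            Replicas give the system automatic failover and so improve availability. When the node behind one replica fails, queries are routed to the other replicas. Failover is transparent to the client and needs no manual intervention, although a request that was in flight may still fail and should be retried. With several replicas, maintenance can be carried out node by node without downtime. Configure at least 2 replicas for high availability, place them on different nodes to avoid a single point of failure, and monitor replica health so that problems surface early.
        b.Code example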
            ---
            from pymilvus import Collection, utility
            from pymilvus.client.types import LoadState
            import numpy as np
            import time
            from threading import Thread
            
            collection = Collection("documents")
            
            # High-availability helper
            class HighAvailabilityConfig:
                def __init__(self, collection, min_replicas=2):
                    self.collection = collection
                    self.min_replicas = min_replicas
                    self.query_count = 0  # search attempts, including retries
                    self.error_count = 0  # failed attempts
                
                def ensure_high_availability(self):
                    """Make sure the collection is loaded with at least min_replicas replicas"""
                    # Collection has no is_loaded attribute; ask the server for the load state
                    if utility.load_state(self.collection.name) != LoadState.Loaded:
                        print(f"Loading collection with {self.min_replicas} replicas")
                        self.collection.load(replica_number=self.min_replicas)
                        return
                    
                    replicas = self.collection.get_replicas()
                    current_replicas = len(replicas.groups)
                    
                    if current_replicas < self.min_replicas:
                        print(f"Too few replicas ({current_replicas} < {self.min_replicas}), reloading")
                        self.collection.release()
                        self.collection.load(replica_number=self.min_replicas)
                    else:
                        print(f"Replica configuration OK: {current_replicas} replicas")
                
                def query_with_retry(self, query_vector, search_params, limit=10, max_retries=3):
                    """Search with retries"""
                    for attempt in range(max_retries):
                        # Count every attempt so that error_rate stays within [0, 1]
                        self.query_count += 1
                        try:
                            results = self.collection.search(
                                data=[query_vector],
                                anns_field="embedding",
                                param=search_params,
                                limit=limit
                            )
                            return results[0]
                        
                        except Exception as e:
                            self.error_count += 1
                            print(f"Search failed (attempt {attempt+1}/{max_retries}): {e}")
                            
                            if attempt < max_retries - 1:
                                # Exponential backoff: 0.1s, 0.2s, 0.4s, ...
                                time.sleep(0.1 * (2 ** attempt))
                            else:
                                raise
                
                def health_check(self):
                    """健康检查"""
                    try:
                        replicas = self.collection.get_replicas()
                        replica_count = len(replicas.groups)
                        
                        health_status = {
                            "healthy": replica_count >= self.min_replicas,
                            "replica_count": replica_count,
                            "min_replicas": self.min_replicas,
                            "query_count": self.query_count,
                            "error_count": self.error_count,
                            "error_rate": self.error_count / self.query_count if self.query_count > 0 else 0
                        }
                        
                        return health_status
                    
                    except Exception as e:
                        return {
                            "healthy": False,
                            "error": str(e)
                        }
                
                def start_health_monitor(self, interval=10):
                    """启动健康监控"""
                    def monitor():
                        while True:
                            status = self.health_check()
                            
                            print(f"\n健康检查:")
                            print(f"  状态: {'健康' if status.get('healthy') else '异常'}")
                            print(f"  副本数: {status.get('replica_count', 'N/A')}")
                            print(f"  查询数: {status.get('query_count', 0)}")
                            print(f"  错误数: {status.get('error_count', 0)}")
                            print(f"  错误率: {status.get('error_rate', 0)*100:.2f}%")
                            
                            if not status.get('healthy'):
                                print("  警告: 副本数不足,尝试恢复...")
                                self.ensure_high_availability()
                            
                            time.sleep(interval)
                    
                    monitor_thread = Thread(target=monitor, daemon=True)
                    monitor_thread.start()
                    
                    return monitor_thread
            
            # 使用高可用配置
            ha_config = HighAvailabilityConfig(collection, min_replicas=2)
            
            # 确保高可用
            ha_config.ensure_high_availability()
            
            # 带重试的查询
            query_vector = [np.random.random() for _ in range(128)]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            print("\n执行查询(带重试):")
            results = ha_config.query_with_retry(query_vector, search_params, limit=10)
            print(f"查询成功: {len(results)} 条结果")
            
            # 健康检查
            status = ha_config.health_check()
            print(f"\n健康状态: {status}")
            
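            # Failover test
            print("\nFailover test:")
            print("  Simulating a replica failure...")
            
            # A real test would take a QueryNode offline at this point;
            # in production, Milvus handles failover on its own
            
            print("  Continuing to query...")
            for i in range(10):
                try:
                    results = ha_config.query_with_retry(query_vector, search_params)
                    print(f"  Query {i+1}: succeeded")
                except Exception as e:
                    print(f"  Query {i+1}: failed - {e}")
            
            # health_check() returns a reduced dict on error, so read keys defensively
            final_status = ha_config.health_check()
            print("\nFinal status:")
            print(f"  Total queries: {final_status.get('query_count', 0)}")
            print(f"  Errors: {final_status.get('error_count', 0)}")
            print(f"  Success rate: {(1 - final_status.get('error_rate', 0)) * 100:.2f}%")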
            ---
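            The retry logic used for failover can be isolated into a small, testable helper. The sketch below shows exponential backoff, where the wait doubles after every failed attempt up to a cap. `backoff_delays` and `call_with_retry` are hypothetical names, and the `sleep` parameter exists only so that the waiting can be replaced in tests.

```python
import time


def backoff_delays(max_retries, base_delay=0.1, factor=2.0, max_delay=5.0):
    """Waits between attempts: base, base*factor, base*factor^2, ... capped at max_delay."""
    return [min(base_delay * factor ** i, max_delay) for i in range(max_retries - 1)]


def call_with_retry(fn, max_retries=3, base_delay=0.1, sleep=time.sleep):
    """Call fn until it succeeds or max_retries attempts have failed."""
    delays = backoff_delays(max_retries, base_delay)
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])


# Usage: a function that fails twice and then succeeds
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("replica unavailable")
    return "ok"

waited = []
result = call_with_retry(flaky, max_retries=3, sleep=waited.append)
print(result, waited)
```

            In production code it is common to add random jitter to each delay so that many clients do not retry at the same instant; that is left out here to keep the example deterministic.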
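    b.Load balancing
        a.Description
            With several replicas, Milvus balances load automatically by spreading query requests across them. The policy is described here as round-robin; the exact policy depends on the Milvus version, and some versions also take node load into account. Load balancing raises overall throughput and improves response time. Monitor the load on each replica to confirm that requests are evenly distributed. Size the replica count to the number of QueryNodes (it cannot exceed it) so that cluster resources are fully used.
        b.Code example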
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            from collections import defaultdict
            import concurrent.futures
            
            collection = Collection("documents")
            collection.load(replica_number=3)
            
            # 负载均衡监控
            class LoadBalancingMonitor:
                def __init__(self, collection):
                    self.collection = collection
                    self.query_stats = defaultdict(int)
                    self.latency_stats = defaultdict(list)
                
                def query(self, query_vector, search_params, limit=10):
                    """执行查询并记录统计"""
                    start = time.time()
                    
                    results = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=limit
                    )
                    
                    latency = time.time() - start
                    
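                    # Record statistics. The client is not told which replica served the
                    # request, so replica_id is a simulated placeholder and the resulting
                    # distribution only illustrates the bookkeeping, not Milvus's real routing.
                    replica_id = hash(time.time()) % 3  # simulated replica ID
                    self.query_stats[replica_id] += 1
                    self.latency_stats[replica_id].append(latency)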
                    
                    return results[0]
                
                def get_load_distribution(self):
                    """获取负载分布"""
                    total_queries = sum(self.query_stats.values())
                    
                    distribution = {}
                    for replica_id, count in self.query_stats.items():
                        avg_latency = np.mean(self.latency_stats[replica_id]) if self.latency_stats[replica_id] else 0
                        
                        distribution[replica_id] = {
                            "query_count": count,
                            "percentage": count / total_queries * 100 if total_queries > 0 else 0,
                            "avg_latency": avg_latency * 1000  # ms
                        }
                    
                    return distribution
                
                def print_load_stats(self):
                    """打印负载统计"""
                    distribution = self.get_load_distribution()
                    
                    print("\n负载分布:")
                    print(f"{'副本ID':>10s} {'查询数':>10s} {'占比':>10s} {'平均延迟':>12s}")
                    print("-" * 48)
                    
                    for replica_id, stats in sorted(distribution.items()):
                        print(f"{replica_id:10d} {stats['query_count']:10d} {stats['percentage']:9.1f}% {stats['avg_latency']:10.2f}ms")
                
                def check_balance(self, threshold=0.2):
                    """检查负载是否均衡"""
                    distribution = self.get_load_distribution()
                    
                    if len(distribution) < 2:
                        return True, "副本数不足,无法判断"
                    
                    percentages = [stats["percentage"] for stats in distribution.values()]
                    avg_percentage = np.mean(percentages)
                    max_deviation = max(abs(p - avg_percentage) for p in percentages)
                    
                    is_balanced = max_deviation <= threshold * 100
                    
                    return is_balanced, f"最大偏差: {max_deviation:.1f}%"
            
            # 使用负载均衡监控
            monitor = LoadBalancingMonitor(collection)
            
            query_vector = [np.random.random() for _ in range(128)]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 执行大量查询
            print("执行负载测试...")
            for i in range(300):
                monitor.query(query_vector, search_params)
                if (i + 1) % 100 == 0:
                    print(f"  已完成 {i+1} 次查询")
            
            # 打印负载统计
            monitor.print_load_stats()
            
            # 检查负载均衡
            is_balanced, message = monitor.check_balance(threshold=0.2)
            print(f"\n负载均衡检查: {'通过' if is_balanced else '不通过'}")
            print(f"  {message}")
            
            # 并发负载测试
            print("\n并发负载测试:")
            
            def concurrent_query(monitor, query_vector, search_params):
                """并发查询任务"""
                return monitor.query(query_vector, search_params)
            
            concurrent_monitor = LoadBalancingMonitor(collection)
            
            num_concurrent = 50
            num_queries_per_thread = 10
            
            with concurrent.futures.ThreadPoolExecutor(max_workers=num_concurrent) as executor:
                futures = []
                for _ in range(num_concurrent * num_queries_per_thread):
                    future = executor.submit(concurrent_query, concurrent_monitor, query_vector, search_params)
                    futures.append(future)
                
                # 等待完成
                concurrent.futures.wait(futures)
            
            print(f"完成 {num_concurrent * num_queries_per_thread} 次并发查询")
            
            concurrent_monitor.print_load_stats()
            
            is_balanced, message = concurrent_monitor.check_balance(threshold=0.2)
            print(f"\n并发负载均衡检查: {'通过' if is_balanced else '不通过'}")
            print(f"  {message}")
            
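            # Load-balancing guidelines
            print("\nLoad-balancing guidelines:")
            print("  1. Size the replica count to the QueryNode count (never above it)")
            print("  2. Monitor the load on each replica to keep it even")
            print("  3. Spread replicas across nodes to avoid hot spots")
            print("  4. Use resource groups to isolate different workloads")
            print("  5. Review the load distribution regularly and adjust")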
            ---
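            The round-robin idea and the balance check used above can be illustrated without a cluster. The sketch below is a simulation only and is not Milvus's internal router; `round_robin_assign` and `max_share_deviation` are hypothetical helper names.

```python
from collections import Counter
from itertools import cycle


def round_robin_assign(replica_ids, num_queries):
    """Assign queries to replicas in strict rotation and count them."""
    picker = cycle(replica_ids)
    return Counter(next(picker) for _ in range(num_queries))


def max_share_deviation(counts):
    """Largest gap, in percentage points, between a replica's share and the mean share."""
    total = sum(counts.values())
    shares = [count / total * 100 for count in counts.values()]
    mean = sum(shares) / len(shares)
    return max(abs(share - mean) for share in shares)


counts = round_robin_assign(["r1", "r2", "r3"], 300)
print(dict(counts))
print(f"max deviation: {max_share_deviation(counts):.1f} percentage points")

# A skewed distribution for comparison
skewed = {"r1": 200, "r2": 50, "r3": 50}
print(f"skewed deviation: {max_share_deviation(skewed):.1f} percentage points")
```

            Strict rotation gives every replica the same number of requests, so any large deviation observed in real per-node metrics points at uneven routing or uneven query cost rather than at the rotation itself.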

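7.3 Dynamic Schema

01.Dynamic fields
    a.Enabling dynamic schema
        a.Description
            A dynamic schema lets you insert fields that are not declared in the collection schema, which adds flexibility. Once it is enabled, each inserted row may carry arbitrary extra key-value pairs. These dynamic fields are stored as JSON in a special field named $meta. They can be returned in results and used in filter expressions; in the versions this section targets they cannot be indexed, so filtering on them is slower than filtering on a declared field. Dynamic fields suit data whose shape is not fixed, such as user-defined attributes and metadata, at a small performance cost. The feature is enabled when the collection is created.
        b.Code example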
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            
            # 创建启用动态Schema的Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
            ]
            
            schema = CollectionSchema(
                fields=fields,
                description="动态Schema示例",
                enable_dynamic_field=True  # 启用动态字段
            )
            
            collection = Collection("dynamic_collection", schema=schema)
            
            print(f"动态Schema已启用: {schema.enable_dynamic_field}")
            
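            # Insert rows that carry dynamic fields.
            # With dynamic fields enabled, insert row-based data (a list of dicts):
            # keys that are not declared in the schema are stored as dynamic fields.
            ids = [1, 2, 3, 4, 5]
            embeddings = [[np.random.random() for _ in range(128)] for _ in ids]
            dynamic_fields = [
                {"title": "文档1", "category": "技术", "tags": ["AI", "ML"]},
                {"title": "文档2", "author": "张三", "rating": 4.5},
                {"title": "文档3", "category": "科学", "year": 2024},
                {"title": "文档4", "price": 99.99, "stock": 100},
                {"title": "文档5", "description": "这是一个测试文档"}
            ]
            data = [
                {"id": pk, "embedding": emb, **extra}
                for pk, emb, extra in zip(ids, embeddings, dynamic_fields)
            ]
            
            collection.insert(data)
            collection.flush()
            
            print(f"\nInserted rows: {collection.num_entities}")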
            
            # 创建索引并加载
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 128}
            }
            collection.create_index(field_name="embedding", index_params=index_params)
            collection.load()
            
            # 查询动态字段
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            results = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=5,
                output_fields=["id", "title", "category", "author"]  # 包含动态字段
            )
            
            print("\n查询结果(包含动态字段):")
            for hit in results[0]:
                print(f"  ID: {hit.id}")
                print(f"  标题: {hit.entity.get('title')}")
                print(f"  类别: {hit.entity.get('category')}")
                print(f"  作者: {hit.entity.get('author')}")
                print()
            
            # 过滤动态字段
            results_filtered = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=5,
                expr='category == "技术"',  # 过滤动态字段
                output_fields=["id", "title", "category"]
            )
            
            print("过滤动态字段(category == '技术'):")
            for hit in results_filtered[0]:
                print(f"  {hit.entity.get('title')}: {hit.entity.get('category')}")
            ---
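            Conceptually, enabling dynamic fields means each row is split into the declared schema fields and a JSON object holding everything else. The sketch below models that split in plain Python. It is a mental model, not pymilvus's actual implementation, and `split_row` is a hypothetical helper.

```python
import json


def split_row(row, schema_fields):
    """Separate a row into declared fields and the leftover dynamic fields."""
    declared = {key: value for key, value in row.items() if key in schema_fields}
    dynamic = {key: value for key, value in row.items() if key not in schema_fields}
    return declared, dynamic


schema_fields = {"id", "embedding"}
row = {"id": 1, "embedding": [0.1, 0.2], "title": "doc-1", "tags": ["AI", "ML"]}

declared, dynamic = split_row(row, schema_fields)
print("declared:", declared)
# The leftover keys are what would end up as JSON in the $meta field
print("$meta:   ", json.dumps(dynamic))
```

            This also shows why two rows can carry different dynamic keys: the declared part has a fixed shape, while the JSON part is free-form per row.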
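    b.Managing dynamic fields
        a.Description
            Managing dynamic fields is mostly a matter of data consistency and query performance. Different records may carry different dynamic fields, so code that reads results must handle missing keys. Because dynamic fields are not indexed here, filtering on them is comparatively slow; define frequently used fields in the schema instead and keep dynamic fields for metadata that is queried rarely. Use output_fields to choose which dynamic fields a search returns.
        b.Code example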
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("dynamic_collection")
            collection.load()
            
            # 动态字段管理类
            class DynamicFieldManager:
                def __init__(self, collection):
                    self.collection = collection
                    self.field_usage = {}
                
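                def insert_with_dynamic_fields(self, ids, embeddings, dynamic_data):
                    """Insert rows that carry dynamic fields"""
                    # Track how often each dynamic field appears
                    for record in dynamic_data:
                        for field_name in record.keys():
                            self.field_usage[field_name] = self.field_usage.get(field_name, 0) + 1
                    
                    # Dynamic fields need row-based inserts: one dict per entity
                    rows = [
                        {"id": pk, "embedding": emb, **extra}
                        for pk, emb, extra in zip(ids, embeddings, dynamic_data)
                    ]
                    self.collection.insert(rows)
                    self.collection.flush()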
                
                def query_dynamic_fields(self, query_vector, search_params, fields=None):
                    """查询动态字段"""
                    # 如果未指定字段,返回所有常用字段
                    if fields is None:
                        fields = self.get_common_fields(threshold=0.5)
                    
                    results = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=10,
                        output_fields=["id"] + fields
                    )
                    
                    return results[0]
                
                def get_common_fields(self, threshold=0.5):
                    """Return dynamic fields whose coverage is at least threshold"""
                    total_records = self.collection.num_entities
                    if total_records == 0:
                        return []  # avoid dividing by zero on an empty collection
                    common_fields = []
                    
                    for field_name, count in self.field_usage.items():
                        if count / total_records >= threshold:
                            common_fields.append(field_name)
                    
                    return common_fields
                
                def get_field_statistics(self):
                    """获取字段统计信息"""
                    total_records = self.collection.num_entities
                    
                    stats = {}
                    for field_name, count in self.field_usage.items():
                        stats[field_name] = {
                            "count": count,
                            "coverage": count / total_records * 100 if total_records > 0 else 0
                        }
                    
                    return stats
                
                def recommend_schema_fields(self, threshold=0.8):
                    """推荐应该加入Schema的字段"""
                    stats = self.get_field_statistics()
                    recommendations = []
                    
                    for field_name, stat in stats.items():
                        if stat["coverage"] >= threshold * 100:
                            recommendations.append({
                                "field": field_name,
                                "coverage": stat["coverage"],
                                "reason": f"字段覆盖率 {stat['coverage']:.1f}%,建议加入Schema并创建索引"
                            })
                    
                    return recommendations
            
            # 使用动态字段管理器
            manager = DynamicFieldManager(collection)
            
            # 插入更多数据
            new_ids = [10, 11, 12, 13, 14]
            new_embeddings = [[np.random.random() for _ in range(128)] for _ in range(5)]
            new_dynamic_data = [
                {"title": "文档10", "category": "技术", "views": 1000},
                {"title": "文档11", "category": "科学", "views": 500},
                {"title": "文档12", "category": "技术", "views": 800},
                {"title": "文档13", "category": "艺术", "views": 300},
                {"title": "文档14", "category": "技术", "views": 1200}
            ]
            
            manager.insert_with_dynamic_fields(new_ids, new_embeddings, new_dynamic_data)
            
            # 获取字段统计
            stats = manager.get_field_statistics()
            print("\n动态字段统计:")
            for field_name, stat in sorted(stats.items(), key=lambda x: x[1]["coverage"], reverse=True):
                print(f"  {field_name}: {stat['count']} 次, 覆盖率 {stat['coverage']:.1f}%")
            
            # 获取常用字段
            common_fields = manager.get_common_fields(threshold=0.5)
            print(f"\n常用字段 (覆盖率 > 50%): {common_fields}")
            
            # 推荐Schema字段
            recommendations = manager.recommend_schema_fields(threshold=0.8)
            if recommendations:
                print("\nSchema优化建议:")
                for rec in recommendations:
                    print(f"  {rec['field']}: {rec['reason']}")
            
            # 查询动态字段
            query_vector = [np.random.random() for _ in range(128)]
            search_params = {"metric_type": "L2", "params": {"nprobe": 16}}
            
            results = manager.query_dynamic_fields(query_vector, search_params, fields=["title", "category", "views"])
            
            print("\n查询结果:")
            for hit in results[:5]:
                print(f"  {hit.entity.get('title')}: {hit.entity.get('category')}, 浏览 {hit.entity.get('views', 'N/A')}")
            ---
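            The coverage statistic that drives the schema recommendation can be computed directly from the rows, independent of any database. The sketch below uses hypothetical helper names and sample rows, and it also shows the defensive read that missing dynamic fields require.

```python
from collections import Counter


def field_coverage(rows):
    """Fraction of rows in which each field appears (0.0 to 1.0)."""
    total = len(rows)
    if total == 0:
        return {}
    counts = Counter(key for row in rows for key in row)
    return {name: count / total for name, count in counts.items()}


def fields_to_promote(rows, threshold=0.8):
    """Fields common enough that declaring them in the schema may be worthwhile."""
    coverage = field_coverage(rows)
    return sorted(name for name, share in coverage.items() if share >= threshold)


rows = [
    {"title": "a", "category": "tech", "views": 10},
    {"title": "b", "category": "science"},
    {"title": "c", "views": 7},
    {"title": "d"},
]

print(field_coverage(rows))
print(fields_to_promote(rows, threshold=0.8))

# Rows may lack a dynamic field, so read with a default instead of indexing
print([row.get("views", 0) for row in rows])
```

            Computing coverage from the rows you actually insert avoids mixing counts from one session with the entity count of the whole collection, which can understate coverage when the collection already holds older data.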

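7.4 Time Travel

01.Time Travel concepts
    a.Timestamp mechanism
        a.Description
            Time Travel lets you query the data as it existed at an earlier point in time, based on timestamps. Milvus assigns a timestamp to every operation and so records the history of data changes; a search can name a point in time and see the data as of that moment. This suits auditing, retrospective analysis, and version comparison. Time Travel does not alter the current data; it only changes the view a query sees. How long history is kept is set by configuration, and history older than the retention period is cleaned up. Availability depends on the Milvus version: the travel_timestamp parameter belongs to earlier 2.x releases, and newer releases may ignore or reject it, so check the documentation for the version you run.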
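            The timestamps involved are hybrid values rather than plain Unix times: a physical time in milliseconds sits in the high bits and a logical counter in the low bits. The sketch below reproduces that packing in plain Python, assuming an 18-bit logical counter, which is the layout pymilvus's timestamp helpers appear to use. The helper names are hypothetical, and the bit width should be treated as an assumption to verify against your client version.

```python
LOGICAL_BITS = 18  # assumed width of the logical counter
LOGICAL_MASK = (1 << LOGICAL_BITS) - 1


def make_ts(unix_seconds):
    """Pack a Unix time (seconds) into a hybrid timestamp with logical part 0."""
    physical_ms = int(unix_seconds * 1000)
    return physical_ms << LOGICAL_BITS


def split_ts(ts):
    """Unpack a hybrid timestamp into (physical milliseconds, logical counter)."""
    return ts >> LOGICAL_BITS, ts & LOGICAL_MASK


ts = make_ts(1_700_000_000)
physical_ms, logical = split_ts(ts)
print(ts)
print(physical_ms, logical)
```

            Because the physical part occupies the high bits, comparing two hybrid timestamps as integers orders them by time first and by the logical counter second.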
        b.Code example
            ---
            from pymilvus import Collection, utility
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            # 获取当前时间戳
            current_ts = utility.mkts_from_unixtime(time.time())
            print(f"当前时间戳: {current_ts}")
            
            # 插入初始数据
            initial_data = [
                [1, 2, 3],
                [f"文档{i}_v1" for i in [1, 2, 3]],
                [[np.random.random() for _ in range(128)] for _ in range(3)]
            ]
            
            collection.insert(initial_data)
            collection.flush()
            
            ts_after_insert = utility.mkts_from_unixtime(time.time())
            print(f"Timestamp after insert: {ts_after_insert}")
            
            # Wait so the following operations get clearly later timestamps
            time.sleep(2)
            
            # Update the data (delete, then re-insert)
            collection.delete(expr="id in [1, 2]")
            
            update_data = [
                [1, 2],
                [f"Document {i} v2" for i in [1, 2]],
                [[np.random.random() for _ in range(128)] for _ in range(2)]
            ]
            
            collection.insert(update_data)
            collection.flush()
            
            ts_after_update = utility.mkts_from_unixtime(time.time())
            print(f"Timestamp after update: {ts_after_update}")
            
            # Search the current state
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            results_current = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                output_fields=["id", "title"]
            )
            
            print("\nCurrent state:")
            for hit in results_current[0]:
                print(f"  ID: {hit.id}, title: {hit.entity.get('title')}")
            
            # Time Travel: search the state after the insert but before the update
            results_past = collection.search(
                data=query_vector,
                anns_field="embedding",
                param=search_params,
                limit=10,
                travel_timestamp=ts_after_insert,  # the historical point in time to search at
                output_fields=["id", "title"]
            )
            
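            print(f"\nHistorical state (timestamp: {ts_after_insert}):")
            for hit in results_past[0]:
                print(f"  ID: {hit.id}, title: {hit.entity.get('title')}")
            
            # Timestamp conversion
            print("\nTimestamp conversion:")
            unix_time = time.time()
            milvus_ts = utility.mkts_from_unixtime(unix_time)
            print(f"  Unix time: {unix_time}")
            print(f"  Milvus timestamp: {milvus_ts}")
            
            # Convert the Milvus timestamp back to Unix time.
            # A Milvus timestamp is a hybrid value, not a nanosecond count: the high bits
            # hold physical time in milliseconds and the low 18 bits hold a logical counter.
            unix_time_back = utility.hybridts_to_unixtime(milvus_ts)
            print(f"  Back to Unix time: {unix_time_back}")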
            ---
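The hybrid timestamp layout can be sketched in plain Python, without a Milvus connection. This is a minimal illustration that assumes the layout used by the pymilvus helpers (physical milliseconds shifted left by 18 bits, logical counter in the low 18 bits); the function names here are hypothetical and are not part of pymilvus.

```python
from typing import Tuple

LOGICAL_BITS = 18  # assumed width of the logical counter in a hybrid timestamp
LOGICAL_MASK = (1 << LOGICAL_BITS) - 1


def make_hybrid_ts(unix_seconds: float, logical: int = 0) -> int:
    """Pack Unix seconds (plus an optional logical counter) into a hybrid timestamp."""
    physical_ms = int(unix_seconds * 1000)
    return (physical_ms << LOGICAL_BITS) | (logical & LOGICAL_MASK)


def split_hybrid_ts(ts: int) -> Tuple[int, int]:
    """Return (physical milliseconds, logical counter)."""
    return ts >> LOGICAL_BITS, ts & LOGICAL_MASK


def hybrid_ts_to_unix(ts: int) -> float:
    """Recover Unix seconds; the logical counter is discarded."""
    physical_ms, _ = split_hybrid_ts(ts)
    return physical_ms / 1000.0


ts = make_hybrid_ts(1_700_000_000.5, logical=7)
print(split_hybrid_ts(ts))
print(hybrid_ts_to_unix(ts))
```

Dividing the raw value by 10^9 would therefore give a meaningless result; the shift has to be undone first.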
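    b.Historical queries
        a.Function description
            A historical query reads the state of the data at a specific point in time, selected with the travel_timestamp parameter. Running the same search at two timestamps shows what changed between them, which is useful for data audits, error recovery, and A/B tests. Performance is generally comparable to a current-state search, though this is worth verifying on your own deployment. Mind the history retention policy: data older than the retention period cannot be queried.
        b.Code example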
            ---
            from pymilvus import Collection, utility
            import numpy as np
            import time
            from datetime import datetime
            
            collection = Collection("documents")
            collection.load()
            
            # Helper class for historical queries
            class TimeTravelManager:
                def __init__(self, collection):
                    self.collection = collection
                    self.snapshots = {}
                
                def create_snapshot(self, name):
                    """Record the current time as a named snapshot."""
                    # Read the clock once so all three fields describe the same instant
                    now = time.time()
                    timestamp = utility.mkts_from_unixtime(now)
                    self.snapshots[name] = {
                        "timestamp": timestamp,
                        "unix_time": now,
                        "datetime": datetime.fromtimestamp(now).isoformat()
                    }
                    print(f"Snapshot created: {name} (timestamp: {timestamp})")
                    return timestamp
                
                def list_snapshots(self):
                    """List all snapshots."""
                    print("\nSnapshots:")
                    for name, info in self.snapshots.items():
                        print(f"  {name}:")
                        print(f"    timestamp: {info['timestamp']}")
                        print(f"    time: {info['datetime']}")
                
                def query_at_snapshot(self, snapshot_name, query_vector, search_params, limit=10):
                    """Search as of the given snapshot."""
                    if snapshot_name not in self.snapshots:
                        raise ValueError(f"Snapshot not found: {snapshot_name}")
                    
                    timestamp = self.snapshots[snapshot_name]["timestamp"]
                    
                    results = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=limit,
                        travel_timestamp=timestamp,
                        output_fields=["id", "title"]
                    )
                    
                    return results[0]
                
                def compare_snapshots(self, snapshot1, snapshot2, query_vector, search_params):
                    """Compare the top search results at two snapshots (not the full data set)."""
                    results1 = self.query_at_snapshot(snapshot1, query_vector, search_params)
                    results2 = self.query_at_snapshot(snapshot2, query_vector, search_params)
                    
                    ids1 = set(hit.id for hit in results1)
                    ids2 = set(hit.id for hit in results2)
                    
                    added = ids2 - ids1
                    removed = ids1 - ids2
                    common = ids1 & ids2
                    
                    comparison = {
                        "snapshot1": snapshot1,
                        "snapshot2": snapshot2,
                        "added": list(added),
                        "removed": list(removed),
                        "common": list(common)
                    }
                    
                    return comparison
                
                def rollback_view(self, snapshot_name):
                    """Return the timestamp of a snapshot (a query view only; no data is modified)."""
                    if snapshot_name not in self.snapshots:
                        raise ValueError(f"Snapshot not found: {snapshot_name}")
                    
                    timestamp = self.snapshots[snapshot_name]["timestamp"]
                    
                    print(f"\nRolling the view back to snapshot: {snapshot_name}")
                    print(f"  timestamp: {timestamp}")
                    print(f"  time: {self.snapshots[snapshot_name]['datetime']}")
                    
                    return timestamp
            
            # Use the Time Travel manager
            tt_manager = TimeTravelManager(collection)
            
            # Take a snapshot before any changes
            tt_manager.create_snapshot("initial")
            
            # Insert data
            data1 = [
                [100, 101, 102],
                ["Document A", "Document B", "Document C"],
                [[np.random.random() for _ in range(128)] for _ in range(3)]
            ]
            collection.insert(data1)
            collection.flush()
            
            time.sleep(1)
            tt_manager.create_snapshot("after_insert_1")
            
            # Insert more data
            data2 = [
                [103, 104],
                ["Document D", "Document E"],
                [[np.random.random() for _ in range(128)] for _ in range(2)]
            ]
            collection.insert(data2)
            collection.flush()
            
            time.sleep(1)
            tt_manager.create_snapshot("after_insert_2")
            
            # Delete data
            collection.delete(expr="id in [100, 101]")
            collection.flush()
            
            time.sleep(1)
            tt_manager.create_snapshot("after_delete")
            
            # List the snapshots
            tt_manager.list_snapshots()
            
            # Run the same search at each point in time
            query_vector = [np.random.random() for _ in range(128)]
            search_params = {"metric_type": "L2", "params": {"nprobe": 16}}
            
            print("\nSearch results at each point in time:")
            
            for snapshot_name in ["initial", "after_insert_1", "after_insert_2", "after_delete"]:
                try:
                    results = tt_manager.query_at_snapshot(snapshot_name, query_vector, search_params, limit=10)
                    print(f"\n{snapshot_name}: {len(results)} results")
                    for hit in results[:3]:
                        print(f"  ID: {hit.id}, title: {hit.entity.get('title')}")
                except Exception as e:
                    print(f"\n{snapshot_name}: search failed - {e}")
            
            # Compare two snapshots
            comparison = tt_manager.compare_snapshots("after_insert_1", "after_delete", query_vector, search_params)
            
            print("\nSnapshot comparison:")
            print(f"  Added IDs: {comparison['added']}")
            print(f"  Removed IDs: {comparison['removed']}")
            print(f"  Unchanged IDs: {comparison['common']}")
            ---
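Because history older than the retention period cannot be queried, it can help to check snapshots against that period before searching. The sketch below is a plain-Python illustration; the function name and the five-day retention value are assumptions chosen for the example, not Milvus defaults.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def split_by_retention(
    snapshots: Dict[str, datetime],
    now: datetime,
    retention: timedelta,
) -> Tuple[List[str], List[str]]:
    """Split snapshot names into (still queryable, expired) by age."""
    usable, expired = [], []
    for name, taken_at in snapshots.items():
        if now - taken_at <= retention:
            usable.append(name)
        else:
            expired.append(name)
    return usable, expired


# Hypothetical snapshots and retention period, for illustration only
snapshots = {
    "yesterday": datetime(2024, 1, 9, 12, 0),
    "last_week": datetime(2024, 1, 1, 12, 0),
}
usable, expired = split_by_retention(
    snapshots, now=datetime(2024, 1, 10, 12, 0), retention=timedelta(days=5)
)
print("queryable:", usable)
print("expired:", expired)
```

A manager such as the one above could run this check and skip expired snapshots instead of relying on the search call to fail.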

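02.Application scenarios
    a.Data auditing
        a.Function description
            Time Travel supports data auditing by letting you trace the history of data changes. You can query the data as of any point inside the retention period and check it against what you expect, which suits compliance audits and security reviews. Comparing the data at different points in time helps locate anomalies and mistaken operations, and informs recovery and rollback decisions. The history retention period must be configured long enough to cover the audit window.
        b.Code example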
            ---
            from pymilvus import Collection, utility
            import numpy as np
            import time
            from datetime import datetime
            
            collection = Collection("documents")
            collection.load()
            
            # Data auditing class
            class DataAuditor:
                def __init__(self, collection):
                    self.collection = collection
                    self.audit_log = []
                
                def log_operation(self, operation, details):
                    """Append an entry to the audit log and return its Milvus timestamp."""
                    # Read the clock once so all time fields describe the same instant
                    now = time.time()
                    timestamp = utility.mkts_from_unixtime(now)
                    
                    log_entry = {
                        "timestamp": timestamp,
                        "unix_time": now,
                        "datetime": datetime.fromtimestamp(now).isoformat(),
                        "operation": operation,
                        "details": details
                    }
                    
                    self.audit_log.append(log_entry)
                    print(f"[audit] {operation}: {details}")
                    
                    return timestamp
                
                def insert_with_audit(self, data):
                    """Insert with audit logging."""
                    timestamp_before = self.log_operation("INSERT_START", f"{len(data[0])} records")
                    
                    self.collection.insert(data)
                    self.collection.flush()
                    
                    timestamp_after = self.log_operation("INSERT_COMPLETE", f"{len(data[0])} records")
                    
                    return timestamp_before, timestamp_after
                
                def delete_with_audit(self, expr):
                    """Delete with audit logging."""
                    timestamp_before = self.log_operation("DELETE_START", expr)
                    
                    # A real audit would first query and record the entities about to be
                    # deleted; that step is omitted here for brevity.
                    
                    self.collection.delete(expr)
                    self.collection.flush()
                    
                    timestamp_after = self.log_operation("DELETE_COMPLETE", expr)
                    
                    return timestamp_before, timestamp_after
                
                def verify_data_integrity(self, expected_count, timestamp=None):
                    """Check that roughly expected_count entities (or more) are visible."""
                    # Counting through a search is approximate: an ANN search may return
                    # fewer hits than exist, and the count is capped by the limit below.
                    query_vector = [[np.random.random() for _ in range(128)]]
                    search_params = {"metric_type": "L2", "params": {"nprobe": 16}}
                    
                    search_kwargs = {
                        "data": query_vector,
                        "anns_field": "embedding",
                        "param": search_params,
                        "limit": 10000
                    }
                    
                    if timestamp is not None:
                        search_kwargs["travel_timestamp"] = timestamp
                    
                    results = self.collection.search(**search_kwargs)
                    actual_count = len(results[0])
                    
                    is_valid = actual_count >= expected_count * 0.9  # tolerate a 10% shortfall
                    
                    self.log_operation(
                        "INTEGRITY_CHECK",
                        f"expected: {expected_count}, actual: {actual_count}, result: {'pass' if is_valid else 'fail'}"
                    )
                    
                    return is_valid, actual_count
                
                def generate_audit_report(self):
                    """Print and return a summary of the audit log."""
                    print("\n" + "="*60)
                    print("Data audit report")
                    print("="*60)
                    
                    print(f"\nTotal operations: {len(self.audit_log)}")
                    
                    # Count entries per operation type
                    operation_counts = {}
                    for entry in self.audit_log:
                        op = entry["operation"]
                        operation_counts[op] = operation_counts.get(op, 0) + 1
                    
                    print("\nOperation counts:")
                    for op, count in sorted(operation_counts.items()):
                        print(f"  {op}: {count}")
                    
                    print("\nOperation timeline:")
                    for entry in self.audit_log:
                        print(f"  [{entry['datetime']}] {entry['operation']}: {entry['details']}")
                    
                    return {
                        "total_operations": len(self.audit_log),
                        "operation_counts": operation_counts,
                        "audit_log": self.audit_log
                    }
                
                def rollback_analysis(self, target_timestamp):
                    """List the logged operations that a rollback to target_timestamp would undo."""
                    print(f"\nRollback analysis (target timestamp: {target_timestamp}):")
                    
                    # Find the operations logged after the target timestamp
                    operations_to_rollback = [
                        entry for entry in self.audit_log
                        if entry["timestamp"] > target_timestamp
                    ]
                    
                    print(f"  Operations to roll back: {len(operations_to_rollback)}")
                    
                    for entry in operations_to_rollback:
                        print(f"    [{entry['datetime']}] {entry['operation']}: {entry['details']}")
                    
                    return operations_to_rollback
            
            # Use the data auditor
            auditor = DataAuditor(collection)
            
            # Run a series of operations
            print("Running audited operations:\n")
            
            # Insert data
            data1 = [
                [200, 201, 202],
                ["Audit document A", "Audit document B", "Audit document C"],
                [[np.random.random() for _ in range(128)] for _ in range(3)]
            ]
            ts_insert1_before, ts_insert1_after = auditor.insert_with_audit(data1)
            
            time.sleep(1)
            
            # Verify integrity as of the end of the first insert
            auditor.verify_data_integrity(expected_count=3, timestamp=ts_insert1_after)
            
            time.sleep(1)
            
            # Insert more data
            data2 = [
                [203, 204],
                ["Audit document D", "Audit document E"],
                [[np.random.random() for _ in range(128)] for _ in range(2)]
            ]
            ts_insert2_before, ts_insert2_after = auditor.insert_with_audit(data2)
            
            time.sleep(1)
            
            # Delete data
            ts_delete_before, ts_delete_after = auditor.delete_with_audit("id in [200, 201]")
            
            time.sleep(1)
            
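            # Verify integrity of the current state
            auditor.verify_data_integrity(expected_count=3)
            
            # Generate the audit report
            report = auditor.generate_audit_report()
            
            # Rollback analysis
            auditor.rollback_analysis(target_timestamp=ts_insert1_after)
            
            print("\nAudit use cases:")
            print("  1. Compliance audit: trace every data change")
            print("  2. Security review: spot abnormal operations")
            print("  3. Error recovery: locate the moment a problem appeared")
            print("  4. Data validation: verify data integrity")
            print("  5. Rollback decisions: analyze the impact of a rollback")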
            ---
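An audit log that records paired *_START and *_COMPLETE entries, as the auditor above does, can also reveal how long each operation took and whether any operation never finished. The sketch below works on plain dictionaries with the same "operation", "unix_time", and "details" keys; the function name and the sample log are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def operation_durations(audit_log: List[Dict]) -> Tuple[List[Dict], List[str]]:
    """Pair *_START / *_COMPLETE entries; return (durations, unfinished operation kinds)."""
    open_ops: Dict[str, Dict] = {}
    durations: List[Dict] = []
    for entry in audit_log:
        op = entry["operation"]
        if op.endswith("_START"):
            open_ops[op[: -len("_START")]] = entry
        elif op.endswith("_COMPLETE"):
            kind = op[: -len("_COMPLETE")]
            start = open_ops.pop(kind, None)
            if start is not None:
                durations.append({
                    "operation": kind,
                    "details": start["details"],
                    "seconds": entry["unix_time"] - start["unix_time"],
                })
    # Anything still open was started but never completed
    return durations, sorted(open_ops)


# Hypothetical log entries, for illustration only
sample_log = [
    {"operation": "INSERT_START", "unix_time": 10.0, "details": "batch 1"},
    {"operation": "INSERT_COMPLETE", "unix_time": 12.5, "details": "batch 1"},
    {"operation": "INTEGRITY_CHECK", "unix_time": 13.0, "details": "ok"},
    {"operation": "DELETE_START", "unix_time": 14.0, "details": "id in [200, 201]"},
]
done, unfinished = operation_durations(sample_log)
print(done)
print(unfinished)
```

An unfinished DELETE in such a report is a natural place to begin an error-recovery investigation.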
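    b.Version comparison
        a.Function description
            Time Travel supports version comparison, that is, comparing the data at different points in time. You can compare data content, search results, and summary statistics. This suits A/B tests, algorithm comparisons, and data quality assessment, and it helps in understanding how the data evolved. The differences can be fed into a visualization, and the same comparison can serve as a data regression test.
        b.Code example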
            ---
            from pymilvus import Collection, utility
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            # Version comparison class
            class VersionComparator:
                def __init__(self, collection):
                    self.collection = collection
                    self.versions = {}
                
                def create_version(self, version_name):
                    """Record the current time as a named version."""
                    timestamp = utility.mkts_from_unixtime(time.time())
                    self.versions[version_name] = timestamp
                    print(f"Version created: {version_name} (timestamp: {timestamp})")
                    return timestamp
                
                def compare_query_results(self, version1, version2, query_vector, search_params, limit=10):
                    """Compare the search results of two versions."""
                    if version1 not in self.versions or version2 not in self.versions:
                        raise ValueError("Version not found")
                    
                    # Search as of version 1
                    results1 = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=limit,
                        travel_timestamp=self.versions[version1],
                        output_fields=["id", "title"]
                    )
                    
                    # Search as of version 2
                    results2 = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=limit,
                        travel_timestamp=self.versions[version2],
                        output_fields=["id", "title"]
                    )
                    
                    # Compare the two result lists
                    ids1 = [hit.id for hit in results1[0]]
                    ids2 = [hit.id for hit in results2[0]]
                    
                    comparison = {
                        "version1": version1,
                        "version2": version2,
                        "results1": ids1,
                        "results2": ids2,
                        "intersection": list(set(ids1) & set(ids2)),
                        "only_in_v1": list(set(ids1) - set(ids2)),
                        "only_in_v2": list(set(ids2) - set(ids1)),
                        "similarity": len(set(ids1) & set(ids2)) / max(len(ids1), len(ids2)) if max(len(ids1), len(ids2)) > 0 else 0
                    }
                    
                    return comparison
                
                def print_comparison(self, comparison):
                    """Print a comparison produced by compare_query_results."""
                    print(f"\nVersion comparison: {comparison['version1']} vs {comparison['version2']}")
                    print(f"  Similarity: {comparison['similarity']*100:.1f}%")
                    print(f"  In both: {len(comparison['intersection'])}")
                    print(f"  Only in {comparison['version1']}: {len(comparison['only_in_v1'])}")
                    print(f"  Only in {comparison['version2']}: {len(comparison['only_in_v2'])}")
                    
                    if comparison['only_in_v1']:
                        print(f"\n  IDs only in {comparison['version1']}: {comparison['only_in_v1'][:5]}")
                    
                    if comparison['only_in_v2']:
                        print(f"  IDs only in {comparison['version2']}: {comparison['only_in_v2'][:5]}")
            
            # Use the version comparator
            comparator = VersionComparator(collection)
            
            # Create a version
            comparator.create_version("v1.0")
            
            # Modify data here...
            time.sleep(1)
            
            comparator.create_version("v1.1")
            
            # Compare the versions
            query_vector = [np.random.random() for _ in range(128)]
            search_params = {"metric_type": "L2", "params": {"nprobe": 16}}
            
            comparison = comparator.compare_query_results("v1.0", "v1.1", query_vector, search_params)
            comparator.print_comparison(comparison)
            
            print("\nUses of version comparison:")
            print("  1. A/B testing: compare the effect of different algorithms")
            print("  2. Data quality: assess the impact of data changes")
            print("  3. Regression testing: verify a system upgrade")
            print("  4. Performance analysis: compare different configurations")
            ---
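The comparison above measures set overlap only, so two versions that return the same IDs in a different order look identical. As a possible complement, the sketch below computes the Jaccard similarity and the per-ID rank shift between two result lists. It is plain Python on lists of IDs; the function name is hypothetical.

```python
from typing import Dict, Hashable, List


def compare_rankings(ids1: List[Hashable], ids2: List[Hashable]) -> Dict:
    """Compare two ranked ID lists by overlap and by change in position."""
    set1, set2 = set(ids1), set(ids2)
    union = set1 | set2
    # Jaccard similarity: shared IDs over all IDs seen in either list
    jaccard = len(set1 & set2) / len(union) if union else 1.0

    pos1 = {doc_id: i for i, doc_id in enumerate(ids1)}
    pos2 = {doc_id: i for i, doc_id in enumerate(ids2)}
    # Positive shift: the ID ranks lower (worse) in the second list
    rank_shifts = {d: pos2[d] - pos1[d] for d in ids1 if d in pos2}

    return {"jaccard": jaccard, "rank_shifts": rank_shifts}


report = compare_rankings([1, 2, 3, 4], [2, 1, 5, 3])
print(report)
```

The input lists correspond to the "results1" and "results2" entries that compare_query_results already returns.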

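7.5 Hybrid Search

01.Hybrid search principles
    a.Multi-path recall
        a.Function description
            Hybrid search combines several retrieval methods to improve recall. The recall paths can include vector search, full-text search, and scalar filtering, and each path can carry its own weight. A fusion algorithm merges the results of the paths into one ranking. This suits complex queries such as semantic plus keyword search, and can improve search accuracy and user satisfaction. The fusion strategy needs to be designed with care.
        b.Code example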
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            collection.load()
            
            # Multi-path recall class
            class MultiRecallSearch:
                def __init__(self, collection):
                    self.collection = collection
                
                def vector_recall(self, query_vector, search_params, limit=50):
                    """Vector recall."""
                    results = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=limit,
                        output_fields=["id", "title"]
                    )
                    
                    # Convert the hits into a dict keyed by ID
                    recall_results = {}
                    for hit in results[0]:
                        recall_results[hit.id] = {
                            "score": 1 / (1 + hit.distance),  # map an L2 distance to a score in (0, 1]
                            "title": hit.entity.get("title"),
                            "source": "vector"
                        }
                    
                    return recall_results
                
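                def keyword_recall(self, keywords, limit=50):
                    """Keyword recall (simulated with a scalar filter)."""
                    # Build a filter expression that matches any of the keywords.
                    # Infix patterns ("%kw%") are not accepted by every Milvus version
                    # (older releases allow prefix patterns only), hence the try/except below.
                    keyword_expr = " or ".join([f'title like "%{kw}%"' for kw in keywords])
                    
                    # Search with a random vector, so the filter decides what is recalled
                    query_vector = [np.random.random() for _ in range(128)]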
                    
                    try:
                        results = self.collection.search(
                            data=[query_vector],
                            anns_field="embedding",
                            param={"metric_type": "L2", "params": {"nprobe": 16}},
                            limit=limit,
                            expr=keyword_expr,
                            output_fields=["id", "title"]
                        )
                        
                        recall_results = {}
                        for hit in results[0]:
                            # Score by the fraction of keywords found in the title
                            title = hit.entity.get("title", "")
                            match_count = sum(1 for kw in keywords if kw in title)
                            score = match_count / len(keywords) if keywords else 0
                            
                            recall_results[hit.id] = {
                                "score": score,
                                "title": title,
                                "source": "keyword"
                            }
                        
                        return recall_results
                    
                    except Exception as e:
                        print(f"Keyword recall failed: {e}")
                        return {}
                
                def category_recall(self, category, limit=50):
                    """Category recall."""
                    query_vector = [np.random.random() for _ in range(128)]
                    
                    results = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param={"metric_type": "L2", "params": {"nprobe": 16}},
                        limit=limit,
                        expr=f'category == "{category}"',
                        output_fields=["id", "title", "category"]
                    )
                    
                    recall_results = {}
                    for hit in results[0]:
                        recall_results[hit.id] = {
                            "score": 1.0,  # a category match gets a fixed score
                            "title": hit.entity.get("title"),
                            "category": hit.entity.get("category"),
                            "source": "category"
                        }
                    
                    return recall_results
                
                def hybrid_recall(self, query_vector, keywords=None, category=None, weights=None):
                    """Hybrid recall: merge the recall paths with a weighted sum."""
                    if weights is None:
                        weights = {"vector": 0.6, "keyword": 0.3, "category": 0.1}
                    
                    all_results = {}
                    
                    # Vector recall
                    search_params = {"metric_type": "L2", "params": {"nprobe": 16}}
                    vector_results = self.vector_recall(query_vector, search_params, limit=50)
                    
                    for doc_id, info in vector_results.items():
                        all_results[doc_id] = {
                            "scores": {"vector": info["score"]},
                            "title": info["title"],
                            "sources": ["vector"]
                        }
                    
                    # Keyword recall
                    if keywords:
                        keyword_results = self.keyword_recall(keywords, limit=50)
                        
                        for doc_id, info in keyword_results.items():
                            if doc_id in all_results:
                                all_results[doc_id]["scores"]["keyword"] = info["score"]
                                all_results[doc_id]["sources"].append("keyword")
                            else:
                                all_results[doc_id] = {
                                    "scores": {"keyword": info["score"]},
                                    "title": info["title"],
                                    "sources": ["keyword"]
                                }
                    
                    # Category recall
                    if category:
                        category_results = self.category_recall(category, limit=50)
                        
                        for doc_id, info in category_results.items():
                            if doc_id in all_results:
                                all_results[doc_id]["scores"]["category"] = info["score"]
                                all_results[doc_id]["sources"].append("category")
                            else:
                                all_results[doc_id] = {
                                    "scores": {"category": info["score"]},
                                    "title": info["title"],
                                    "sources": ["category"]
                                }
                    
                    # Compute the weighted total score (a missing path contributes 0)
                    for doc_id in all_results:
                        total_score = 0
                        for source, weight in weights.items():
                            if source in all_results[doc_id]["scores"]:
                                total_score += weight * all_results[doc_id]["scores"][source]
                        
                        all_results[doc_id]["total_score"] = total_score
                    
                    # Sort by total score, highest first
                    sorted_results = sorted(
                        all_results.items(),
                        key=lambda x: x[1]["total_score"],
                        reverse=True
                    )
                    
                    return sorted_results[:20]
            
            # Use multi-path recall
            multi_recall = MultiRecallSearch(collection)
            
            query_vector = [np.random.random() for _ in range(128)]
            keywords = ["technology", "AI"]
            category = "electronics"
            
            print("Hybrid recall search:\n")
            
            results = multi_recall.hybrid_recall(
                query_vector=query_vector,
                keywords=keywords,
                category=category,
                weights={"vector": 0.5, "keyword": 0.3, "category": 0.2}
            )
            
            print(f"{'Rank':>4s} {'ID':>8s} {'Score':>8s} {'Sources':>20s} {'Title':>30s}")
            print("-" * 75)
            
            for rank, (doc_id, info) in enumerate(results[:10], 1):
                sources = ", ".join(info["sources"])
                title = info["title"][:28] if info.get("title") else "N/A"
                print(f"{rank:4d} {doc_id:8d} {info['total_score']:8.3f} {sources:>20s} {title:>30s}")
            ---
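The recall paths above score on different scales (a distance-derived score, a keyword match fraction, a fixed 1.0), so a weighted sum can be dominated by whichever path happens to produce the largest numbers. One common remedy, sketched here as an assumption rather than as part of the original design, is to min-max normalize each path before weighting. The function names and sample scores are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Scores = Dict[Hashable, float]


def min_max_normalize(scores: Scores) -> Scores:
    """Rescale one path's scores to [0, 1]; a constant path maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_normalized(
    results_by_source: Dict[str, Scores], weights: Dict[str, float]
) -> List[Tuple[Hashable, float]]:
    """Weighted sum over normalized scores, sorted from best to worst."""
    fused: Scores = {}
    for source, scores in results_by_source.items():
        weight = weights.get(source, 0.0)
        for doc_id, score in min_max_normalize(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-path scores, for illustration only
ranking = fuse_normalized(
    {
        "vector": {1: 0.5, 2: 0.25, 3: 0.75},
        "keyword": {2: 1.0, 3: 0.5, 4: 0.0},
    },
    weights={"vector": 0.5, "keyword": 0.5},
)
print(ranking)
```

Note that min-max scaling sends the lowest-scoring item of each path to 0, which may or may not suit a given ranking; rank-based fusion such as RRF avoids the scale question altogether.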
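    b.Fusion strategies
        a.Function description
            The fusion strategy determines how the results of multiple recall paths are merged. Common strategies include weighted averaging, RRF (Reciprocal Rank Fusion), and CombSUM. The weights set how much each recall path counts and should be tuned to the business scenario; machine learning can be used to optimize the fusion parameters. A fusion strategy should take each result's rank position into account. The best method has to be determined by experiment.
        b.Code example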
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            
            # Fusion strategy class
            class FusionStrategy:
                @staticmethod
                def weighted_sum(results_dict, weights):
                    """Weighted-sum fusion."""
                    fused_scores = {}
                    
                    for source, results in results_dict.items():
                        weight = weights.get(source, 0)
                        
                        for doc_id, score in results.items():
                            if doc_id not in fused_scores:
                                fused_scores[doc_id] = 0
                            fused_scores[doc_id] += weight * score
                    
                    return fused_scores
                
                @staticmethod
                def rrf(results_dict, k=60):
                    """Reciprocal Rank Fusion"""
                    fused_scores = {}
                    
                    for source, results in results_dict.items():
                        # Sort by score to get each document's rank (0-based) in this source
                        ranked = sorted(results.items(), key=lambda x: x[1], reverse=True)
                        
                        for rank, (doc_id, score) in enumerate(ranked):
                            if doc_id not in fused_scores:
                                fused_scores[doc_id] = 0
                            fused_scores[doc_id] += 1 / (k + rank + 1)
                    
                    return fused_scores
                
                @staticmethod
                def comb_sum(results_dict):
                    """CombSUM: sum the scores from all sources."""
                    fused_scores = {}
                    
                    for source, results in results_dict.items():
                        for doc_id, score in results.items():
                            if doc_id not in fused_scores:
                                fused_scores[doc_id] = 0
                            fused_scores[doc_id] += score
                    
                    return fused_scores
                
                @staticmethod
                def comb_max(results_dict):
                    """CombMAX: take the highest score from any source."""
                    fused_scores = {}
                    
                    for source, results in results_dict.items():
                        for doc_id, score in results.items():
                            if doc_id not in fused_scores:
                                fused_scores[doc_id] = score
                            else:
                                fused_scores[doc_id] = max(fused_scores[doc_id], score)
                    
                    return fused_scores
                
                @staticmethod
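                def adaptive_fusion(results_dict, quality_scores):
                    """Adaptive fusion: derive the weights from per-source recall quality."""
                    # Normalize the quality scores so the weights sum to 1
                    total_quality = sum(quality_scores.values())
                    if total_quality <= 0:
                        raise ValueError("quality_scores must sum to a positive value")
                    weights = {
                        source: quality / total_quality
                        for source, quality in quality_scores.items()
                    }
                    
                    return FusionStrategy.weighted_sum(results_dict, weights)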
            
            # 测试不同融合策略
            print("融合策略对比:\n")
            
            # 模拟多路召回结果
            vector_results = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6, 5: 0.5}
            keyword_results = {2: 0.95, 3: 0.85, 6: 0.75, 7: 0.65}
            category_results = {1: 1.0, 4: 1.0, 8: 1.0}
            
            results_dict = {
                "vector": vector_results,
                "keyword": keyword_results,
                "category": category_results
            }
            
            # 加权求和
            weights = {"vector": 0.5, "keyword": 0.3, "category": 0.2}
            weighted_scores = FusionStrategy.weighted_sum(results_dict, weights)
            
            print("加权求和融合:")
            for doc_id, score in sorted(weighted_scores.items(), key=lambda x: x[1], reverse=True)[:5]:
                print(f"  文档{doc_id}: {score:.3f}")
            
            # RRF
            rrf_scores = FusionStrategy.rrf(results_dict, k=60)
            
            print("\nRRF融合:")
            for doc_id, score in sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)[:5]:
                # RRF scores are on the order of 1/k, so print four decimals to tell them apart
                print(f"  文档{doc_id}: {score:.4f}")
            
            # CombSUM
            combsum_scores = FusionStrategy.comb_sum(results_dict)
            
            print("\nCombSUM融合:")
            for doc_id, score in sorted(combsum_scores.items(), key=lambda x: x[1], reverse=True)[:5]:
                print(f"  文档{doc_id}: {score:.3f}")
            
            # CombMAX
            combmax_scores = FusionStrategy.comb_max(results_dict)
            
            print("\nCombMAX融合:")
            for doc_id, score in sorted(combmax_scores.items(), key=lambda x: x[1], reverse=True)[:5]:
                print(f"  文档{doc_id}: {score:.3f}")
            
            # 自适应融合
            quality_scores = {"vector": 0.8, "keyword": 0.6, "category": 0.9}
            adaptive_scores = FusionStrategy.adaptive_fusion(results_dict, quality_scores)
            
            print("\n自适应融合:")
            for doc_id, score in sorted(adaptive_scores.items(), key=lambda x: x[1], reverse=True)[:5]:
                print(f"  文档{doc_id}: {score:.3f}")
            
            print("\n融合策略选择建议:")
            print("  加权求和: 适合权重明确的场景")
            print("  RRF: 适合不同度量的结果融合")
            print("  CombSUM: 简单快速,适合相同度量")
            print("  CombMAX: 强调最佳匹配")
            print("  自适应: 根据召回质量动态调整")
            ---
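            CombSUM and weighted sum add raw scores, so they only behave well when every source reports scores on a comparable scale. A minimal sketch of min-max normalization applied before CombSUM follows. The names `min_max_normalize` and `normalized_comb_sum` are hypothetical helpers for illustration and are not part of the `FusionStrategy` class above; the sample scores are made up.

```python
def min_max_normalize(results):
    """Rescale one source's {doc_id: score} map to the range [0, 1]."""
    if not results:
        return {}
    lo, hi = min(results.values()), max(results.values())
    if hi == lo:
        # All scores are equal (e.g. a category filter that returns 1.0 for
        # every hit): treat every document as a full match.
        return {doc_id: 1.0 for doc_id in results}
    return {doc_id: (score - lo) / (hi - lo) for doc_id, score in results.items()}


def normalized_comb_sum(results_dict):
    """CombSUM over the min-max normalized scores of each source."""
    fused = {}
    for results in results_dict.values():
        for doc_id, score in min_max_normalize(results).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + score
    return fused


# Keyword scores and vector scores on deliberately different scales
results_dict = {
    "vector": {1: 0.9, 2: 0.8, 3: 0.7},
    "keyword": {2: 12.0, 3: 6.0, 4: 3.0},
}
fused = normalized_comb_sum(results_dict)
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)  # -> [2, 1, 3, 4]
```

            Without normalization the keyword source would dominate the sum simply because its scores are numerically larger. Rank-based methods such as RRF sidestep the problem by ignoring score magnitudes altogether.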

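02.Applied practice
    a.Semantic + keyword search
        a.Description
            Hybrid semantic + keyword search combines the semantic understanding of vector search with the exactness of keyword matching: vector search captures semantic similarity, while keyword search guarantees literal matches. It suits search engines, document retrieval, and similar scenarios, and can improve search accuracy and user satisfaction. The relative weight of the two signals needs tuning. Keyword matching can act either as a hard constraint (a filter on the candidates) or as a soft bonus added to the vector score.
        b.Code example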
            ---
            from pymilvus import Collection
            import numpy as np
            
            collection = Collection("documents")
            collection.load()
            
            # 语义+关键词搜索类
            class SemanticKeywordSearch:
                def __init__(self, collection):
                    self.collection = collection
                
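                def search(self, query_text, query_vector, keywords=None, mode="soft"):
                    """
                    Hybrid search.
                    query_text is currently unused; only query_vector drives the search.
                    mode: "soft" (keyword matches add to the score) or
                          "hard" (results must contain at least one keyword)
                    """
                    search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": 16}
                    }
                    
                    if mode == "hard" and keywords:
                        # Hard constraint: keep only titles that contain at least one keyword.
                        # Note: infix patterns such as "%kw%" may not be supported by older
                        # Milvus releases, and keywords containing quotes would break the expression.
                        keyword_expr = " or ".join([f'title like "%{kw}%"' for kw in keywords])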
                        
                        results = self.collection.search(
                            data=[query_vector],
                            anns_field="embedding",
                            param=search_params,
                            limit=20,
                            expr=keyword_expr,
                            output_fields=["id", "title"]
                        )
                        
                        return [(hit.id, hit.entity.get("title"), hit.distance) for hit in results[0]]
                    
                    else:
                        # 软约束:关键词加分
                        # 先进行向量搜索
                        results = self.collection.search(
                            data=[query_vector],
                            anns_field="embedding",
                            param=search_params,
                            limit=50,
                            output_fields=["id", "title"]
                        )
                        
                        # 计算综合分数
                        scored_results = []
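                        for hit in results[0]:
                            # Fall back to "" when the title field is missing or None
                            title = hit.entity.get("title") or ""
                            
                            # Vector score: map the L2 distance (>= 0) to a similarity in (0, 1]
                            vector_score = 1 / (1 + hit.distance)
                            
                            # Keyword score: fraction of the keywords found in the title
                            keyword_score = 0
                            if keywords:
                                match_count = sum(1 for kw in keywords if kw in title)
                                keyword_score = match_count / len(keywords)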
                            
                            # 综合分数(可调整权重)
                            total_score = 0.7 * vector_score + 0.3 * keyword_score
                            
                            scored_results.append((hit.id, title, total_score, vector_score, keyword_score))
                        
                        # 按综合分数排序
                        scored_results.sort(key=lambda x: x[2], reverse=True)
                        
                        return scored_results[:20]
            
            # 使用语义+关键词搜索
            sk_search = SemanticKeywordSearch(collection)
            
            query_text = "人工智能机器学习"
            query_vector = [np.random.random() for _ in range(128)]
            keywords = ["AI", "机器学习"]
            
            print("软约束模式(关键词加分):\n")
            results_soft = sk_search.search(query_text, query_vector, keywords, mode="soft")
            
            print(f"{'排名':>4s} {'ID':>8s} {'总分':>8s} {'向量分':>10s} {'关键词分':>10s} {'标题':>30s}")
            print("-" * 75)
            
            for rank, (doc_id, title, total, vector, keyword) in enumerate(results_soft[:10], 1):
                title_short = title[:28] if title else "N/A"
                print(f"{rank:4d} {doc_id:8d} {total:8.3f} {vector:10.3f} {keyword:10.3f} {title_short:>30s}")
            
            print("\n硬约束模式(必须包含关键词):\n")
            results_hard = sk_search.search(query_text, query_vector, keywords, mode="hard")
            
            print(f"{'排名':>4s} {'ID':>8s} {'距离':>10s} {'标题':>40s}")
            print("-" * 65)
            
            for rank, (doc_id, title, distance) in enumerate(results_hard[:10], 1):
                title_short = title[:38] if title else "N/A"
                print(f"{rank:4d} {doc_id:8d} {distance:10.4f} {title_short:>40s}")
            ---
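            The soft-constraint scoring can be pulled out of the search class and checked without a running Milvus instance. The sketch below is one possible shape, assuming L2 distances and case-insensitive substring matching; `keyword_score`, `hybrid_score`, and the `vector_weight` parameter are hypothetical names introduced here.

```python
def keyword_score(title, keywords):
    """Fraction of the keywords found in the title, ignoring case."""
    if not keywords:
        return 0.0
    folded_title = (title or "").casefold()
    matched = sum(1 for kw in keywords if kw.casefold() in folded_title)
    return matched / len(keywords)


def hybrid_score(l2_distance, title, keywords, vector_weight=0.7):
    """Blend an L2 distance and a keyword match ratio into one score in [0, 1]."""
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0 and 1")
    # Distance 0 maps to 1.0; larger distances approach 0
    vector_score = 1.0 / (1.0 + l2_distance)
    return vector_weight * vector_score + (1.0 - vector_weight) * keyword_score(title, keywords)


# Both keywords match (case-insensitively), so the keyword score is 1.0
print(keyword_score("Intro to ai and 机器学习", ["AI", "机器学习"]))

# Perfect vector match but no keyword match: only the vector part contributes
print(hybrid_score(0.0, "Deep learning basics", ["AI"]))
```

            Substring matching is deliberately simple and will also match short keywords inside longer words; a tokenizer-based match would be stricter.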
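    b.Multimodal search
        a.Description
            Multimodal search combines several modalities such as text, images, and audio. Each modality is encoded by its own embedding model, which enables cross-modal retrieval, for example searching images with a text query. It suits multimedia scenarios such as e-commerce and video platforms and offers a richer search experience. Each modality needs its own vector field, and the fusion strategy has to account for the relative weight of each modality.
        b.Code example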
            ---
            from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
            import numpy as np
            
            # 创建多模态Collection
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
                FieldSchema(name="text_embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
                FieldSchema(name="image_embedding", dtype=DataType.FLOAT_VECTOR, dim=512)
            ]
            
            schema = CollectionSchema(fields=fields, description="多模态搜索")
            multimodal_collection = Collection("multimodal_search", schema=schema)
            
            # 插入多模态数据
            data = [
                list(range(100)),  # ids
                [f"商品{i}" for i in range(100)],  # titles
                [[np.random.random() for _ in range(768)] for _ in range(100)],  # text embeddings
                [[np.random.random() for _ in range(512)] for _ in range(100)]   # image embeddings
            ]
            
            multimodal_collection.insert(data)
            multimodal_collection.flush()
            
            # 创建索引
            text_index = {
                "index_type": "IVF_FLAT",
                "metric_type": "COSINE",
                "params": {"nlist": 128}
            }
            
            image_index = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 128}
            }
            
            multimodal_collection.create_index("text_embedding", text_index)
            multimodal_collection.create_index("image_embedding", image_index)
            
            multimodal_collection.load()
            
            # 多模态搜索类
            class MultimodalSearch:
                def __init__(self, collection):
                    self.collection = collection
                
                def text_search(self, text_vector, limit=50):
                    """文本模态搜索"""
                    results = self.collection.search(
                        data=[text_vector],
                        anns_field="text_embedding",
                        param={"metric_type": "COSINE", "params": {"nprobe": 16}},
                        limit=limit,
                        output_fields=["id", "title"]
                    )
                    
                    return {hit.id: hit.distance for hit in results[0]}
                
                def image_search(self, image_vector, limit=50):
                    """图像模态搜索"""
                    results = self.collection.search(
                        data=[image_vector],
                        anns_field="image_embedding",
                        param={"metric_type": "L2", "params": {"nprobe": 16}},
                        limit=limit,
                        output_fields=["id", "title"]
                    )
                    
                    return {hit.id: hit.distance for hit in results[0]}
                
                def multimodal_search(self, text_vector=None, image_vector=None, weights=None):
                    """多模态融合搜索"""
                    if weights is None:
                        weights = {"text": 0.5, "image": 0.5}
                    
                    results_dict = {}
                    
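                    # Text search
                    if text_vector is not None:
                        text_results = self.text_search(text_vector)
                        
                        for doc_id, similarity in text_results.items():
                            # With COSINE, Milvus returns a similarity in [-1, 1]
                            # (larger is better), so map it linearly to [0, 1]
                            score = (1 + similarity) / 2
                            results_dict[doc_id] = {"text": score}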
                    
                    # 图像搜索
                    if image_vector is not None:
                        image_results = self.image_search(image_vector)
                        
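                        # Normalize L2 distances (smaller is better) to [0, 1];
                        # fall back to 1.0 when there are no hits or every distance is 0
                        max_dist = max(image_results.values(), default=0.0) or 1.0
                        
                        for doc_id, distance in image_results.items():
                            score = 1 - (distance / max_dist)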
                            
                            if doc_id in results_dict:
                                results_dict[doc_id]["image"] = score
                            else:
                                results_dict[doc_id] = {"image": score}
                    
                    # 计算加权总分
                    final_scores = {}
                    for doc_id, scores in results_dict.items():
                        total = 0
                        for modality, weight in weights.items():
                            if modality in scores:
                                total += weight * scores[modality]
                        
                        final_scores[doc_id] = total
                    
                    # 排序
                    sorted_results = sorted(final_scores.items(), key=lambda x: x[1], reverse=True)
                    
                    return sorted_results[:20]
            
            # 使用多模态搜索
            mm_search = MultimodalSearch(multimodal_collection)
            
            text_query = [np.random.random() for _ in range(768)]
            image_query = [np.random.random() for _ in range(512)]
            
            print("多模态搜索结果:\n")
            
            # 纯文本搜索
            print("纯文本搜索:")
            text_only = mm_search.multimodal_search(text_vector=text_query, weights={"text": 1.0})
            for rank, (doc_id, score) in enumerate(text_only[:5], 1):
                print(f"  {rank}. 文档{doc_id}: {score:.3f}")
            
            # 纯图像搜索
            print("\n纯图像搜索:")
            image_only = mm_search.multimodal_search(image_vector=image_query, weights={"image": 1.0})
            for rank, (doc_id, score) in enumerate(image_only[:5], 1):
                print(f"  {rank}. 文档{doc_id}: {score:.3f}")
            
            # 多模态融合
            print("\n多模态融合搜索 (文本:图像 = 0.6:0.4):")
            multimodal_results = mm_search.multimodal_search(
                text_vector=text_query,
                image_vector=image_query,
                weights={"text": 0.6, "image": 0.4}
            )
            for rank, (doc_id, score) in enumerate(multimodal_results[:5], 1):
                print(f"  {rank}. 文档{doc_id}: {score:.3f}")
            
            print("\n多模态搜索应用:")
            print("  1. 电商: 图文结合商品搜索")
            print("  2. 视频: 文本搜索视频内容")
            print("  3. 社交: 跨模态内容推荐")
            print("  4. 教育: 多媒体资源检索")
            ---
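            When modalities use different metrics, the raw values cannot be added directly: for COSINE the returned value is a similarity (larger is better), while for L2 it is a distance (smaller is better). The sketch below shows one way to bring both onto a common [0, 1] scale before weighting. `to_unit_score` is a hypothetical helper, and the two mappings are one reasonable choice rather than the only one.

```python
def to_unit_score(value, metric_type):
    """Map a raw search value to a score in [0, 1] where larger is better."""
    metric = metric_type.upper()
    if metric == "COSINE":
        # Cosine similarity lies in [-1, 1]; shift and scale it
        return (1.0 + value) / 2.0
    if metric == "L2":
        # Distances are >= 0 and smaller is better
        return 1.0 / (1.0 + value)
    raise ValueError(f"unsupported metric_type: {metric_type}")


# A text hit with cosine similarity 0.8 and an image hit with L2 distance 3.0
text_score = to_unit_score(0.8, "COSINE")
image_score = to_unit_score(3.0, "L2")

# Weighted fusion of the two normalized scores
weights = {"text": 0.6, "image": 0.4}
fused = weights["text"] * text_score + weights["image"] * image_score
print(f"text={text_score:.2f} image={image_score:.2f} fused={fused:.2f}")
```

            Applying a distance-style conversion to a cosine similarity would invert the ranking, which is why the conversion is keyed on the metric type.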

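8 Performance optimization

8.1 Index selection strategy

01.Index type comparison
    a.FLAT index
        a.Description
            FLAT is a brute-force index: it applies no compression or approximation, so it returns exact results with 100% recall. It suits small datasets (as a rule of thumb, under about 1 million vectors), because query time grows linearly with the number of vectors. It needs no training, so inserts are fast, and its memory footprint is roughly the size of the raw vectors. Choose it when accuracy requirements are very strict; on large datasets its query performance degrades.
        b.Code example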
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # FLAT索引配置
            flat_index = {
                "index_type": "FLAT",
                "metric_type": "L2",
                "params": {}  # FLAT索引无需参数
            }
            
            print("创建FLAT索引...")
            start = time.time()
            collection.create_index(field_name="embedding", index_params=flat_index)
            build_time = time.time() - start
            
            print(f"索引构建时间: {build_time:.2f}s")
            
            collection.load()
            
            # 测试查询性能
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {}
            }
            
            start = time.time()
            for _ in range(100):
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
            query_time = time.time() - start
            
            print(f"\n100次查询总时间: {query_time:.2f}s")
            print(f"平均查询延迟: {query_time/100*1000:.2f}ms")
            print(f"QPS: {100/query_time:.2f}")
            
            print("\nFLAT索引特点:")
            print("  优点: 100%召回率,最精确")
            print("  缺点: 查询速度慢,不适合大规模数据")
            print("  适用: <100万向量,高精度要求")
            ---
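            To make the linear scan behind a FLAT index concrete, here is a minimal pure-Python sketch of exact nearest-neighbour search with L2 distance. `flat_search` is a hypothetical name; it illustrates the idea only and is not how Milvus implements FLAT internally.

```python
import heapq
import math


def flat_search(vectors, query, top_k=3):
    """Exact nearest neighbours: compare the query against every stored vector."""
    # One distance computation per stored vector, hence O(n) work per query
    distances = ((math.dist(vec, query), idx) for idx, vec in enumerate(vectors))
    return [(idx, dist) for dist, idx in heapq.nsmallest(top_k, distances)]


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
hits = flat_search(vectors, query=[0.9, 0.1], top_k=2)
for idx, dist in hits:
    print(f"vector {idx}: distance {dist:.4f}")
```

            Because every vector is visited, the result is always the true top-k, and doubling the dataset roughly doubles the query time.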
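    b.IVF-family indexes
        a.Description
            IVF-family indexes use an inverted-file structure that partitions the vector space into clusters. Variants include IVF_FLAT, IVF_SQ8, and IVF_PQ. The nlist parameter sets the number of clusters and nprobe sets how many clusters are searched per query, which trades query speed against recall. They suit medium to large datasets (roughly 1 million to 10 million vectors). They require a training phase, so index builds take longer. Memory use can be reduced through quantization (IVF_SQ8, IVF_PQ).
        b.Code example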
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # IVF_FLAT索引配置
            ivf_flat_index = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}  # 聚类数量
            }
            
            print("创建IVF_FLAT索引...")
            start = time.time()
            collection.create_index(field_name="embedding", index_params=ivf_flat_index)
            build_time = time.time() - start
            
            print(f"索引构建时间: {build_time:.2f}s")
            
            collection.load()
            
            # 测试不同nprobe值的性能
            query_vector = [[np.random.random() for _ in range(128)]]
            
            nprobe_values = [1, 8, 16, 32, 64]
            
            print(f"\n{'nprobe':>8s} {'查询时间':>10s} {'QPS':>10s}")
            print("-" * 32)
            
            for nprobe in nprobe_values:
                search_params = {
                    "metric_type": "L2",
                    "params": {"nprobe": nprobe}
                }
                
                start = time.time()
                for _ in range(100):
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                query_time = time.time() - start
                
                qps = 100 / query_time
                print(f"{nprobe:8d} {query_time:9.2f}s {qps:9.2f}")
            
            print("\nIVF索引特点:")
            print("  优点: 速度快,内存可控")
            print("  缺点: 需要训练,召回率<100%")
            print("  适用: 100万-1000万向量")
            print("  调优: nlist=4*sqrt(n), nprobe=nlist的1-10%")
            ---
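            The tuning rule printed at the end of the example (nlist = 4*sqrt(n), nprobe at 1-10% of nlist) can be turned into a small helper for picking starting values. This is a sketch under that rule of thumb only; `suggest_ivf_params` is a hypothetical name, and the suggested values still need to be validated against measured recall and latency.

```python
import math


def suggest_ivf_params(num_vectors):
    """Starting values: nlist = 4 * sqrt(n), nprobe between 1% and 10% of nlist."""
    if num_vectors <= 0:
        raise ValueError("num_vectors must be positive")
    nlist = max(1, round(4 * math.sqrt(num_vectors)))
    # Integer division keeps the bounds exact; nprobe must be at least 1
    nprobe_low = max(1, nlist // 100)
    nprobe_high = max(nprobe_low, nlist // 10)
    return {"nlist": nlist, "nprobe_range": (nprobe_low, nprobe_high)}


for n in (100_000, 1_000_000, 10_000_000):
    print(f"{n:>10,} vectors -> {suggest_ivf_params(n)}")
```

            For 1,000,000 vectors this gives nlist = 4000 and an nprobe range of 40 to 400, which is a reasonable sweep range for a benchmark such as the nprobe loop above.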

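02.Index selection decisions
    a.Assessing data scale
        a.Description
            Choose the index type based on data scale. As a starting point, use FLAT for small datasets (under 100,000 vectors), the IVF family for medium datasets (100,000 to 10 million), and HNSW or DiskANN for large datasets (over 10 million). Account for expected data growth and leave performance headroom. Assess the available memory and pick a suitable compression method. Consider the required query QPS and balance speed against accuracy. Re-evaluate periodically and adjust the index as the workload changes.
        b.Code example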
            ---
            from pymilvus import Collection
            import numpy as np
            
            # 索引选择决策类
            class IndexSelector:
                def __init__(self, collection):
                    self.collection = collection
                
                def recommend_index(self, vector_count, qps_requirement, recall_requirement, memory_limit_gb):
                    """
                    推荐索引类型
                    
                    参数:
                        vector_count: 向量数量
                        qps_requirement: QPS需求
                        recall_requirement: 召回率要求 (0-1)
                        memory_limit_gb: 内存限制(GB)
                    """
                    recommendations = []
                    
                    # 计算向量维度和内存需求
                    # 假设128维float32向量,每个向量512字节
                    vector_size_bytes = 128 * 4
                    total_memory_gb = vector_count * vector_size_bytes / (1024**3)
                    
                    print(f"\n数据规模评估:")
                    print(f"  向量数量: {vector_count:,}")
                    print(f"  原始数据大小: {total_memory_gb:.2f} GB")
                    print(f"  QPS需求: {qps_requirement}")
                    print(f"  召回率要求: {recall_requirement*100:.0f}%")
                    print(f"  内存限制: {memory_limit_gb} GB")
                    
                    # 小规模数据
                    if vector_count < 100000:
                        if recall_requirement >= 0.99:
                            recommendations.append({
                                "index_type": "FLAT",
                                "reason": "小规模数据,高召回率要求",
                                "params": {},
                                "expected_recall": 1.0,
                                "expected_qps": "100-500",
                                "memory_gb": total_memory_gb
                            })
                        else:
                            recommendations.append({
                                "index_type": "IVF_FLAT",
                                "reason": "小规模数据,可接受近似搜索",
                                "params": {"nlist": 128},
                                "expected_recall": 0.95,
                                "expected_qps": "500-2000",
                                "memory_gb": total_memory_gb * 1.1
                            })
                    
                    # 中规模数据
                    elif vector_count < 10000000:
                        nlist = int(4 * np.sqrt(vector_count))
                        
                        # IVF_FLAT needs about 1.1x the raw data size; otherwise fall back
                        # to IVF_SQ8 so that every memory budget gets a recommendation
                        if memory_limit_gb >= total_memory_gb * 1.1:
                            recommendations.append({
                                "index_type": "IVF_FLAT",
                                "reason": "中规模数据,内存充足",
                                "params": {"nlist": nlist},
                                "expected_recall": 0.95,
                                "expected_qps": "1000-5000",
                                "memory_gb": total_memory_gb * 1.1
                            })
                        else:
                            recommendations.append({
                                "index_type": "IVF_SQ8",
                                "reason": "中规模数据,内存受限",
                                "params": {"nlist": nlist},
                                "expected_recall": 0.90,
                                "expected_qps": "2000-8000",
                                "memory_gb": total_memory_gb * 0.3
                            })
                        
                        if qps_requirement > 5000:
                            recommendations.append({
                                "index_type": "HNSW",
                                "reason": "高QPS需求",
                                "params": {"M": 16, "efConstruction": 200},
                                "expected_recall": 0.95,
                                "expected_qps": "5000-20000",
                                "memory_gb": total_memory_gb * 1.3
                            })
                    
                    # Large-scale data
                    else:
                        # HNSW needs about 1.3x the raw data size; recommend it only
                        # when it fits, otherwise fall back to the compressed IVF_PQ
                        if memory_limit_gb >= total_memory_gb * 1.3:
                            recommendations.append({
                                "index_type": "HNSW",
                                "reason": "大规模数据,高性能需求",
                                "params": {"M": 16, "efConstruction": 200},
                                "expected_recall": 0.95,
                                "expected_qps": "5000-20000",
                                "memory_gb": total_memory_gb * 1.3
                            })
                        else:
                            recommendations.append({
                                "index_type": "IVF_PQ",
                                "reason": "大规模数据,内存受限",
                                "params": {"nlist": 4096, "m": 16},
                                "expected_recall": 0.85,
                                "expected_qps": "3000-10000",
                                "memory_gb": total_memory_gb * 0.1
                            })
                    
                    return recommendations
                
                def print_recommendations(self, recommendations):
                    """打印推荐结果"""
                    print(f"\n索引推荐 (共{len(recommendations)}个选项):\n")
                    
                    for i, rec in enumerate(recommendations, 1):
                        print(f"{i}. {rec['index_type']}")
                        print(f"   原因: {rec['reason']}")
                        print(f"   参数: {rec['params']}")
                        print(f"   预期召回率: {rec['expected_recall']*100:.0f}%")
                        print(f"   预期QPS: {rec['expected_qps']}")
                        print(f"   内存需求: {rec['memory_gb']:.2f} GB")
                        print()
            
            # 使用索引选择器
            collection = Collection("documents")
            selector = IndexSelector(collection)
            
            # 场景1: 小规模高精度
            print("="*60)
            print("场景1: 小规模高精度")
            print("="*60)
            recs = selector.recommend_index(
                vector_count=50000,
                qps_requirement=200,
                recall_requirement=0.99,
                memory_limit_gb=10
            )
            selector.print_recommendations(recs)
            
            # 场景2: 中规模平衡
            print("="*60)
            print("场景2: 中规模平衡")
            print("="*60)
            recs = selector.recommend_index(
                vector_count=5000000,
                qps_requirement=3000,
                recall_requirement=0.95,
                memory_limit_gb=50
            )
            selector.print_recommendations(recs)
            
            # 场景3: 大规模内存受限
            print("="*60)
            print("场景3: 大规模内存受限")
            print("="*60)
            recs = selector.recommend_index(
                vector_count=50000000,
                qps_requirement=5000,
                recall_requirement=0.90,
                memory_limit_gb=20
            )
            selector.print_recommendations(recs)
            ---
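            The selector above hardcodes 128-dimensional float32 vectors when it estimates memory. A minimal sketch of the same arithmetic with the dimension and element size as parameters follows; `raw_vector_memory_gb` is a hypothetical helper, and it counts the raw vectors only, with no index overhead.

```python
def raw_vector_memory_gb(vector_count, dim, bytes_per_value=4):
    """Size of the raw vectors alone in GiB (float32 by default)."""
    if vector_count < 0 or dim <= 0 or bytes_per_value <= 0:
        raise ValueError("vector_count must be >= 0; dim and bytes_per_value must be > 0")
    return vector_count * dim * bytes_per_value / 1024**3


# 50 million vectors at common embedding dimensions
for dim in (128, 768, 1536):
    float32_gb = raw_vector_memory_gb(50_000_000, dim)
    # One byte per dimension approximates 8-bit scalar quantization
    int8_gb = raw_vector_memory_gb(50_000_000, dim, bytes_per_value=1)
    print(f"dim={dim:5d}  float32={float32_gb:7.2f} GB  8-bit={int8_gb:7.2f} GB")
```

            At 128 dimensions this reproduces the roughly 23.84 GB that scenario 3 reports as its raw data size; at 1536 dimensions the same vector count needs twelve times as much.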
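    b.Benchmark comparison
        a.Description
            Benchmark the candidate indexes to compare how they actually perform. Metrics include build time, query latency, QPS, recall, and memory usage. Run the tests with real data and realistic query patterns, and compare the effect of different parameter settings. Use the results to guide index selection and parameter tuning. Run performance regression tests regularly, and establish a baseline so that changes in performance can be monitored.
        b.Code example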
            ---
            from pymilvus import Collection, utility
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 性能测试类
            class IndexBenchmark:
                def __init__(self, collection):
                    self.collection = collection
                    self.results = []
                
                def benchmark_index(self, index_config, search_params, num_queries=100):
                    """测试单个索引配置"""
                    index_type = index_config["index_type"]
                    
                    print(f"\n测试索引: {index_type}")
                    print(f"  参数: {index_config.get('params', {})}")
                    
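                    # Drop any existing index. Each step is best-effort: the collection
                    # may not be loaded yet, or may have no index to drop.
                    for cleanup in (self.collection.release, self.collection.drop_index):
                        try:
                            cleanup()
                        except Exception:
                            pass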
                    
                    # 构建索引
                    print("  构建索引...")
                    start = time.time()
                    self.collection.create_index(field_name="embedding", index_params=index_config)
                    build_time = time.time() - start
                    
                    # 加载Collection
                    self.collection.load()
                    
                    # 查询测试
                    query_vectors = [[np.random.random() for _ in range(128)] for _ in range(num_queries)]
                    
                    latencies = []
                    for query_vector in query_vectors:
                        start = time.time()
                        results = self.collection.search(
                            data=[query_vector],
                            anns_field="embedding",
                            param=search_params,
                            limit=10
                        )
                        latency = time.time() - start
                        latencies.append(latency)
                    
                    # 统计结果
                    avg_latency = np.mean(latencies) * 1000  # ms
                    p95_latency = np.percentile(latencies, 95) * 1000
                    p99_latency = np.percentile(latencies, 99) * 1000
                    qps = 1 / np.mean(latencies)
                    
                    # 内存占用(简化)
                    memory_usage = "N/A"
                    
                    result = {
                        "index_type": index_type,
                        "params": index_config.get("params", {}),
                        "build_time": build_time,
                        "avg_latency": avg_latency,
                        "p95_latency": p95_latency,
                        "p99_latency": p99_latency,
                        "qps": qps,
                        "memory": memory_usage
                    }
                    
                    self.results.append(result)
                    
                    print(f"  构建时间: {build_time:.2f}s")
                    print(f"  平均延迟: {avg_latency:.2f}ms")
                    print(f"  P95延迟: {p95_latency:.2f}ms")
                    print(f"  P99延迟: {p99_latency:.2f}ms")
                    print(f"  QPS: {qps:.2f}")
                    
                    return result
                
                def compare_indexes(self, index_configs, search_params_list, num_queries=100):
                    """对比多个索引配置"""
                    print("="*80)
                    print("索引性能对比测试")
                    print("="*80)
                    
                    for index_config, search_params in zip(index_configs, search_params_list):
                        self.benchmark_index(index_config, search_params, num_queries)
                    
                    # 打印对比表格
                    print(f"\n{'索引类型':>15s} {'构建时间':>10s} {'平均延迟':>10s} {'P95延迟':>10s} {'QPS':>10s}")
                    print("-" * 60)
                    
                    for result in self.results:
                        print(f"{result['index_type']:>15s} {result['build_time']:9.2f}s {result['avg_latency']:9.2f}ms {result['p95_latency']:9.2f}ms {result['qps']:9.2f}")
                    
                    # 推荐最佳配置
                    best_qps = max(self.results, key=lambda x: x["qps"])
                    best_latency = min(self.results, key=lambda x: x["avg_latency"])
                    
                    print(f"\n推荐:")
                    print(f"  最高QPS: {best_qps['index_type']} ({best_qps['qps']:.2f})")
                    print(f"  最低延迟: {best_latency['index_type']} ({best_latency['avg_latency']:.2f}ms)")
            
            # 使用性能测试
            benchmark = IndexBenchmark(collection)
            
            # 定义测试配置
            index_configs = [
                {
                    "index_type": "FLAT",
                    "metric_type": "L2",
                    "params": {}
                },
                {
                    "index_type": "IVF_FLAT",
                    "metric_type": "L2",
                    "params": {"nlist": 128}
                },
                {
                    "index_type": "IVF_FLAT",
                    "metric_type": "L2",
                    "params": {"nlist": 512}
                },
                {
                    "index_type": "HNSW",
                    "metric_type": "L2",
                    "params": {"M": 16, "efConstruction": 200}
                }
            ]
            
            search_params_list = [
                {"metric_type": "L2", "params": {}},
                {"metric_type": "L2", "params": {"nprobe": 16}},
                {"metric_type": "L2", "params": {"nprobe": 64}},
                {"metric_type": "L2", "params": {"ef": 64}}
            ]
            
            # 执行对比测试
            benchmark.compare_indexes(index_configs, search_params_list, num_queries=50)
            ---
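            The benchmark above measures build time and latency but not recall, which is listed among the metrics to compare. A minimal sketch of recall@k follows, assuming the exact top-k IDs are available (for example from a FLAT index search) alongside the IDs returned by the index under test. `recall_at_k` and `mean_recall` are hypothetical helpers, and the ID lists are made-up sample data.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Share of the exact top-k that also appears in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    return len(truth & set(approx_ids[:k])) / len(truth)


def mean_recall(approx_batches, exact_batches, k):
    """Average recall@k over a batch of queries."""
    pairs = list(zip(approx_batches, exact_batches))
    if not pairs:
        raise ValueError("at least one query is required")
    return sum(recall_at_k(a, e, k) for a, e in pairs) / len(pairs)


# Two queries: the first misses one true neighbour, the second is perfect
exact = [[1, 2, 3, 4], [5, 6, 7, 8]]
approx = [[1, 2, 9, 4], [5, 6, 7, 8]]
print(mean_recall(approx, exact, k=4))  # (0.75 + 1.0) / 2 = 0.875
```

            Reporting recall next to QPS keeps the comparison fair: an index that looks fastest may simply be returning fewer of the true neighbours.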

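8.2 Query parameter tuning

01.Search parameter optimization
    a.Tuning nprobe
        a.Description
            nprobe sets how many clusters an IVF index searches per query, so it directly affects both recall and query speed: a larger nprobe raises recall but slows queries. A recommended starting range is 1-10% of nlist. Balance accuracy and performance for the workload at hand; A/B testing can help find the best value. nprobe is a per-query parameter, so different queries can use different values, for example a larger nprobe for high-priority queries.
        b.Code example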
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 创建IVF索引
            index_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
            
            collection.create_index(field_name="embedding", index_params=index_params)
            collection.load()
            
            # nprobe调优类
            class NprobeOptimizer:
                def __init__(self, collection):
                    self.collection = collection
                
                def test_nprobe_values(self, query_vector, nprobe_values, num_queries=100):
                    """测试不同nprobe值的性能"""
                    results = []
                    
                    print(f"\nnprobe参数调优测试 ({num_queries}次查询):\n")
                    print(f"{'nprobe':>8s} {'平均延迟':>12s} {'P95延迟':>12s} {'QPS':>10s} {'召回率估计':>12s}")
                    print("-" * 60)
                    
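                    # Baseline: results at the largest nprobe tested. The recall reported
                    # below is relative to this baseline, not to an exact (FLAT) search.
                    baseline_search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": max(nprobe_values)}
                    }
                    
                    baseline_results = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=baseline_search_params,
                        limit=10
                    )
                    baseline_ids = set(hit.id for hit in baseline_results[0])
                    if not baseline_ids:
                        raise ValueError("Baseline search returned no hits; is the collection empty?")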
                    
                    for nprobe in nprobe_values:
                        search_params = {
                            "metric_type": "L2",
                            "params": {"nprobe": nprobe}
                        }
                        
                        latencies = []
                        recall_sum = 0
                        
                        for _ in range(num_queries):
                            start = time.time()
                            search_results = self.collection.search(
                                data=[query_vector],
                                anns_field="embedding",
                                param=search_params,
                                limit=10
                            )
                            latency = time.time() - start
                            latencies.append(latency)
                            
                            # 计算召回率
                            result_ids = set(hit.id for hit in search_results[0])
                            recall = len(result_ids & baseline_ids) / len(baseline_ids)
                            recall_sum += recall
                        
                        avg_latency = np.mean(latencies) * 1000
                        p95_latency = np.percentile(latencies, 95) * 1000
                        qps = 1 / np.mean(latencies)
                        avg_recall = recall_sum / num_queries
                        
                        results.append({
                            "nprobe": nprobe,
                            "avg_latency": avg_latency,
                            "p95_latency": p95_latency,
                            "qps": qps,
                            "recall": avg_recall
                        })
                        
                        print(f"{nprobe:8d} {avg_latency:11.2f}ms {p95_latency:11.2f}ms {qps:9.2f} {avg_recall*100:11.1f}%")
                    
                    return results
                
                def recommend_nprobe(self, results, min_recall=0.95):
                    """推荐最优nprobe值"""
                    # 找到满足召回率要求的最小nprobe
                    valid_results = [r for r in results if r["recall"] >= min_recall]
                    
                    if not valid_results:
                        print(f"\n警告: 没有配置满足{min_recall*100:.0f}%召回率要求")
                        return None
                    
                    best = min(valid_results, key=lambda x: x["avg_latency"])
                    
                    print(f"\n推荐配置 (召回率≥{min_recall*100:.0f}%):")
                    print(f"  nprobe: {best['nprobe']}")
                    print(f"  平均延迟: {best['avg_latency']:.2f}ms")
                    print(f"  QPS: {best['qps']:.2f}")
                    print(f"  召回率: {best['recall']*100:.1f}%")
                    
                    return best
                
                def adaptive_nprobe(self, query_priority):
                    """根据查询优先级自适应选择nprobe"""
                    # 高优先级: nprobe更大,召回率更高
                    # 低优先级: nprobe更小,速度更快
                    
                    nprobe_map = {
                        "high": 64,      # 高优先级
                        "medium": 32,    # 中优先级
                        "low": 16        # 低优先级
                    }
                    
                    return nprobe_map.get(query_priority, 32)
            
            # 使用nprobe优化器
            optimizer = NprobeOptimizer(collection)
            
            query_vector = [np.random.random() for _ in range(128)]
            nprobe_values = [8, 16, 32, 64, 128, 256]
            
            # 测试不同nprobe值
            results = optimizer.test_nprobe_values(query_vector, nprobe_values, num_queries=50)
            
            # 推荐最优配置
            best_config = optimizer.recommend_nprobe(results, min_recall=0.95)
            
            # 自适应nprobe示例
            print("\n自适应nprobe策略:")
            for priority in ["high", "medium", "low"]:
                nprobe = optimizer.adaptive_nprobe(priority)
                print(f"  {priority}优先级: nprobe={nprobe}")
            ---
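The recall reported by the optimizer above is measured against the largest nprobe tested, not against an exact search. The following is a minimal NumPy sketch of the same trade-off on a toy IVF structure, where recall can be checked against brute force. It does not use Milvus; the helper names (`build_toy_ivf`, `ivf_search`, `exact_search`, `recall_by_nprobe`) and the use of random vectors as centroids (instead of trained k-means centroids) are assumptions made for illustration.

```python
import numpy as np

def build_toy_ivf(vectors, nlist, rng):
    """Pick nlist random vectors as centroids and assign every vector to its nearest one."""
    centroids = vectors[rng.choice(len(vectors), size=nlist, replace=False)]
    # Squared L2 distance from every vector to every centroid: shape (n, nlist)
    dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assignments = dists.argmin(axis=1)
    return centroids, assignments

def ivf_search(query, vectors, centroids, assignments, nprobe, k):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    centroid_dists = ((centroids - query) ** 2).sum(axis=1)
    probed = np.argsort(centroid_dists)[:nprobe]
    candidate_ids = np.flatnonzero(np.isin(assignments, probed))
    cand_dists = ((vectors[candidate_ids] - query) ** 2).sum(axis=1)
    return candidate_ids[np.argsort(cand_dists)[:k]]

def exact_search(query, vectors, k):
    """Brute-force top-k by squared L2 distance (the ground truth)."""
    dists = ((vectors - query) ** 2).sum(axis=1)
    return np.argsort(dists)[:k]

def recall_by_nprobe(nprobe_values, n=2000, dim=16, nlist=32, k=10, seed=0):
    """Return {nprobe: recall@k} for one random query against exact search."""
    rng = np.random.default_rng(seed)
    vectors = rng.random((n, dim))
    query = rng.random(dim)
    centroids, assignments = build_toy_ivf(vectors, nlist, rng)
    truth = set(exact_search(query, vectors, k).tolist())
    recalls = {}
    for nprobe in nprobe_values:
        found = set(ivf_search(query, vectors, centroids, assignments, nprobe, k).tolist())
        recalls[nprobe] = len(found & truth) / k
    return recalls

recalls = recall_by_nprobe([1, 2, 4, 8, 16, 32])
for nprobe, recall in recalls.items():
    print(f"nprobe={nprobe:3d} recall={recall:.2f}")
```

Because each larger nprobe scans a superset of the clusters scanned by a smaller one, recall never decreases as nprobe grows, and it reaches 1.0 when nprobe equals nlist, at which point the search is exhaustive.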
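    b.Tuning ef
        a.Description
            The ef parameter applies to HNSW indexes and sets the size of the candidate list kept during search. A larger ef raises recall but slows the query. In Milvus, ef must be at least limit (the number of results returned); 2-10x limit is a reasonable starting range. efConstruction is a build-time parameter, whereas ef is a search-time parameter, so ef can be changed per query without rebuilding the index. Choose ef according to the recall the application requires.
        b.Code example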
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 创建HNSW索引
            index_params = {
                "index_type": "HNSW",
                "metric_type": "L2",
                "params": {
                    "M": 16,
                    "efConstruction": 200
                }
            }
            
            collection.create_index(field_name="embedding", index_params=index_params)
            collection.load()
            
            # ef参数调优类
            class EfOptimizer:
                def __init__(self, collection):
                    self.collection = collection
                
                def test_ef_values(self, query_vector, ef_values, limit=10, num_queries=100):
                    """测试不同ef值的性能"""
                    results = []
                    
                    print(f"\nef参数调优测试 (limit={limit}, {num_queries}次查询):\n")
                    print(f"{'ef':>6s} {'平均延迟':>12s} {'P95延迟':>12s} {'QPS':>10s} {'召回率估计':>12s}")
                    print("-" * 58)
                    
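                    # Baseline: results at the largest ef tested. The recall reported
                    # below is relative to this baseline, not to an exact (FLAT) search.
                    baseline_search_params = {
                        "metric_type": "L2",
                        "params": {"ef": max(ef_values)}
                    }
                    
                    baseline_results = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=baseline_search_params,
                        limit=limit
                    )
                    baseline_ids = set(hit.id for hit in baseline_results[0])
                    if not baseline_ids:
                        raise ValueError("Baseline search returned no hits; is the collection empty?")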
                    
                    for ef in ef_values:
                        if ef < limit:
                            print(f"{ef:6d} 跳过 (ef必须≥limit={limit})")
                            continue
                        
                        search_params = {
                            "metric_type": "L2",
                            "params": {"ef": ef}
                        }
                        
                        latencies = []
                        recall_sum = 0
                        
                        for _ in range(num_queries):
                            start = time.time()
                            search_results = self.collection.search(
                                data=[query_vector],
                                anns_field="embedding",
                                param=search_params,
                                limit=limit
                            )
                            latency = time.time() - start
                            latencies.append(latency)
                            
                            result_ids = set(hit.id for hit in search_results[0])
                            recall = len(result_ids & baseline_ids) / len(baseline_ids)
                            recall_sum += recall
                        
                        avg_latency = np.mean(latencies) * 1000
                        p95_latency = np.percentile(latencies, 95) * 1000
                        qps = 1 / np.mean(latencies)
                        avg_recall = recall_sum / num_queries
                        
                        results.append({
                            "ef": ef,
                            "avg_latency": avg_latency,
                            "p95_latency": p95_latency,
                            "qps": qps,
                            "recall": avg_recall
                        })
                        
                        print(f"{ef:6d} {avg_latency:11.2f}ms {p95_latency:11.2f}ms {qps:9.2f} {avg_recall*100:11.1f}%")
                    
                    return results
                
                def recommend_ef(self, results, limit, min_recall=0.95):
                    """推荐最优ef值"""
                    valid_results = [r for r in results if r["recall"] >= min_recall]
                    
                    if not valid_results:
                        print(f"\n警告: 没有配置满足{min_recall*100:.0f}%召回率要求")
                        return None
                    
                    best = min(valid_results, key=lambda x: x["avg_latency"])
                    
                    print(f"\n推荐配置 (limit={limit}, 召回率≥{min_recall*100:.0f}%):")
                    print(f"  ef: {best['ef']} (约{best['ef']/limit:.1f}倍limit)")
                    print(f"  平均延迟: {best['avg_latency']:.2f}ms")
                    print(f"  QPS: {best['qps']:.2f}")
                    print(f"  召回率: {best['recall']*100:.1f}%")
                    
                    return best
            
            # 使用ef优化器
            ef_optimizer = EfOptimizer(collection)
            
            query_vector = [np.random.random() for _ in range(128)]
            
            # 测试不同limit下的ef值
            for limit in [10, 50, 100]:
                ef_values = [limit, limit*2, limit*4, limit*8, limit*10]
                
                print(f"\n{'='*60}")
                print(f"测试limit={limit}")
                print(f"{'='*60}")
                
                results = ef_optimizer.test_ef_values(query_vector, ef_values, limit=limit, num_queries=50)
                ef_optimizer.recommend_ef(results, limit=limit, min_recall=0.95)
            
            print("\nef tuning guidelines:")
            print("  1. ef >= limit (required)")
            print("  2. ef = 2-4x limit (balanced)")
            print("  3. ef = 8-10x limit (high recall)")
            print("  4. Adjust to the required recall")
            print("  5. ef can be changed per query without rebuilding the index")
            ---
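Since ef is a per-search parameter that must not fall below limit, it can help to derive it in one place. The helper below is a small sketch of that rule; the function name `hnsw_search_params`, the default multiplier of 4, and the optional `max_ef` cap are assumptions for illustration, not part of pymilvus.

```python
def hnsw_search_params(limit, ef_multiplier=4, metric_type="L2", max_ef=None):
    """Build an HNSW search-parameter dict with ef derived from limit.

    ef is limit * ef_multiplier, never below limit, and optionally
    capped at max_ef (the cap itself is also kept >= limit).
    """
    if limit < 1:
        raise ValueError("limit must be >= 1")
    ef = max(limit, int(limit * ef_multiplier))
    if max_ef is not None:
        ef = max(limit, min(ef, max_ef))
    return {"metric_type": metric_type, "params": {"ef": ef}}

# Balanced setting for a top-10 query, and a high-recall setting for a top-50 query
balanced = hnsw_search_params(limit=10)
high_recall = hnsw_search_params(limit=50, ef_multiplier=8)
print(balanced)
print(high_recall)
```

The returned dict has the same shape as the `search_params` passed to `collection.search` in the example above, so it could be supplied as the `param` argument.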

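02.Batch query optimization
    a.Choosing the batch size
        a.Description
            Batching several query vectors into one search call raises throughput and reduces network round trips. Batch size trades latency against throughput: an oversized batch lengthens each call, while an undersized batch leaves server-side parallelism unused. A batch of 10-100 vectors is a reasonable starting point, to be adjusted for the hardware and the workload: larger batches for throughput-oriented workloads, smaller ones for latency-sensitive workloads.
        b.Code example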
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            collection.load()
            
            # 批量查询优化类
            class BatchQueryOptimizer:
                def __init__(self, collection):
                    self.collection = collection
                
                def test_batch_sizes(self, batch_sizes, total_queries=1000):
                    """测试不同批量大小的性能"""
                    search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": 16}
                    }
                    
                    print(f"\n批量大小优化测试 (总查询数={total_queries}):\n")
                    print(f"{'批量大小':>10s} {'总时间':>10s} {'吞吐量':>12s} {'平均延迟':>12s} {'P95延迟':>12s}")
                    print("-" * 62)
                    
                    results = []
                    
                    for batch_size in batch_sizes:
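                        num_batches = total_queries // batch_size
                        if num_batches == 0:
                            # Avoid dividing by a zero total_time further down
                            print(f"{batch_size:10d} skipped (batch size exceeds total_queries={total_queries})")
                            continue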
                        
                        total_time = 0
                        latencies = []
                        
                        for _ in range(num_batches):
                            # 生成批量查询向量
                            query_vectors = [[np.random.random() for _ in range(128)] for _ in range(batch_size)]
                            
                            start = time.time()
                            results_batch = self.collection.search(
                                data=query_vectors,
                                anns_field="embedding",
                                param=search_params,
                                limit=10
                            )
                            elapsed = time.time() - start
                            
                            total_time += elapsed
                            latencies.append(elapsed)
                        
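                        # Count only the queries actually issued: when total_queries is not a
                        # multiple of batch_size, the remainder is never sent.
                        executed_queries = num_batches * batch_size
                        throughput = executed_queries / total_time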
                        avg_latency = np.mean(latencies) * 1000
                        p95_latency = np.percentile(latencies, 95) * 1000
                        
                        results.append({
                            "batch_size": batch_size,
                            "total_time": total_time,
                            "throughput": throughput,
                            "avg_latency": avg_latency,
                            "p95_latency": p95_latency
                        })
                        
                        print(f"{batch_size:10d} {total_time:9.2f}s {throughput:11.2f}qps {avg_latency:11.2f}ms {p95_latency:11.2f}ms")
                    
                    return results
                
                def recommend_batch_size(self, results, max_latency_ms=None):
                    """推荐最优批量大小"""
                    if max_latency_ms:
                        # 满足延迟要求的最大吞吐量
                        valid_results = [r for r in results if r["avg_latency"] <= max_latency_ms]
                        
                        if not valid_results:
                            print(f"\n警告: 没有配置满足{max_latency_ms}ms延迟要求")
                            return None
                        
                        best = max(valid_results, key=lambda x: x["throughput"])
                        
                        print(f"\n推荐配置 (延迟≤{max_latency_ms}ms):")
                    else:
                        # 最大吞吐量
                        best = max(results, key=lambda x: x["throughput"])
                        
                        print(f"\n推荐配置 (最大吞吐量):")
                    
                    print(f"  批量大小: {best['batch_size']}")
                    print(f"  吞吐量: {best['throughput']:.2f} qps")
                    print(f"  平均延迟: {best['avg_latency']:.2f}ms")
                    print(f"  P95延迟: {best['p95_latency']:.2f}ms")
                    
                    return best
            
            # 使用批量查询优化器
            batch_optimizer = BatchQueryOptimizer(collection)
            
            batch_sizes = [1, 10, 20, 50, 100, 200]
            
            # 测试不同批量大小
            results = batch_optimizer.test_batch_sizes(batch_sizes, total_queries=1000)
            
            # 推荐最大吞吐量配置
            batch_optimizer.recommend_batch_size(results)
            
            # 推荐满足延迟要求的配置
            batch_optimizer.recommend_batch_size(results, max_latency_ms=50)
            
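            print("\nBatch query guidelines:")
            print("  1. Throughput-oriented: batch size 50-200")
            print("  2. Latency-sensitive: batch size 1-20")
            print("  3. Balanced: batch size 20-50")
            print("  4. Monitor latency and throughput")
            print("  5. Adjust to the available hardware")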
            ---
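The benchmark above issues only full batches, so a trailing partial batch is never sent. In application code the remainder usually has to be queried too. A minimal sketch of a chunking helper follows; the name `iter_batches` is hypothetical, and each yielded chunk is meant to be passed as the `data` argument of one search call.

```python
import numpy as np

def iter_batches(vectors, batch_size):
    """Yield consecutive slices of at most batch_size vectors, including the final partial one."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(vectors), batch_size):
        yield vectors[start:start + batch_size]

# 1000 random 128-dimensional query vectors, split into batches of 300
queries = np.random.default_rng(0).random((1000, 128)).tolist()
sizes = [len(batch) for batch in iter_batches(queries, 300)]
print(sizes)  # [300, 300, 300, 100]
```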
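    b.Concurrency control
        a.Description
            Issuing queries concurrently raises system throughput and makes fuller use of available resources, but the concurrency level affects both latency and resource usage. Too much concurrency causes resource contention and higher latency; too little leaves hardware idle. A common starting point is 2-4x the number of CPU cores. Monitor system load to avoid overload, manage concurrent connections with a connection pool, and add rate limiting and circuit breaking to protect the service.
        b.Code example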
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            from concurrent.futures import ThreadPoolExecutor, as_completed
            
            collection = Collection("documents")
            collection.load()
            
            # 并发控制类
            class ConcurrencyController:
                def __init__(self, collection):
                    self.collection = collection
                
                def single_query(self, query_id):
                    """单个查询任务"""
                    query_vector = [[np.random.random() for _ in range(128)]]
                    search_params = {
                        "metric_type": "L2",
                        "params": {"nprobe": 16}
                    }
                    
                    start = time.time()
                    results = self.collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param=search_params,
                        limit=10
                    )
                    latency = time.time() - start
                    
                    return query_id, latency
                
                def test_concurrency(self, concurrency_levels, num_queries=1000):
                    """测试不同并发级别的性能"""
                    print(f"\n并发控制测试 (总查询数={num_queries}):\n")
                    print(f"{'并发数':>8s} {'总时间':>10s} {'吞吐量':>12s} {'平均延迟':>12s} {'P95延迟':>12s}")
                    print("-" * 60)
                    
                    results = []
                    
                    for concurrency in concurrency_levels:
                        latencies = []
                        
                        start = time.time()
                        
                        with ThreadPoolExecutor(max_workers=concurrency) as executor:
                            futures = [executor.submit(self.single_query, i) for i in range(num_queries)]
                            
                            for future in as_completed(futures):
                                query_id, latency = future.result()
                                latencies.append(latency)
                        
                        total_time = time.time() - start
                        throughput = num_queries / total_time
                        avg_latency = np.mean(latencies) * 1000
                        p95_latency = np.percentile(latencies, 95) * 1000
                        
                        results.append({
                            "concurrency": concurrency,
                            "total_time": total_time,
                            "throughput": throughput,
                            "avg_latency": avg_latency,
                            "p95_latency": p95_latency
                        })
                        
                        print(f"{concurrency:8d} {total_time:9.2f}s {throughput:11.2f}qps {avg_latency:11.2f}ms {p95_latency:11.2f}ms")
                    
                    return results
                
                def recommend_concurrency(self, results, max_latency_ms=None):
                    """推荐最优并发数"""
                    if max_latency_ms:
                        valid_results = [r for r in results if r["p95_latency"] <= max_latency_ms]
                        
                        if not valid_results:
                            print(f"\n警告: 没有配置满足P95延迟≤{max_latency_ms}ms要求")
                            return None
                        
                        best = max(valid_results, key=lambda x: x["throughput"])
                        
                        print(f"\n推荐配置 (P95延迟≤{max_latency_ms}ms):")
                    else:
                        best = max(results, key=lambda x: x["throughput"])
                        
                        print(f"\n推荐配置 (最大吞吐量):")
                    
                    print(f"  并发数: {best['concurrency']}")
                    print(f"  吞吐量: {best['throughput']:.2f} qps")
                    print(f"  平均延迟: {best['avg_latency']:.2f}ms")
                    print(f"  P95延迟: {best['p95_latency']:.2f}ms")
                    
                    return best
            
            # 使用并发控制器
            concurrency_controller = ConcurrencyController(collection)
            
            concurrency_levels = [1, 2, 4, 8, 16, 32, 64]
            
            # 测试不同并发级别
            results = concurrency_controller.test_concurrency(concurrency_levels, num_queries=500)
            
            # 推荐最大吞吐量配置
            concurrency_controller.recommend_concurrency(results)
            
            # 推荐满足延迟要求的配置
            concurrency_controller.recommend_concurrency(results, max_latency_ms=100)
            
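            print("\nConcurrency guidelines:")
            print("  1. Concurrency = 2-4x the number of CPU cores")
            print("  2. Monitor CPU and memory usage")
            print("  3. Avoid excessive concurrency and the resource contention it causes")
            print("  4. Add request rate limiting")
            print("  5. Manage connections with a connection pool")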
            ---
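The description mentions rate limiting, which the benchmark above does not show. One simple form is to cap the number of in-flight queries and reject the excess instead of queueing it. The sketch below uses only the standard library; `ConcurrencyLimiter` and `TooManyRequests` are hypothetical names, and rejecting rather than waiting is a design assumption.

```python
import threading
from contextlib import contextmanager

class TooManyRequests(Exception):
    """Raised when the in-flight limit has been reached."""

class ConcurrencyLimiter:
    def __init__(self, max_in_flight):
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be >= 1")
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._lock = threading.Lock()
        self.rejected = 0  # number of requests turned away so far

    @contextmanager
    def slot(self):
        """Hold one in-flight slot for the duration of the with-block."""
        if not self._slots.acquire(blocking=False):
            with self._lock:
                self.rejected += 1
            raise TooManyRequests("too many in-flight queries")
        try:
            yield
        finally:
            self._slots.release()

limiter = ConcurrencyLimiter(max_in_flight=2)
with limiter.slot():
    with limiter.slot():
        try:
            with limiter.slot():  # third concurrent request: rejected
                pass
        except TooManyRequests as exc:
            print(f"rejected: {exc}")
print(f"rejected so far: {limiter.rejected}")
```

In a worker function such as `single_query` above, the search call would sit inside `with limiter.slot():`, and the caller would decide whether a rejected request is retried, degraded, or dropped.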

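8.3 Memory optimization

01.Memory usage analysis
    a.Estimating memory footprint
        a.Description
            Memory is a key resource for Milvus performance and needs to be estimated and managed deliberately. It is used mainly for vector data, index structures, and query caches. Footprint varies widely by index type: among the indexes compared here, graph-based HNSW typically needs the most memory because it stores link structures on top of the raw vectors, FLAT and IVF_FLAT stay close to the raw data size, and PQ-based indexes need the least. Monitor memory usage to avoid OOM, reduce the footprint with quantization and compression where accuracy allows, and configure memory limits and caching to match.
        b.Code example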
            ---
            from pymilvus import Collection, utility
            import numpy as np
            
            collection = Collection("documents")
            
            # 内存分析类
            class MemoryAnalyzer:
                def __init__(self, collection):
                    self.collection = collection
                
                def estimate_memory_usage(self, vector_count, vector_dim, index_type):
                    """估算内存使用"""
                    # 单个向量大小(float32)
                    vector_size_bytes = vector_dim * 4
                    
                    # 原始数据大小
                    raw_data_mb = vector_count * vector_size_bytes / (1024**2)
                    
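                    # Rough rule-of-thumb memory factors relative to the raw float32 data.
                    # Actual usage depends on index parameters (nlist, M, m, nbits) and the Milvus version.
                    index_overhead = {
                        "FLAT": 1.0,        # raw vectors only
                        "IVF_FLAT": 1.1,    # ~10% overhead
                        "IVF_SQ8": 0.3,     # ~30% of raw size
                        "IVF_PQ": 0.1,      # ~10% of raw size
                        "HNSW": 1.3,        # ~30% overhead for graph links
                    }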
                    
                    overhead_factor = index_overhead.get(index_type, 1.0)
                    total_memory_mb = raw_data_mb * overhead_factor
                    
                    return {
                        "vector_count": vector_count,
                        "vector_dim": vector_dim,
                        "index_type": index_type,
                        "raw_data_mb": raw_data_mb,
                        "overhead_factor": overhead_factor,
                        "total_memory_mb": total_memory_mb,
                        "total_memory_gb": total_memory_mb / 1024
                    }
                
                def print_memory_report(self, estimates):
                    """打印内存报告"""
                    print("\n内存使用估算:")
                    print(f"  向量数量: {estimates['vector_count']:,}")
                    print(f"  向量维度: {estimates['vector_dim']}")
                    print(f"  索引类型: {estimates['index_type']}")
                    print(f"  原始数据: {estimates['raw_data_mb']:.2f} MB ({estimates['raw_data_mb']/1024:.2f} GB)")
                    print(f"  开销系数: {estimates['overhead_factor']:.1f}x")
                    print(f"  总内存: {estimates['total_memory_mb']:.2f} MB ({estimates['total_memory_gb']:.2f} GB)")
                
                def compare_index_memory(self, vector_count, vector_dim):
                    """对比不同索引的内存占用"""
                    index_types = ["FLAT", "IVF_FLAT", "IVF_SQ8", "IVF_PQ", "HNSW"]
                    
                    print(f"\nIndex memory comparison ({vector_count:,} vectors, {vector_dim} dims):\n")
                    print(f"{'Index':>12s} {'Raw data':>12s} {'Total':>12s} {'vs. raw':>10s}")
                    print("-" * 50)
                    
                    for index_type in index_types:
                        est = self.estimate_memory_usage(vector_count, vector_dim, index_type)
                        # Ratio of estimated memory to raw data size (below 1.0 means compression)
                        relative_size = est['total_memory_mb'] / est['raw_data_mb']
                        
                        print(f"{index_type:>12s} {est['raw_data_mb']:11.2f}MB {est['total_memory_mb']:11.2f}MB {relative_size:9.1f}x")
            
            # 使用内存分析器
            analyzer = MemoryAnalyzer(collection)
            
            # 估算不同规模的内存需求
            scenarios = [
                (100000, 128, "IVF_FLAT"),
                (1000000, 128, "IVF_FLAT"),
                (10000000, 128, "IVF_SQ8"),
                (100000000, 128, "IVF_PQ")
            ]
            
            for vector_count, vector_dim, index_type in scenarios:
                estimates = analyzer.estimate_memory_usage(vector_count, vector_dim, index_type)
                analyzer.print_memory_report(estimates)
            
            # 对比索引内存
            analyzer.compare_index_memory(10000000, 128)
            
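            print("\nMemory optimization guidelines:")
            print("  1. Use quantized indexes (SQ8/PQ) to reduce memory")
            print("  2. Partition the data and load partitions on demand")
            print("  3. Monitor memory usage and set limits")
            print("  4. Release partitions that are no longer needed")
            print("  5. Use DiskANN for very large datasets")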
            ---
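    b.Configuring memory limits
        a.Description
            Memory limits guard against OOM and keep the system stable. An upper bound can be set for each QueryNode; once it is exceeded, the node refuses to load more data or serve more queries. Set the limit with care, since an overly strict limit hurts performance. Monitor memory usage and adjust the configuration as needed, reserve headroom for traffic bursts, and set up memory alerts so problems surface early.
        b.Code example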
            ---
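            # Memory limit settings (set in the Milvus configuration file).
            # NOTE: key names and defaults differ between Milvus versions; check each
            # key against the milvus.yaml shipped with your release before using it.
            memory_config = """
            queryNode:
              cache:
                memoryLimit: 2147483648  # 2 GB memory limit
                enabled: true
              
              loadMemoryUsageMaxLevel: 90  # stop loading once memory usage exceeds 90%
              
              gracefulStopTimeout: 30  # graceful-stop timeout
            
            # Alerting thresholds (illustrative; usually configured in an external
            # monitoring system rather than in milvus.yaml)
            monitoring:
              memory:
                warningThreshold: 0.8  # warn at 80%
                criticalThreshold: 0.9  # critical at 90%
            """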
            
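            print("Example memory limit configuration:")
            print(memory_config)
            
            print("\nMemory limit strategy:")
            print("  1. Set an upper memory bound for each QueryNode")
            print("  2. Configure memory usage thresholds")
            print("  3. Set up memory alerts")
            print("  4. Degrade gracefully by rejecting new requests")
            print("  5. Clean up caches and temporary data regularly")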
            ---
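The warning and critical thresholds above can be expressed as a small check that a monitoring job might run. This is only a sketch: the function name `memory_alert_level` is hypothetical, and how the used-memory figure is obtained (for example from exported metrics) is left open.

```python
def memory_alert_level(used_bytes, limit_bytes, warning=0.8, critical=0.9):
    """Classify memory usage against a limit as 'ok', 'warning', or 'critical'."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    usage = used_bytes / limit_bytes
    if usage >= critical:
        return "critical"
    if usage >= warning:
        return "warning"
    return "ok"

GIB = 1024 ** 3
limit = 2 * GIB  # matches the 2 GB limit in the configuration example
for used in (int(1.5 * GIB), int(1.7 * GIB), int(1.9 * GIB)):
    level = memory_alert_level(used, limit)
    print(f"used={used / GIB:.1f} GiB of {limit / GIB:.0f} GiB -> {level}")
```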

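02.Quantization
    a.Scalar quantization (SQ8)
        a.Description
            Scalar quantization (SQ8) compresses each float32 component to an 8-bit integer, cutting vector memory by about 75%. SQ8 uses linear quantization, so the accuracy loss is small. It suits deployments that are short on memory and can tolerate a small loss of accuracy. Queries run slightly faster than with FLAT because less data is scanned, and recall is slightly lower, typically still above 95%. The quantization is chosen when the index is built and is lossy: the original values cannot be recovered exactly from the quantized codes.
        b.Code example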
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # 创建IVF_SQ8索引
            sq8_index = {
                "index_type": "IVF_SQ8",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
            
            print("创建IVF_SQ8索引(标量量化)...")
            start = time.time()
            collection.create_index(field_name="embedding", index_params=sq8_index)
            build_time = time.time() - start
            
            print(f"索引构建时间: {build_time:.2f}s")
            
            collection.load()
            
            # 测试查询性能
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            start = time.time()
            for _ in range(100):
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
            query_time = time.time() - start
            
            print(f"\n100次查询总时间: {query_time:.2f}s")
            print(f"平均查询延迟: {query_time/100*1000:.2f}ms")
            print(f"QPS: {100/query_time:.2f}")
            
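            print("\nSQ8 characteristics:")
            print("  Compression: 4x (float32 -> int8)")
            print("  Memory saved: 75%")
            print("  Recall: ~95%")
            print("  Query speed: slightly faster than FLAT")
            print("  Use when: memory is limited and a small accuracy loss is acceptable")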
            ---
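To make the linear quantization step concrete, here is a minimal NumPy sketch of per-dimension 8-bit scalar quantization. It illustrates the general technique only and is not Milvus's internal implementation; the function names `sq8_encode` and `sq8_decode` and the per-dimension min/max range are assumptions.

```python
import numpy as np

def sq8_encode(vectors):
    """Map each dimension linearly from [min, max] onto the 256 levels of a uint8."""
    vmin = vectors.min(axis=0)
    vmax = vectors.max(axis=0)
    scale = (vmax - vmin) / 255.0
    scale[scale == 0] = 1.0  # constant dimensions: avoid division by zero
    codes = np.round((vectors - vmin) / scale).astype(np.uint8)
    return codes, vmin, scale

def sq8_decode(codes, vmin, scale):
    """Approximate reconstruction; the rounding error cannot be undone."""
    return codes.astype(np.float32) * scale + vmin

rng = np.random.default_rng(0)
vectors = rng.random((1000, 128)).astype(np.float32)

codes, vmin, scale = sq8_encode(vectors)
restored = sq8_decode(codes, vmin, scale)
max_error = float(np.abs(restored - vectors).max())

print(f"raw bytes: {vectors.nbytes}, code bytes: {codes.nbytes}")
print(f"compression: {vectors.nbytes / codes.nbytes:.0f}x")
print(f"max reconstruction error: {max_error:.5f}")
```

The codes take one byte per component instead of four, and the reconstruction error per component is bounded by half a quantization step, which is why the accuracy loss stays small when the value range of each dimension is narrow.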
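    b.Product quantization (PQ)
        a.Description
            Product quantization (PQ) splits each vector into sub-vectors and quantizes each one separately, giving a much higher compression ratio: memory can drop to 10% of the original or less. The m parameter sets the number of sub-vectors and so controls both compression and accuracy. PQ suits very large datasets under severe memory constraints. Recall is lower than with SQ8, typically 85-90%; queries are fast, but the accuracy loss is larger, so memory has to be weighed against accuracy.
        b.Code example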
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            
            collection = Collection("documents")
            
            # Create an IVF_PQ index
            pq_index = {
                "index_type": "IVF_PQ",
                "metric_type": "L2",
                "params": {
                    "nlist": 1024,
                    "m": 16,        # number of sub-vectors; the vector dimension must be divisible by m
                    "nbits": 8      # bits per sub-vector code
                }
            }
            
            print("创建IVF_PQ索引(乘积量化)...")
            start = time.time()
            collection.create_index(field_name="embedding", index_params=pq_index)
            build_time = time.time() - start
            
            print(f"索引构建时间: {build_time:.2f}s")
            
            collection.load()
            
            # 测试查询性能
            query_vector = [[np.random.random() for _ in range(128)]]
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            start = time.time()
            for _ in range(100):
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param=search_params,
                    limit=10
                )
            query_time = time.time() - start
            
            print(f"\n100次查询总时间: {query_time:.2f}s")
            print(f"平均查询延迟: {query_time/100*1000:.2f}ms")
            print(f"QPS: {100/query_time:.2f}")
            
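            print("\nPQ characteristics:")
            print("  Compression: 10-40x (depends on m and nbits)")
            print("  Memory saved: 90-97%")
            print("  Recall: ~85-90%")
            print("  Query speed: fast")
            print("  Use when: very large datasets, severe memory constraints")
            print("  Parameter: the vector dimension must be divisible by m")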
            ---
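The compression figures follow from simple arithmetic on m and nbits: each vector is stored as m codes of nbits bits each. The sketch below works that out; the function name `pq_code_size` is hypothetical, and the result covers only the per-vector codes, not the codebooks or the IVF structure.

```python
def pq_code_size(dim, m, nbits=8, dtype_bytes=4):
    """Per-vector storage before and after product quantization.

    Raises ValueError when dim is not divisible by m, mirroring the
    constraint on the m parameter.
    """
    if dim % m != 0:
        raise ValueError(f"dim={dim} is not divisible by m={m}")
    raw_bytes = dim * dtype_bytes
    code_bytes = m * nbits / 8
    return {
        "sub_dim": dim // m,  # dimensions covered by each sub-vector
        "raw_bytes": raw_bytes,
        "code_bytes": code_bytes,
        "compression": raw_bytes / code_bytes,
    }

# The index parameters used above: 128-dimensional float32 vectors, m=16, nbits=8
info = pq_code_size(dim=128, m=16, nbits=8)
print(info)
```

With these parameters each 512-byte vector becomes a 16-byte code, a 32x reduction, which falls inside the 10-40x range quoted above.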

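8.4 Concurrency control

01.Connection pool management
    a.Connection pool configuration
        a.Description
            A connection pool reuses connections and avoids the cost of establishing a new one per request. A well-sized pool improves concurrent performance: too small and requests wait for a connection, too large and resources are wasted. A pool size of 1-2x the expected concurrency is a reasonable starting point. Configure connection and idle timeouts, add health checks with automatic reconnection, and monitor pool usage so the size can be adjusted.
        b.Code example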
            ---
            from pymilvus import connections, Collection
            import threading
            import time
            
            # 连接池配置
            class ConnectionPool:
                def __init__(self, alias_prefix="conn", pool_size=10):
                    self.alias_prefix = alias_prefix
                    self.pool_size = pool_size
                    self.connections = []
                    self.lock = threading.Lock()
                    self.init_pool()
                
                def init_pool(self):
                    """初始化连接池"""
                    print(f"初始化连接池,大小: {self.pool_size}")
                    
                    for i in range(self.pool_size):
                        alias = f"{self.alias_prefix}_{i}"
                        
                        connections.connect(
                            alias=alias,
                            host="localhost",
                            port="19530",
                            timeout=30
                        )
                        
                        self.connections.append({
                            "alias": alias,
                            "in_use": False,
                            "last_used": time.time()
                        })
                    
                    print(f"连接池初始化完成")
                
                def acquire(self, timeout=10):
                    """获取连接"""
                    start = time.time()
                    
                    while time.time() - start < timeout:
                        with self.lock:
                            for conn in self.connections:
                                if not conn["in_use"]:
                                    conn["in_use"] = True
                                    conn["last_used"] = time.time()
                                    return conn["alias"]
                        
                        time.sleep(0.01)
                    
                    raise TimeoutError("获取连接超时")
                
                def release(self, alias):
                    """释放连接"""
                    with self.lock:
                        for conn in self.connections:
                            if conn["alias"] == alias:
                                conn["in_use"] = False
                                conn["last_used"] = time.time()
                                break
                
                def get_stats(self):
                    """获取连接池统计"""
                    with self.lock:
                        total = len(self.connections)
                        in_use = sum(1 for conn in self.connections if conn["in_use"])
                        available = total - in_use
                        
                        return {
                            "total": total,
                            "in_use": in_use,
                            "available": available,
                            "usage_rate": in_use / total if total > 0 else 0
                        }
                
                def close_all(self):
                    """关闭所有连接"""
                    print("关闭连接池...")
                    
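                    for conn in self.connections:
                        try:
                            connections.disconnect(conn["alias"])
                        except Exception as e:
                            # Report the failure and keep closing the remaining connections
                            print(f"Failed to disconnect {conn['alias']}: {e}")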
                    
                    self.connections.clear()
                    print("连接池已关闭")
            
            # 使用连接池
            pool = ConnectionPool(pool_size=5)
            
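            def worker_task(task_id, pool):
                """Worker thread task."""
                alias = None
                try:
                    # Acquire a connection
                    alias = pool.acquire(timeout=5)
                    print(f"Task {task_id}: acquired connection {alias}")
                    
                    # Use the connection to run a query
                    collection = Collection("documents", using=alias)
                    
                    # Simulate a query
                    time.sleep(0.1)
                    
                    print(f"Task {task_id}: query finished")
                
                except Exception as e:
                    print(f"Task {task_id}: failed - {e}")
                
                finally:
                    # Always release the connection, even if the query raised;
                    # otherwise the slot stays marked in_use forever
                    if alias is not None:
                        pool.release(alias)
                        print(f"Task {task_id}: released connection {alias}")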
            
            # 创建多个工作线程
            threads = []
            for i in range(10):
                thread = threading.Thread(target=worker_task, args=(i, pool))
                threads.append(thread)
                thread.start()
            
            # 等待所有线程完成
            for thread in threads:
                thread.join()
            
            # 打印连接池统计
            stats = pool.get_stats()
            print(f"\n连接池统计:")
            print(f"  总连接数: {stats['total']}")
            print(f"  使用中: {stats['in_use']}")
            print(f"  可用: {stats['available']}")
            print(f"  使用率: {stats['usage_rate']*100:.1f}%")
            
            # 关闭连接池
            pool.close_all()
            
            print("\n连接池配置建议:")
            print("  1. 连接池大小 = 并发数 * 1-2")
            print("  2. 配置连接超时和空闲超时")
            print("  3. 实现连接健康检查")
            print("  4. 监控连接池使用率")
            print("  5. 动态调整连接池大小")
            ---
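            The acquire/release pairing described above is easy to get wrong by hand. One possible alternative, sketched here under assumptions, is to hand out connections through a context manager backed by a blocking queue, so that a connection is always returned and waiters block instead of polling. `AliasPool` is a hypothetical class; the aliases are plain strings standing in for connections that would already have been registered with `connections.connect`.

```python
import queue
from contextlib import contextmanager


class AliasPool:
    """Hypothetical pool of connection aliases backed by a blocking queue."""

    def __init__(self, aliases):
        self._free = queue.Queue()
        for alias in aliases:
            self._free.put(alias)

    @contextmanager
    def connection(self, timeout=5.0):
        try:
            # Blocks until an alias is free instead of busy-waiting
            alias = self._free.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("timed out waiting for a free connection") from None
        try:
            yield alias
        finally:
            # Runs even if the caller's block raises, so nothing leaks
            self._free.put(alias)

    def available(self):
        return self._free.qsize()


pool = AliasPool([f"conn_{i}" for i in range(2)])
with pool.connection() as alias:
    print("using", alias, "free:", pool.available())
print("free after block:", pool.available())
```

            Inside the `with` block the alias would be passed as `using=alias` when constructing a `Collection`, exactly as in the example above.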
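    b.Request Rate Limiting
        a.Description
            Request rate limiting protects the system from overload and keeps the service stable. Limits can apply to QPS, concurrency, or request size. Common algorithms include token bucket, leaky bucket, fixed window, and sliding window. Set the thresholds from the measured capacity of the system. A request that exceeds the limit is either rejected with an error or queued. Different users can be given different limits. Pair rate limiting with graceful degradation so that core functions stay available under load.
        b.Code Example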
            ---
            import time
            import threading
            from collections import deque
            
            # 令牌桶限流器
            class TokenBucketLimiter:
                def __init__(self, rate, capacity):
                    """
                    rate: 每秒生成的令牌数
                    capacity: 桶容量
                    """
                    self.rate = rate
                    self.capacity = capacity
                    self.tokens = capacity
                    self.last_update = time.monotonic()  # monotonic clock: unaffected by wall-clock adjustments
                    self.lock = threading.Lock()
                
                def acquire(self, tokens=1):
                    """Acquire tokens"""
                    with self.lock:
                        now = time.monotonic()
                        
                        # 补充令牌
                        elapsed = now - self.last_update
                        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
                        self.last_update = now
                        
                        # 尝试获取令牌
                        if self.tokens >= tokens:
                            self.tokens -= tokens
                            return True
                        else:
                            return False
                
                def wait_acquire(self, tokens=1, timeout=10):
                    """等待获取令牌"""
                    start = time.time()
                    
                    while time.time() - start < timeout:
                        if self.acquire(tokens):
                            return True
                        time.sleep(0.01)
                    
                    return False
            
            # 滑动窗口限流器
            class SlidingWindowLimiter:
                def __init__(self, max_requests, window_seconds):
                    """
                    max_requests: 窗口内最大请求数
                    window_seconds: 窗口大小(秒)
                    """
                    self.max_requests = max_requests
                    self.window_seconds = window_seconds
                    self.requests = deque()
                    self.lock = threading.Lock()
                
                def acquire(self):
                    """尝试获取许可"""
                    with self.lock:
                        now = time.time()
                        
                        # 移除过期请求
                        while self.requests and self.requests[0] < now - self.window_seconds:
                            self.requests.popleft()
                        
                        # 检查是否超过限制
                        if len(self.requests) < self.max_requests:
                            self.requests.append(now)
                            return True
                        else:
                            return False
                
                def get_current_rate(self):
                    """获取当前请求率"""
                    with self.lock:
                        now = time.time()
                        
                        # 移除过期请求
                        while self.requests and self.requests[0] < now - self.window_seconds:
                            self.requests.popleft()
                        
                        return len(self.requests) / self.window_seconds
            
            # 并发限流器
            class ConcurrencyLimiter:
                def __init__(self, max_concurrent):
                    """
                    max_concurrent: 最大并发数
                    """
                    self.max_concurrent = max_concurrent
                    self.current = 0
                    self.lock = threading.Lock()
                
                def acquire(self):
                    """获取并发许可"""
                    with self.lock:
                        if self.current < self.max_concurrent:
                            self.current += 1
                            return True
                        else:
                            return False
                
                def release(self):
                    """释放并发许可"""
                    with self.lock:
                        if self.current > 0:
                            self.current -= 1
                
                def get_current(self):
                    """获取当前并发数"""
                    with self.lock:
                        return self.current
            
            # 测试限流器
            print("测试令牌桶限流器:")
            token_limiter = TokenBucketLimiter(rate=10, capacity=20)
            
            success_count = 0
            for i in range(50):
                if token_limiter.acquire():
                    success_count += 1
            
            print(f"  尝试50次请求,成功{success_count}次")
            
            print("\n测试滑动窗口限流器:")
            window_limiter = SlidingWindowLimiter(max_requests=100, window_seconds=1)
            
            success_count = 0
            for i in range(150):
                if window_limiter.acquire():
                    success_count += 1
            
            print(f"  尝试150次请求,成功{success_count}次")
            print(f"  当前请求率: {window_limiter.get_current_rate():.2f} qps")
            
            print("\n测试并发限流器:")
            concurrency_limiter = ConcurrencyLimiter(max_concurrent=10)
            
            acquired = 0
            for i in range(20):
                if concurrency_limiter.acquire():
                    acquired += 1
            
            print(f"  尝试获取20个并发,成功{acquired}个")
            print(f"  当前并发数: {concurrency_limiter.get_current()}")
            
            print("\n限流策略建议:")
            print("  1. 令牌桶: 允许突发流量,平滑限流")
            print("  2. 滑动窗口: 精确控制时间窗口内请求数")
            print("  3. 并发限流: 控制同时执行的请求数")
            print("  4. 组合使用: QPS + 并发双重限流")
            print("  5. 分级限流: 不同用户不同限制")
            ---
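            Per-user limits, mentioned above, can be layered on top of a token bucket by keeping one bucket per user. The following is a minimal sketch, not a production limiter: the class `PerUserLimiter`, the tier names, and the injectable `clock` parameter are all assumptions made for illustration (the clock is injectable only so the behaviour can be tested deterministically).

```python
import threading
import time


class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()
        self.lock = threading.Lock()

    def acquire(self, tokens=1):
        with self.lock:
            now = self.clock()
            # Refill in proportion to elapsed time, capped at capacity
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            return False


class PerUserLimiter:
    """Hypothetical limiter: one token bucket per user, sized by tier."""

    def __init__(self, tiers, default_tier, clock=time.monotonic):
        self.tiers = tiers              # tier name -> (rate, capacity)
        self.default_tier = default_tier
        self.clock = clock
        self.buckets = {}
        self.lock = threading.Lock()

    def allow(self, user_id, tier=None):
        with self.lock:
            bucket = self.buckets.get(user_id)
            if bucket is None:
                # The tier is fixed the first time a user is seen
                rate, capacity = self.tiers[tier or self.default_tier]
                bucket = self.buckets[user_id] = TokenBucket(rate, capacity, self.clock)
        return bucket.acquire()


limiter = PerUserLimiter({"free": (1, 2), "paid": (10, 20)}, default_tier="free")
free_ok = sum(limiter.allow("alice") for _ in range(30))
paid_ok = sum(limiter.allow("bob", tier="paid") for _ in range(30))
print(f"free tier allowed {free_ok}/30, paid tier allowed {paid_ok}/30")
```

            Because buckets are created lazily and never removed, a long-running service would also need to expire idle users; that is left out of the sketch.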

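02.Resource Isolation
    a.Resource Group Configuration
        a.Description
            Resource groups isolate tenants from one another so that load from one does not affect the others. Each business can be assigned its own QueryNodes. Isolation is at node granularity: a group owns a set of QueryNodes, so its memory and CPU are whatever those nodes provide rather than a separately configured quota. Group configuration can be adjusted at run time, for example by transferring nodes between groups. Dedicating a group to core workloads guarantees them capacity. Resource groups suit multi-tenant and multi-business deployments, and they require planning so that the node requests fit the cluster.
        b.Code Example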
            ---
            from pymilvus import utility
            
            # 资源组管理类
            class ResourceGroupManager:
                @staticmethod
                def create_resource_group(name, config=None):
                    """创建资源组"""
                    if config is None:
                        config = {
                            "requests": {"node_num": 1},
                            "limits": {"node_num": 2}
                        }
                    
                    try:
                        utility.create_resource_group(name, config=config)
                        print(f"创建资源组: {name}")
                        print(f"  配置: {config}")
                    except Exception as e:
                        print(f"创建资源组失败: {e}")
                
                @staticmethod
                def list_resource_groups():
                    """列出所有资源组"""
                    try:
                        groups = utility.list_resource_groups()
                        print("\n资源组列表:")
                        for group in groups:
                            print(f"  - {group}")
                        return groups
                    except Exception as e:
                        print(f"列出资源组失败: {e}")
                        return []
                
                @staticmethod
                def describe_resource_group(name):
                    """查看资源组详情"""
                    try:
                        info = utility.describe_resource_group(name)
                        print(f"\n资源组详情: {name}")
                        print(f"  {info}")
                        return info
                    except Exception as e:
                        print(f"查看资源组失败: {e}")
                        return None
                
                @staticmethod
                def transfer_node(source_group, target_group, num_nodes=1):
                    """在资源组间转移节点"""
                    try:
                        utility.transfer_node(source_group, target_group, num_nodes)
                        print(f"转移{num_nodes}个节点: {source_group} -> {target_group}")
                    except Exception as e:
                        print(f"转移节点失败: {e}")
                
                @staticmethod
                def drop_resource_group(name):
                    """删除资源组"""
                    try:
                        utility.drop_resource_group(name)
                        print(f"删除资源组: {name}")
                    except Exception as e:
                        print(f"删除资源组失败: {e}")
            
            # 使用资源组管理器
            manager = ResourceGroupManager()
            
            # 创建资源组
            print("创建资源组:")
            manager.create_resource_group("business_a", config={"requests": {"node_num": 2}})
            manager.create_resource_group("business_b", config={"requests": {"node_num": 1}})
            manager.create_resource_group("business_c", config={"requests": {"node_num": 1}})
            
            # 列出资源组
            groups = manager.list_resource_groups()
            
            # 查看资源组详情
            for group in groups:
                manager.describe_resource_group(group)
            
            # 资源组使用示例
            print("\n资源组使用场景:")
            print("  1. 多租户隔离: 每个租户独立资源组")
            print("  2. 业务隔离: 核心业务和非核心业务分离")
            print("  3. 环境隔离: 生产、测试、开发环境分离")
            print("  4. 优先级保证: 高优先级业务独享资源")
            print("  5. 资源弹性: 动态调整资源分配")
            ---
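            Planning the allocation can start with a simple feasibility check before any group is created. The helper below is a hypothetical sketch (it calls no Milvus API): it only verifies that the per-group `node_num` requests fit into the number of QueryNodes the cluster is assumed to have, and reports how many nodes remain unassigned.

```python
def plan_resource_groups(total_nodes, requests):
    """Check that per-group QueryNode requests fit into the cluster.

    total_nodes: number of QueryNodes available (an assumption of the caller).
    requests:    dict mapping resource group name -> requested node count.
    Returns the number of nodes left unassigned.
    """
    for name, node_num in requests.items():
        if node_num < 0:
            raise ValueError(f"{name}: node count must be >= 0, got {node_num}")
    needed = sum(requests.values())
    if needed > total_nodes:
        raise ValueError(f"requested {needed} nodes but only {total_nodes} are available")
    return total_nodes - needed


spare = plan_resource_groups(5, {"business_a": 2, "business_b": 1, "business_c": 1})
print(f"nodes left unassigned: {spare}")
```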
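    b.Query Priority
        a.Description
            Query prioritization ensures that important queries run first. Each query is assigned a priority level; high-priority queries obtain resources and execute ahead of the rest, while low-priority queries may be delayed or rejected when resources are tight. This suits multi-business deployments that must protect the SLA of core workloads. It requires a deliberate priority policy, a priority queue with a scheduler, and per-priority latency monitoring.
        b.Code Example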
            ---
            import time
            import threading
            from queue import PriorityQueue
            from pymilvus import Collection
            import numpy as np
            
            # 优先级查询管理器
            class PriorityQueryManager:
                def __init__(self, collection, max_workers=4):
                    self.collection = collection
                    self.max_workers = max_workers
                    self.query_queue = PriorityQueue()
                    self.workers = []
                    self.running = False
                    self.stats = {
                        "high": {"count": 0, "total_latency": 0},
                        "medium": {"count": 0, "total_latency": 0},
                        "low": {"count": 0, "total_latency": 0}
                    }
                    self.stats_lock = threading.Lock()
                
                def start(self):
                    """启动工作线程"""
                    self.running = True
                    
                    for i in range(self.max_workers):
                        worker = threading.Thread(target=self._worker, args=(i,))
                        worker.daemon = True
                        worker.start()
                        self.workers.append(worker)
                    
                    print(f"启动{self.max_workers}个工作线程")
                
                def stop(self):
                    """停止工作线程"""
                    self.running = False
                    
                    for worker in self.workers:
                        worker.join()
                    
                    print("所有工作线程已停止")
                
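                def _worker(self, worker_id):
                    """Worker thread."""
                    from queue import Empty  # raised by get() when the timeout expires
                    
                    while self.running:
                        try:
                            # Fetch the next query task (lowest priority value first)
                            priority, query_id, query_vector, priority_name = self.query_queue.get(timeout=0.1)
                        except Empty:
                            continue  # nothing queued; re-check self.running
                        
                        try:
                            # Run the query
                            start = time.time()
                            
                            search_params = {
                                "metric_type": "L2",
                                "params": {"nprobe": 16}
                            }
                            
                            results = self.collection.search(
                                data=[query_vector],
                                anns_field="embedding",
                                param=search_params,
                                limit=10
                            )
                            
                            latency = time.time() - start
                            
                            # Update statistics
                            with self.stats_lock:
                                self.stats[priority_name]["count"] += 1
                                self.stats[priority_name]["total_latency"] += latency
                            
                            print(f"Worker {worker_id}: finished query {query_id} (priority: {priority_name}, latency: {latency*1000:.2f}ms)")
                        
                        except Exception as e:
                            # Report failures instead of silently swallowing them
                            print(f"Worker {worker_id}: query {query_id} failed - {e}")
                        
                        finally:
                            # Always mark the task done so query_queue.join() cannot hang
                            self.query_queue.task_done()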
                
                def submit_query(self, query_vector, priority="medium"):
                    """提交查询"""
                    # 优先级映射(数字越小优先级越高)
                    priority_map = {
                        "high": 0,
                        "medium": 1,
                        "low": 2
                    }
                    
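                    if priority not in priority_map:
                        priority = "medium"  # unknown names fall back to medium so the stats keys stay valid
                    priority_value = priority_map[priority]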
                    query_id = f"{priority}_{int(time.time()*1000000)}"
                    
                    self.query_queue.put((priority_value, query_id, query_vector, priority))
                    
                    return query_id
                
                def get_stats(self):
                    """获取统计信息"""
                    with self.stats_lock:
                        stats_copy = {}
                        
                        for priority, data in self.stats.items():
                            if data["count"] > 0:
                                avg_latency = data["total_latency"] / data["count"]
                            else:
                                avg_latency = 0
                            
                            stats_copy[priority] = {
                                "count": data["count"],
                                "avg_latency": avg_latency * 1000  # ms
                            }
                        
                        return stats_copy
            
            # 使用优先级查询管理器
            collection = Collection("documents")
            collection.load()
            
            manager = PriorityQueryManager(collection, max_workers=4)
            manager.start()
            
            # 提交不同优先级的查询
            print("\n提交查询:")
            
            for i in range(10):
                query_vector = [np.random.random() for _ in range(128)]
                
                if i < 3:
                    priority = "high"
                elif i < 7:
                    priority = "medium"
                else:
                    priority = "low"
                
                query_id = manager.submit_query(query_vector, priority=priority)
                print(f"  提交查询{query_id} (优先级:{priority})")
                
                time.sleep(0.1)
            
            # 等待所有查询完成
            manager.query_queue.join()
            
            # 打印统计
            stats = manager.get_stats()
            
            print(f"\n查询统计:")
            print(f"{'优先级':>10s} {'数量':>8s} {'平均延迟':>12s}")
            print("-" * 35)
            
            for priority in ["high", "medium", "low"]:
                if priority in stats:
                    print(f"{priority:>10s} {stats[priority]['count']:8d} {stats[priority]['avg_latency']:11.2f}ms")
            
            # 停止管理器
            manager.stop()
            
            print("\n优先级策略建议:")
            print("  1. 核心业务: 高优先级")
            print("  2. 常规业务: 中优先级")
            print("  3. 批量任务: 低优先级")
            print("  4. 监控不同优先级的性能")
            print("  5. 动态调整优先级策略")
            ---
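            The ordering guarantee that the priority queue provides can be checked in isolation, without a running Milvus instance. The sketch below is an assumption-labelled illustration: `OrderedTaskQueue` is a hypothetical wrapper that adds a monotonically increasing sequence number to each entry, so that tasks of equal priority come out in submission order and the payloads themselves never need to be comparable.

```python
import itertools
from queue import PriorityQueue

PRIORITY = {"high": 0, "medium": 1, "low": 2}  # smaller value = served first


class OrderedTaskQueue:
    """Hypothetical wrapper: priority order first, FIFO within a priority."""

    def __init__(self):
        self._queue = PriorityQueue()
        self._seq = itertools.count()

    def put(self, payload, priority="medium"):
        rank = PRIORITY[priority]  # raises KeyError for unknown priority names
        # The sequence number breaks ties, so payload is never compared
        self._queue.put((rank, next(self._seq), payload))

    def get(self):
        _rank, _seq, payload = self._queue.get_nowait()
        return payload

    def __len__(self):
        return self._queue.qsize()


tasks = OrderedTaskQueue()
tasks.put({"id": "low-1"}, "low")
tasks.put({"id": "high-1"}, "high")
tasks.put({"id": "medium-1"}, "medium")
tasks.put({"id": "high-2"}, "high")

while len(tasks):
    print(tasks.get()["id"])
```

            Without the sequence number, two entries with the same priority would be compared on their payloads, which fails for dictionaries and gives an arbitrary order for other types.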

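8.5 Caching Strategies

01.Query Cache
    a.Caching Mechanism
        a.Description
            A query cache stores the results of hot queries so that repeated work is avoided. An identical query (same vector, search parameters, and limit) is answered directly from the cache, which can cut latency substantially on a hit. It suits workloads with repetitive query patterns, such as recommendation systems. Configure a maximum size and an expiry policy; the cache consumes additional memory, so weigh the hit-rate gain against the memory cost. Cached results can become stale after the data changes, so implement invalidation as well as warm-up.
        b.Code Example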
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            import hashlib
            import json
            
            collection = Collection("documents")
            collection.load()
            
            # 查询缓存类
            class QueryCache:
                def __init__(self, max_size=1000, ttl=300):
                    """
                    max_size: 最大缓存条目数
                    ttl: 缓存过期时间(秒)
                    """
                    self.max_size = max_size
                    self.ttl = ttl
                    self.cache = {}
                    self.access_count = {}
                    self.hit_count = 0
                    self.miss_count = 0
                
                def _generate_key(self, query_vector, search_params, limit):
                    """生成缓存键"""
                    # 将查询参数序列化为字符串
                    key_data = {
                        "vector": [round(float(v), 6) for v in query_vector],  # 6 decimal places; float() keeps NumPy scalars JSON-serializable
                        "params": search_params,
                        "limit": limit
                    }
                    
                    key_str = json.dumps(key_data, sort_keys=True)
                    key_hash = hashlib.md5(key_str.encode()).hexdigest()
                    
                    return key_hash
                
                def get(self, query_vector, search_params, limit):
                    """从缓存获取结果"""
                    key = self._generate_key(query_vector, search_params, limit)
                    
                    if key in self.cache:
                        entry = self.cache[key]
                        
                        # 检查是否过期
                        if time.time() - entry["timestamp"] < self.ttl:
                            self.hit_count += 1
                            self.access_count[key] = self.access_count.get(key, 0) + 1
                            return entry["results"]
                        else:
                            # 过期,删除缓存
                            del self.cache[key]
                            if key in self.access_count:
                                del self.access_count[key]
                    
                    self.miss_count += 1
                    return None
                
                def put(self, query_vector, search_params, limit, results):
                    """将结果放入缓存"""
                    key = self._generate_key(query_vector, search_params, limit)
                    
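                    # Check the cache size (no eviction is needed when overwriting an existing key)
                    if key not in self.cache and len(self.cache) >= self.max_size:
                        # LFU eviction: drop the entry with the fewest hits.
                        # Note: this is least-frequently-used, not LRU.
                        if self.access_count:
                            lfu_key = min(self.access_count, key=self.access_count.get)
                            del self.cache[lfu_key]
                            del self.access_count[lfu_key]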
                    
                    self.cache[key] = {
                        "results": results,
                        "timestamp": time.time()
                    }
                    
                    self.access_count[key] = 0
                
                def get_stats(self):
                    """获取缓存统计"""
                    total_requests = self.hit_count + self.miss_count
                    hit_rate = self.hit_count / total_requests if total_requests > 0 else 0
                    
                    return {
                        "cache_size": len(self.cache),
                        "max_size": self.max_size,
                        "hit_count": self.hit_count,
                        "miss_count": self.miss_count,
                        "hit_rate": hit_rate,
                        "total_requests": total_requests
                    }
                
                def clear(self):
                    """清空缓存"""
                    self.cache.clear()
                    self.access_count.clear()
                    self.hit_count = 0
                    self.miss_count = 0
            
            # 带缓存的查询类
            class CachedSearch:
                def __init__(self, collection, cache):
                    self.collection = collection
                    self.cache = cache
                
                def search(self, query_vector, search_params, limit=10):
                    """带缓存的查询"""
                    # 尝试从缓存获取
                    cached_results = self.cache.get(query_vector, search_params, limit)
                    
                    if cached_results is not None:
                        return cached_results, True  # 缓存命中
                    
                    # 缓存未命中,执行实际查询
                    results = self.collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=search_params,
                        limit=limit
                    )
                    
                    # 将结果放入缓存
                    self.cache.put(query_vector, search_params, limit, results[0])
                    
                    return results[0], False  # 缓存未命中
            
            # 使用查询缓存
            cache = QueryCache(max_size=100, ttl=60)
            cached_search = CachedSearch(collection, cache)
            
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            # 生成一些查询向量
            query_vectors = [[np.random.random() for _ in range(128)] for _ in range(10)]
            
            print("测试查询缓存:\n")
            
            # 第一轮查询(缓存未命中)
            print("第一轮查询(缓存未命中):")
            for i, query_vector in enumerate(query_vectors):
                start = time.time()
                results, hit = cached_search.search(query_vector, search_params)
                latency = time.time() - start
                
                print(f"  查询{i+1}: {'命中' if hit else '未命中'}, 延迟: {latency*1000:.2f}ms")
            
            # 第二轮查询(缓存命中)
            print("\n第二轮查询(缓存命中):")
            for i, query_vector in enumerate(query_vectors):
                start = time.time()
                results, hit = cached_search.search(query_vector, search_params)
                latency = time.time() - start
                
                print(f"  查询{i+1}: {'命中' if hit else '未命中'}, 延迟: {latency*1000:.2f}ms")
            
            # 打印缓存统计
            stats = cache.get_stats()
            
            print(f"\n缓存统计:")
            print(f"  缓存大小: {stats['cache_size']}/{stats['max_size']}")
            print(f"  命中次数: {stats['hit_count']}")
            print(f"  未命中次数: {stats['miss_count']}")
            print(f"  命中率: {stats['hit_rate']*100:.1f}%")
            print(f"  总请求数: {stats['total_requests']}")
            
            print("\n查询缓存建议:")
            print("  1. 适合查询模式重复的场景")
            print("  2. 配置合适的缓存大小和TTL")
            print("  3. 使用LRU等淘汰策略")
            print("  4. 监控缓存命中率")
            print("  5. 数据更新时及时失效缓存")
            ---
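            The recommendations above mention LRU eviction. For comparison, a true least-recently-used cache with a TTL can be sketched with `collections.OrderedDict` from the standard library. `LRUTTLCache` is a hypothetical class, keys are assumed to be precomputed hashes such as those produced by `_generate_key`, and the `clock` parameter exists only to make expiry testable.

```python
import time
from collections import OrderedDict


class LRUTTLCache:
    """Hypothetical LRU cache with per-entry time-to-live."""

    def __init__(self, max_size=1000, ttl=300, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl
        self.clock = clock
        self._data = OrderedDict()  # key -> (value, stored_at); least recent first

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at >= self.ttl:
            del self._data[key]          # expired: drop and report a miss
            return None
        self._data.move_to_end(key)      # mark as most recently used
        return value

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        elif len(self._data) >= self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
        self._data[key] = (value, self.clock())

    def __len__(self):
        return len(self._data)


cache = LRUTTLCache(max_size=2, ttl=60)
cache.put("q1", ["doc1", "doc2"])
cache.put("q2", ["doc3"])
cache.get("q1")                 # q1 becomes the most recently used entry
cache.put("q3", ["doc4"])       # evicts q2, the least recently used
print("q2 cached:", cache.get("q2") is not None)
print("q1 cached:", cache.get("q1") is not None)
```

            LRU favours recency and LFU favours frequency; which one gives the better hit rate depends on how the query distribution shifts over time, so it is worth measuring both.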
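    b.Cache Warm-up
        a.Description
            Cache warm-up loads hot data before the system starts serving traffic, avoiding a burst of cache misses after a cold start. Hot queries can be identified from historical query logs. Warm-up can noticeably improve early performance, but the time it takes has to be balanced against the benefit. It can run asynchronously so that it does not block service startup, and incrementally so that data is loaded in stages. Monitor the hit rate after warm-up and tune the strategy accordingly.
        b.Code Example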
            ---
            from pymilvus import Collection
            import numpy as np
            import time
            import json
            
            collection = Collection("documents")
            collection.load()
            
            # 缓存预热类
            class CacheWarmer:
                def __init__(self, cached_search):
                    self.cached_search = cached_search
                
                def warm_from_queries(self, query_list):
                    """从查询列表预热缓存"""
                    print(f"\n开始缓存预热,共{len(query_list)}个查询...")
                    
                    start = time.time()
                    
                    for i, query_info in enumerate(query_list):
                        query_vector = query_info["vector"]
                        search_params = query_info["params"]
                        limit = query_info.get("limit", 10)
                        
                        # 执行查询,填充缓存
                        self.cached_search.search(query_vector, search_params, limit)
                        
                        if (i + 1) % 10 == 0:
                            print(f"  已预热 {i+1}/{len(query_list)} 个查询")
                    
                    elapsed = time.time() - start
                    
                    print(f"缓存预热完成,耗时: {elapsed:.2f}s")
                    
                    return elapsed
                
                def warm_from_log(self, log_file, top_n=100):
                    """从查询日志预热缓存"""
                    print(f"\n从查询日志预热缓存(Top {top_n})...")
                    
                    # 读取查询日志
                    try:
                        with open(log_file, 'r') as f:
                            logs = json.load(f)
                        
                        # 统计查询频率
                        query_freq = {}
                        for log in logs:
                            query_key = json.dumps(log, sort_keys=True)
                            query_freq[query_key] = query_freq.get(query_key, 0) + 1
                        
                        # 选择Top N热点查询
                        top_queries = sorted(query_freq.items(), key=lambda x: x[1], reverse=True)[:top_n]
                        
                        # 预热
                        query_list = [json.loads(q[0]) for q in top_queries]
                        elapsed = self.warm_from_queries(query_list)
                        
                        print(f"预热了{len(query_list)}个热点查询")
                        
                        return elapsed
                    
                    except Exception as e:
                        print(f"从日志预热失败: {e}")
                        return 0
                
                def warm_async(self, query_list, callback=None):
                    """异步预热缓存"""
                    import threading
                    
                    def warm_task():
                        elapsed = self.warm_from_queries(query_list)
                        
                        if callback:
                            callback(elapsed)
                    
                    thread = threading.Thread(target=warm_task, daemon=True)
                    thread.start()
                    
                    print("异步预热已启动")
                    
                    return thread
            
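            # Use cache warm-up
            # Note: QueryCache and CachedSearch are the classes defined in the query cache example above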
            cache = QueryCache(max_size=100, ttl=300)
            cached_search = CachedSearch(collection, cache)
            warmer = CacheWarmer(cached_search)
            
            # 准备预热查询列表
            warm_queries = []
            search_params = {
                "metric_type": "L2",
                "params": {"nprobe": 16}
            }
            
            for i in range(20):
                warm_queries.append({
                    "vector": [np.random.random() for _ in range(128)],
                    "params": search_params,
                    "limit": 10
                })
            
            # 同步预热
            warmer.warm_from_queries(warm_queries)
            
            # 验证预热效果
            stats = cache.get_stats()
            print(f"\n预热后缓存统计:")
            print(f"  缓存大小: {stats['cache_size']}")
            
            # 异步预热示例
            def on_warm_complete(elapsed):
                print(f"\n异步预热完成回调: 耗时{elapsed:.2f}s")
            
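            warm_thread = warmer.warm_async(warm_queries[:10], callback=on_warm_complete)
            # The worker is a daemon thread, so wait for it here; otherwise a short
            # script can exit before the warm-up and its callback have run
            warm_thread.join()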
            
            print("\n缓存预热建议:")
            print("  1. 启动时预热热点查询")
            print("  2. 从历史日志识别热点")
            print("  3. 异步预热,不阻塞启动")
            print("  4. 增量预热,逐步加载")
            print("  5. 监控预热效果,优化策略")
            ---

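02.Data caching
    a.Collection cache
        a.Description
            A Collection cache keeps frequently used Collections loaded in memory, which avoids the cost of repeatedly calling load() and release(). It suits deployments with many Collections, where the hot ones are cached first. Cap the number of cached Collections so that loaded data does not exhaust memory, and use LRU eviction to release the least recently used Collection once the cap is reached. Track per-Collection access frequency to tune the cap, and preload Collections that are expected to be used.
        b.Code example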
            ---
            from pymilvus import Collection
            import time
            from collections import OrderedDict
            
            # Collection缓存管理器
            class CollectionCacheManager:
                def __init__(self, max_cached=10):
                    """
                    max_cached: 最大缓存Collection数量
                    """
                    self.max_cached = max_cached
                    self.cache = OrderedDict()
                    self.access_count = {}
                    self.hit_count = 0
                    self.miss_count = 0
                
                def get_collection(self, collection_name):
                    """获取Collection(带缓存)"""
                    if collection_name in self.cache:
                        # 缓存命中
                        self.hit_count += 1
                        self.access_count[collection_name] = self.access_count.get(collection_name, 0) + 1
                        
                        # 移到最后(LRU)
                        self.cache.move_to_end(collection_name)
                        
                        return self.cache[collection_name]
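                    else:
                        # Cache miss
                        self.miss_count += 1
                        
                        # Create the handle first: this raises if the Collection does not
                        # exist, so a bad name never evicts a healthy cache entry
                        collection = Collection(collection_name)
                        
                        # Make room before loading so memory use stays bounded
                        if len(self.cache) >= self.max_cached:
                            # Evict the least recently used entry
                            evicted_name, evicted_collection = self.cache.popitem(last=False)
                            
                            # Release the evicted Collection from memory
                            try:
                                evicted_collection.release()
                                print(f"  Evicted Collection: {evicted_name}")
                            except Exception as e:
                                print(f"  Failed to release {evicted_name}: {e}")
                        
                        # Load and cache
                        collection.load()
                        self.cache[collection_name] = collection
                        # Accumulate instead of resetting to 1, so the count survives
                        # an eviction followed by a reload
                        self.access_count[collection_name] = self.access_count.get(collection_name, 0) + 1
                        
                        print(f"  Loaded Collection: {collection_name}")
                        
                        return collection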
                
                def preload_collections(self, collection_names):
                    """预加载Collection"""
                    print(f"\n预加载{len(collection_names)}个Collection...")
                    
                    for name in collection_names:
                        self.get_collection(name)
                    
                    print("预加载完成")
                
                def get_stats(self):
                    """获取缓存统计"""
                    total_requests = self.hit_count + self.miss_count
                    hit_rate = self.hit_count / total_requests if total_requests > 0 else 0
                    
                    return {
                        "cached_collections": len(self.cache),
                        "max_cached": self.max_cached,
                        "hit_count": self.hit_count,
                        "miss_count": self.miss_count,
                        "hit_rate": hit_rate,
                        "access_count": self.access_count.copy()
                    }
                
                def clear(self):
                    """清空缓存"""
                    for collection in self.cache.values():
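                        try:
                            collection.release()
                        except Exception as e:
                            # Keep clearing the remaining entries, but report the failure
                            print(f"  Failed to release Collection: {e}")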
                    
                    self.cache.clear()
                    self.access_count.clear()
                    print("缓存已清空")
            
            # 使用Collection缓存管理器
            cache_manager = CollectionCacheManager(max_cached=5)
            
            # 模拟访问多个Collection
            collection_names = ["coll_1", "coll_2", "coll_3", "coll_4", "coll_5", "coll_6"]
            
            print("测试Collection缓存:\n")
            
            # 第一轮访问
            print("第一轮访问:")
            for name in collection_names:
                try:
                    collection = cache_manager.get_collection(name)
                except Exception as e:
                    print(f"  Failed to load {name} (the Collection may not exist): {e}")
            
            # Second round (partial hits)
            print("\n第二轮访问:")
            for name in collection_names[:3]:
                try:
                    collection = cache_manager.get_collection(name)
                except Exception as e:
                    print(f"  Failed to load {name}: {e}")
            
            # 打印统计
            stats = cache_manager.get_stats()
            
            print(f"\n缓存统计:")
            print(f"  缓存Collection数: {stats['cached_collections']}/{stats['max_cached']}")
            print(f"  命中次数: {stats['hit_count']}")
            print(f"  未命中次数: {stats['miss_count']}")
            print(f"  命中率: {stats['hit_rate']*100:.1f}%")
            
            print(f"\n访问频率:")
            for name, count in sorted(stats['access_count'].items(), key=lambda x: x[1], reverse=True):
                print(f"  {name}: {count}次")
            
            # 清空缓存
            cache_manager.clear()
            
            print("\nCollection缓存建议:")
            print("  1. 缓存热点Collection")
            print("  2. 使用LRU淘汰策略")
            print("  3. 配置合适的缓存大小")
            print("  4. 预加载预期使用的Collection")
            print("  5. 监控访问频率,动态调整")
            ---
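            The LRU eviction order described above can be checked without a running Milvus instance. The following is a minimal sketch under stated assumptions: `LRUHandleCache`, `load_fn`, and `release_fn` are hypothetical names introduced here for illustration, and the two lambdas merely stand in for `Collection(name).load()` and `Collection.release()`.

```python
from collections import OrderedDict


class LRUHandleCache:
    """LRU cache for loaded handles; load_fn and release_fn are injected hooks (hypothetical)."""

    def __init__(self, max_cached, load_fn, release_fn):
        self.max_cached = max_cached
        self.load_fn = load_fn
        self.release_fn = release_fn
        self.cache = OrderedDict()
        self.evicted = []  # eviction order, kept only so the behavior can be inspected

    def get(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)  # mark as most recently used
            return self.cache[name]
        if len(self.cache) >= self.max_cached:
            # popitem(last=False) removes the least recently used entry
            old_name, old_handle = self.cache.popitem(last=False)
            self.release_fn(old_handle)
            self.evicted.append(old_name)
        handle = self.load_fn(name)
        self.cache[name] = handle
        return handle


# Stand-ins for loading and releasing a Collection (assumptions, not pymilvus calls)
released = []
cache = LRUHandleCache(
    max_cached=2,
    load_fn=lambda name: f"handle:{name}",
    release_fn=released.append,
)

cache.get("coll_1")
cache.get("coll_2")
cache.get("coll_1")  # coll_1 becomes the most recently used entry
cache.get("coll_3")  # cache is full, so coll_2 (least recently used) is evicted

print(cache.evicted)      # ['coll_2']
print(list(cache.cache))  # ['coll_1', 'coll_3']
```

            Injecting the load and release hooks keeps the eviction logic testable on its own; in a real deployment they would wrap the pymilvus calls used in the example above.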
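    b.Result cache
        a.Description
            A result cache stores query results so that identical queries are not recomputed. It pays off most when results are large, for example when many vectors are returned, and it can also hold intermediate results such as recall candidates or ranked lists. Cached entries must be invalidated when the underlying data changes, otherwise stale results are served. A tiered layout combines an L1 in-memory cache with an L2 on-disk cache. Monitor the hit rate and memory use, and weigh the benefit of caching against its maintenance cost.
        b.Code example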
            ---
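            import hashlib
            import numpy as np
            import pickle
            import os
            
            # Tiered result cache
            class TieredResultCache:
                def __init__(self, l1_max_size=100, l2_cache_dir="/tmp/milvus_cache"):
                    """
                    l1_max_size: maximum number of entries in the L1 in-memory cache
                    l2_cache_dir: directory for the L2 on-disk cache
                    """
                    self.l1_max_size = l1_max_size
                    self.l2_cache_dir = l2_cache_dir
                    self.l1_cache = {}  # in-memory cache (dict keeps insertion order)
                    self.l1_hit = 0
                    self.l2_hit = 0
                    self.miss = 0
                    
                    # Create the L2 cache directory
                    os.makedirs(l2_cache_dir, exist_ok=True)
                
                def _get_cache_path(self, key):
                    """Return the L2 cache file path for a key."""
                    # Hash the key so arbitrary strings (slashes, very long keys) map to safe file names
                    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
                    return os.path.join(self.l2_cache_dir, f"{digest}.pkl")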
                
                def get(self, key):
                    """获取缓存结果"""
                    # L1缓存查找
                    if key in self.l1_cache:
                        self.l1_hit += 1
                        return self.l1_cache[key], "L1"
                    
                    # L2缓存查找
                    cache_path = self._get_cache_path(key)
                    if os.path.exists(cache_path):
                        try:
                            with open(cache_path, 'rb') as f:
                                results = pickle.load(f)
                            
                            self.l2_hit += 1
                            
                            # 提升到L1缓存
                            self._put_l1(key, results)
                            
                            return results, "L2"
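                        except Exception:
                            # Unreadable or corrupt cache file: fall through and count a miss.
                            # Only unpickle files this process wrote; pickle is unsafe on untrusted data.
                            pass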
                    
                    # 缓存未命中
                    self.miss += 1
                    return None, None
                
                def _put_l1(self, key, results):
                    """放入L1缓存"""
                    # 检查L1缓存大小
                    if len(self.l1_cache) >= self.l1_max_size:
                        # 淘汰一个(简单FIFO)
                        evicted_key = next(iter(self.l1_cache))
                        evicted_results = self.l1_cache.pop(evicted_key)
                        
                        # 写入L2缓存
                        self._put_l2(evicted_key, evicted_results)
                    
                    self.l1_cache[key] = results
                
                def _put_l2(self, key, results):
                    """放入L2缓存"""
                    cache_path = self._get_cache_path(key)
                    
                    try:
                        with open(cache_path, 'wb') as f:
                            pickle.dump(results, f)
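                    except (OSError, pickle.PicklingError) as e:
                        # L2 is best-effort: report the failure and drop the entry
                        print(f"  L2 write failed: {e}")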
                
                def put(self, key, results):
                    """放入缓存"""
                    self._put_l1(key, results)
                
                def get_stats(self):
                    """获取统计信息"""
                    total = self.l1_hit + self.l2_hit + self.miss
                    
                    return {
                        "l1_size": len(self.l1_cache),
                        "l1_hit": self.l1_hit,
                        "l2_hit": self.l2_hit,
                        "miss": self.miss,
                        "total": total,
                        "hit_rate": (self.l1_hit + self.l2_hit) / total if total > 0 else 0
                    }
                
                def clear(self):
                    """清空缓存"""
                    self.l1_cache.clear()
                    
                    # 清空L2缓存
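                    for filename in os.listdir(self.l2_cache_dir):
                        # Only remove cache files, in case the directory is shared with other data
                        if not filename.endswith(".pkl"):
                            continue
                        filepath = os.path.join(self.l2_cache_dir, filename)
                        try:
                            os.remove(filepath)
                        except OSError as e:
                            print(f"  Failed to remove {filepath}: {e}")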
            
            # 使用分层缓存
            tiered_cache = TieredResultCache(l1_max_size=5)
            
            print("测试分层结果缓存:\n")
            
            # 模拟查询和缓存
            for i in range(10):
                key = f"query_{i}"
                
                # 尝试从缓存获取
                results, source = tiered_cache.get(key)
                
                if results is None:
                    # 缓存未命中,生成结果
                    results = [np.random.random() for _ in range(100)]
                    tiered_cache.put(key, results)
                    print(f"  {key}: 未命中,生成结果")
                else:
                    print(f"  {key}: 命中({source}缓存)")
            
            # 再次访问前几个查询
            print("\n再次访问:")
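            for i in range(5):
                key = f"query_{i}"
                results, source = tiered_cache.get(key)
                # Compare against None: an empty result list is still a valid cached value
                if results is not None:
                    print(f"  {key}: hit ({source} cache)")
                else:
                    print(f"  {key}: miss")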
            
            # 打印统计
            stats = tiered_cache.get_stats()
            
            print(f"\n缓存统计:")
            print(f"  L1缓存大小: {stats['l1_size']}")
            print(f"  L1命中: {stats['l1_hit']}")
            print(f"  L2命中: {stats['l2_hit']}")
            print(f"  未命中: {stats['miss']}")
            print(f"  总命中率: {stats['hit_rate']*100:.1f}%")
            
            # 清空缓存
            tiered_cache.clear()
            
            print("\n结果缓存建议:")
            print("  1. 缓存大结果,避免重复计算")
            print("  2. 分层缓存,平衡速度和容量")
            print("  3. L1内存缓存热点,L2磁盘缓存冷数据")
            print("  4. 数据更新时及时失效缓存")
            print("  5. 监控缓存命中率和大小")
            ---
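            The description calls for invalidating cached results when data is updated, which the tiered cache above does not implement. One common approach is to embed a per-collection data version in the cache key and bump it after every write. The sketch below illustrates that idea under assumptions: `VersionedResultCache` and its methods are hypothetical names, not part of pymilvus, and the caller is assumed to call `invalidate` after each insert, delete, or upsert.

```python
import hashlib
import json


class VersionedResultCache:
    """Result cache whose keys embed a per-collection data version (hypothetical helper)."""

    def __init__(self):
        self._store = {}
        self._versions = {}  # collection name -> integer data version

    def _key(self, collection, query):
        version = self._versions.get(collection, 0)
        # sort_keys makes the serialized form independent of dict ordering
        payload = json.dumps({"c": collection, "v": version, "q": query}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, collection, query):
        return self._store.get(self._key(collection, query))

    def put(self, collection, query, results):
        self._store[self._key(collection, query)] = results

    def invalidate(self, collection):
        """Call after a write: keys built with the old version can no longer be produced."""
        self._versions[collection] = self._versions.get(collection, 0) + 1


cache = VersionedResultCache()
query = {"vector": [0.1, 0.2], "limit": 10}

cache.put("products", query, ["id_1", "id_2"])
hit_before = cache.get("products", query)  # served from cache

cache.invalidate("products")               # simulate a data update
hit_after = cache.get("products", query)   # the old entry is no longer reachable

print(hit_before, hit_after)  # ['id_1', 'id_2'] None
```

            Entries written under an old version stay in memory until they are evicted, so this scheme still needs a size limit or TTL alongside it.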

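9 Cluster deployment

9.1 Distributed architecture

01.Architecture components
    a.Component roles
        a.Description
            Milvus uses a distributed architecture that separates storage from compute. Requests enter through the access layer (Proxy), behind which sit the coordinators, the worker nodes, and the storage layer. The coordinators are Root Coord, Data Coord, Query Coord, and Index Coord. The worker nodes are Query Node, Data Node, and Index Node. In the storage layer, MinIO/S3 holds vector data and index files, etcd holds metadata, and Pulsar/Kafka serves as the message queue. Each component scales independently, and worker nodes scale horizontally.
        b.Code example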
            ---
            # Milvus分布式架构组件说明
            
            architecture = {
                "coordinators": {
                    "root_coord": {
                        "role": "全局协调器",
                        "responsibilities": [
                            "DDL操作(创建/删除Collection)",
                            "分配时间戳",
                            "管理数据通道"
                        ],
                        "count": 1  # 单实例
                    },
                    "data_coord": {
                        "role": "数据协调器",
                        "responsibilities": [
                            "管理数据分段",
                            "分配数据写入任务",
                            "触发数据持久化"
                        ],
                        "count": 1
                    },
                    "query_coord": {
                        "role": "查询协调器",
                        "responsibilities": [
                            "管理查询节点",
                            "分配查询任务",
                            "负载均衡"
                        ],
                        "count": 1
                    },
                    "index_coord": {
                        "role": "索引协调器",
                        "responsibilities": [
                            "管理索引构建",
                            "分配索引任务",
                            "监控索引进度"
                        ],
                        "count": 1
                    }
                },
                "workers": {
                    "query_node": {
                        "role": "查询节点",
                        "responsibilities": [
                            "执行向量检索",
                            "加载数据到内存",
                            "处理查询请求"
                        ],
                        "scalable": True,  # 可水平扩展
                        "recommended_count": "2-10"
                    },
                    "data_node": {
                        "role": "数据节点",
                        "responsibilities": [
                            "接收数据写入",
                            "数据持久化",
                            "数据合并"
                        ],
                        "scalable": True,
                        "recommended_count": "1-5"
                    },
                    "index_node": {
                        "role": "索引节点",
                        "responsibilities": [
                            "构建向量索引",
                            "索引优化",
                            "索引持久化"
                        ],
                        "scalable": True,
                        "recommended_count": "1-5"
                    }
                },
                "storage": {
                    "object_storage": {
                        "type": "MinIO/S3",
                        "stores": "向量数据、索引文件",
                        "required": True
                    },
                    "meta_storage": {
                        "type": "etcd",
                        "stores": "元数据、配置信息",
                        "required": True
                    },
                    "message_queue": {
                        "type": "Pulsar/Kafka",
                        "purpose": "数据流、事件通知",
                        "required": True
                    }
                }
            }
            
            print("Milvus分布式架构组件:\n")
            
            print("协调器组件:")
            for name, info in architecture["coordinators"].items():
                print(f"  {name}:")
                print(f"    角色: {info['role']}")
                print(f"    职责: {', '.join(info['responsibilities'])}")
                print(f"    实例数: {info['count']}")
            
            print("\n工作节点:")
            for name, info in architecture["workers"].items():
                print(f"  {name}:")
                print(f"    角色: {info['role']}")
                print(f"    职责: {', '.join(info['responsibilities'])}")
                print(f"    可扩展: {'是' if info['scalable'] else '否'}")
                print(f"    推荐数量: {info['recommended_count']}")
            
            print("\n存储层:")
            for name, info in architecture["storage"].items():
                print(f"  {name}:")
                print(f"    类型: {info['type']}")
                print(f"    存储内容: {info.get('stores', info.get('purpose'))}")
                print(f"    必需: {'是' if info['required'] else '否'}")
            
            print("\n架构特点:")
            print("  1. 存储计算分离,独立扩展")
            print("  2. 无状态Worker,易于水平扩展")
            print("  3. 协调器单点,通过主备保证高可用")
            print("  4. 统一存储层,支持多种存储后端")
            print("  5. 消息队列解耦,异步处理")
            ---
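    b.Data flow
        a.Description
            Data in Milvus passes through write, persistence, indexing, and query stages. Inserted data first enters the message queue; a Data Node consumes it and persists it to object storage. Persistence then triggers index building on an Index Node. At query time, a Query Node loads data and indexes from the storage layer. The message queue decouples these stages so that they run asynchronously. Data is managed in segments, which supports incremental updates, and small segments are merged periodically in a design similar to an LSM-tree.
        b.Code example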
            ---
            # Milvus数据流转流程
            
            data_flow = {
                "write_path": [
                    {
                        "step": 1,
                        "component": "SDK/Client",
                        "action": "发送insert请求",
                        "data": "向量数据 + 标量字段"
                    },
                    {
                        "step": 2,
                        "component": "Proxy",
                        "action": "路由请求到Data Coord",
                        "data": "分配时间戳和数据通道"
                    },
                    {
                        "step": 3,
                        "component": "Data Coord",
                        "action": "分配数据段和Data Node",
                        "data": "segment分配信息"
                    },
                    {
                        "step": 4,
                        "component": "Message Queue",
                        "action": "写入消息队列",
                        "data": "数据消息"
                    },
                    {
                        "step": 5,
                        "component": "Data Node",
                        "action": "消费消息,缓存数据",
                        "data": "内存缓冲区"
                    },
                    {
                        "step": 6,
                        "component": "Data Node",
                        "action": "达到阈值后持久化",
                        "data": "写入对象存储(S3/MinIO)"
                    },
                    {
                        "step": 7,
                        "component": "Data Coord",
                        "action": "触发索引构建",
                        "data": "索引任务"
                    },
                    {
                        "step": 8,
                        "component": "Index Node",
                        "action": "构建索引并持久化",
                        "data": "索引文件写入对象存储"
                    }
                ],
                "query_path": [
                    {
                        "step": 1,
                        "component": "SDK/Client",
                        "action": "发送search请求",
                        "data": "查询向量 + 参数"
                    },
                    {
                        "step": 2,
                        "component": "Proxy",
                        "action": "路由到Query Coord",
                        "data": "查询请求"
                    },
                    {
                        "step": 3,
                        "component": "Query Coord",
                        "action": "分配Query Node",
                        "data": "负载均衡分配"
                    },
                    {
                        "step": 4,
                        "component": "Query Node",
                        "action": "检查数据是否已加载",
                        "data": "内存中的数据和索引"
                    },
                    {
                        "step": 5,
                        "component": "Query Node",
                        "action": "如未加载,从对象存储加载",
                        "data": "加载数据和索引到内存"
                    },
                    {
                        "step": 6,
                        "component": "Query Node",
                        "action": "执行向量检索",
                        "data": "使用索引进行ANN搜索"
                    },
                    {
                        "step": 7,
                        "component": "Query Node",
                        "action": "返回结果",
                        "data": "Top-K结果"
                    },
                    {
                        "step": 8,
                        "component": "Proxy",
                        "action": "合并多个Query Node结果",
                        "data": "全局Top-K结果"
                    }
                ]
            }
            
            print("Milvus数据流转:\n")
            
            print("写入路径:")
            for step_info in data_flow["write_path"]:
                print(f"  步骤{step_info['step']}: {step_info['component']}")
                print(f"    操作: {step_info['action']}")
                print(f"    数据: {step_info['data']}")
            
            print("\n查询路径:")
            for step_info in data_flow["query_path"]:
                print(f"  步骤{step_info['step']}: {step_info['component']}")
                print(f"    操作: {step_info['action']}")
                print(f"    数据: {step_info['data']}")
            
            print("\n关键特性:")
            print("  1. 异步写入: 通过消息队列解耦")
            print("  2. 批量持久化: 提升写入吞吐量")
            print("  3. 延迟索引: 数据先可查,后建索引")
            print("  4. 按需加载: Query Node按需加载数据")
            print("  5. 结果合并: Proxy合并分布式查询结果")
            ---
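            The last step of the query path, where the Proxy merges per-node results into a global Top-K, can be illustrated with a small sketch. It assumes each Query Node returns `(distance, id)` pairs already sorted by ascending distance (as with the L2 metric, where smaller is closer) and that the same entity may be reported by more than one node. `merge_top_k` is a hypothetical helper written for this illustration, not Milvus's internal implementation.

```python
import heapq


def merge_top_k(per_node_results, k):
    """Merge per-node (distance, id) lists into a global Top-K, smallest distance first."""
    # Every input list is already sorted, so heapq.merge streams them in
    # global order without re-sorting the full result set
    merged = heapq.merge(*per_node_results)
    top, seen = [], set()
    for distance, entity_id in merged:
        if entity_id in seen:  # skip an entity already taken from another node
            continue
        seen.add(entity_id)
        top.append((distance, entity_id))
        if len(top) == k:
            break
    return top


# Partial results from three hypothetical Query Nodes
node_a = [(0.10, 7), (0.35, 2), (0.80, 9)]
node_b = [(0.05, 4), (0.40, 1)]
node_c = [(0.20, 3), (0.35, 2)]

global_top3 = merge_top_k([node_a, node_b, node_c], k=3)
print(global_top3)  # [(0.05, 4), (0.1, 7), (0.2, 3)]
```

            For similarity metrics where larger is better, such as inner product, the comparison direction would need to be reversed.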

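02.Deployment modes
    a.Standalone mode
        a.Description
            In standalone mode all Milvus components run in a single process, which suits development and testing. Resource usage is low and deployment is simple, but there is no horizontal scaling or high availability, and data volume and QPS are bounded by a single machine. Typical uses are prototyping, functional testing, and small applications; distributed mode is recommended for production. Standalone mode can be deployed quickly with Docker.
        b.Code example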
            ---
            # 单机模式部署(Docker)
            
            docker_standalone = """
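            # Pull the Milvus image (pinned to the same version as the compose file below)
            docker pull milvusdb/milvus:v2.3.0
            
            # Download the compose file
            wget https://github.com/milvus-io/milvus/releases/download/v2.3.0/milvus-standalone-docker-compose.yml -O docker-compose.yml
            
            # Start Milvus
            docker-compose up -d
            
            # Check status
            docker-compose ps
            
            # Follow logs (the argument is the service name "standalone" from
            # docker-compose.yml, not the container name milvus-standalone)
            docker-compose logs -f standalone
            
            # Stop the services
            docker-compose down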
            """
            
            print("Milvus单机模式部署:\n")
            print(docker_standalone)
            
            # 单机模式配置示例
            standalone_config = {
                "deployment": {
                    "mode": "standalone",
                    "components": "all-in-one",
                    "process_count": 1
                },
                "resources": {
                    "cpu": "4 cores",
                    "memory": "8 GB",
                    "disk": "100 GB SSD"
                },
                "limitations": {
                    "max_vectors": "~10M",
                    "max_qps": "~1000",
                    "scalability": "不支持",
                    "high_availability": "不支持"
                },
                "use_cases": [
                    "开发测试",
                    "功能验证",
                    "小规模应用",
                    "原型开发"
                ]
            }
            
            print("\n单机模式特点:")
            print(f"  部署模式: {standalone_config['deployment']['mode']}")
            print(f"  组件: {standalone_config['deployment']['components']}")
            print(f"  进程数: {standalone_config['deployment']['process_count']}")
            
            print(f"\n资源需求:")
            print(f"  CPU: {standalone_config['resources']['cpu']}")
            print(f"  内存: {standalone_config['resources']['memory']}")
            print(f"  磁盘: {standalone_config['resources']['disk']}")
            
            print(f"\n限制:")
            print(f"  最大向量数: {standalone_config['limitations']['max_vectors']}")
            print(f"  最大QPS: {standalone_config['limitations']['max_qps']}")
            print(f"  可扩展性: {standalone_config['limitations']['scalability']}")
            print(f"  高可用: {standalone_config['limitations']['high_availability']}")
            
            print(f"\n适用场景:")
            for use_case in standalone_config['use_cases']:
                print(f"  - {use_case}")
            ---
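            After `docker-compose up -d`, the Milvus container may need some time before it accepts connections. The sketch below polls the published port with the standard library only. `wait_for_port` is a hypothetical helper, port 19530 is taken from the compose file used in this document, and a successful TCP connection only shows that the port is open, not that Milvus has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Return True once a TCP connection to host:port succeeds, False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)
    return False
```

            A typical call would be `wait_for_port("localhost", 19530)` before opening a client connection; a stricter check would follow it with an actual client request.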
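    b.Cluster mode
        a.Description
            In cluster mode each component is deployed separately and can scale horizontally. Coordinators and workers are separated, and workers scale independently. High availability is supported through active-standby failover for the coordinators. This mode suits production workloads with large data volumes and high concurrency. It requires etcd, MinIO/S3, and Pulsar/Kafka as dependencies. Kubernetes is recommended for deployment and management, and capacity can be scaled up or down with load.
        b.Code example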
            ---
            # 集群模式架构配置
            
            cluster_config = {
                "deployment": {
                    "mode": "cluster",
                    "components": "分布式部署",
                    "coordinators": {
                        "root_coord": {"count": 1, "ha": "主备"},
                        "data_coord": {"count": 1, "ha": "主备"},
                        "query_coord": {"count": 1, "ha": "主备"},
                        "index_coord": {"count": 1, "ha": "主备"}
                    },
                    "workers": {
                        "query_node": {"count": "2-10", "scalable": True},
                        "data_node": {"count": "1-5", "scalable": True},
                        "index_node": {"count": "1-5", "scalable": True}
                    }
                },
                "dependencies": {
                    "etcd": {
                        "purpose": "元数据存储",
                        "ha": "3节点集群",
                        "required": True
                    },
                    "minio_s3": {
                        "purpose": "对象存储",
                        "ha": "分布式部署",
                        "required": True
                    },
                    "pulsar_kafka": {
                        "purpose": "消息队列",
                        "ha": "集群部署",
                        "required": True
                    }
                },
                "resources": {
                    "coordinator": {
                        "cpu": "2 cores",
                        "memory": "4 GB"
                    },
                    "query_node": {
                        "cpu": "8 cores",
                        "memory": "32 GB"
                    },
                    "data_node": {
                        "cpu": "4 cores",
                        "memory": "16 GB"
                    },
                    "index_node": {
                        "cpu": "8 cores",
                        "memory": "16 GB"
                    }
                },
                "capabilities": {
                    "max_vectors": "100M+",
                    "max_qps": "10000+",
                    "scalability": "水平扩展",
                    "high_availability": "支持"
                },
                "use_cases": [
                    "生产环境",
                    "大规模应用",
                    "高并发场景",
                    "企业级应用"
                ]
            }
            
            print("Milvus集群模式配置:\n")
            
            print("协调器部署:")
            for name, info in cluster_config["deployment"]["coordinators"].items():
                print(f"  {name}: {info['count']}个实例, 高可用: {info['ha']}")
            
            print("\n工作节点部署:")
            for name, info in cluster_config["deployment"]["workers"].items():
                scalable = "支持" if info['scalable'] else "不支持"
                print(f"  {name}: {info['count']}个实例, 水平扩展: {scalable}")
            
            print("\n依赖组件:")
            for name, info in cluster_config["dependencies"].items():
                print(f"  {name}:")
                print(f"    用途: {info['purpose']}")
                print(f"    高可用: {info['ha']}")
                print(f"    必需: {'是' if info['required'] else '否'}")
            
            print("\n资源配置:")
            for component, resources in cluster_config["resources"].items():
                print(f"  {component}:")
                print(f"    CPU: {resources['cpu']}")
                print(f"    内存: {resources['memory']}")
            
            print("\n能力:")
            print(f"  最大向量数: {cluster_config['capabilities']['max_vectors']}")
            print(f"  最大QPS: {cluster_config['capabilities']['max_qps']}")
            print(f"  可扩展性: {cluster_config['capabilities']['scalability']}")
            print(f"  高可用: {cluster_config['capabilities']['high_availability']}")
            
            print("\n适用场景:")
            for use_case in cluster_config['use_cases']:
                print(f"  - {use_case}")
            
            print("\n集群模式优势:")
            print("  1. 水平扩展: Worker节点按需扩展")
            print("  2. 高可用: Coordinator主备,Worker多副本")
            print("  3. 资源隔离: 不同组件独立资源")
            print("  4. 弹性伸缩: 根据负载动态调整")
            print("  5. 故障隔离: 单个节点故障不影响整体")
            ---
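            The Query Node counts and memory sizes above are rough guidance. A back-of-the-envelope estimate can be derived from the raw vector size: float32 vectors take 4 bytes per dimension. The sketch below is an illustration only; `estimate_query_nodes`, the `overhead_factor` for index structures and the `usable_fraction` of node memory are assumptions chosen for this example and should be replaced with measured values.

```python
import math


def estimate_query_nodes(num_vectors, dim, node_memory_gb,
                         overhead_factor=1.5, usable_fraction=0.7):
    """Estimate raw vector size in GiB and the number of Query Nodes needed to hold it."""
    raw_bytes = num_vectors * dim * 4            # float32 = 4 bytes per dimension
    needed_bytes = raw_bytes * overhead_factor   # assumed allowance for index overhead
    usable_per_node = node_memory_gb * 1024**3 * usable_fraction  # assumed headroom
    nodes = max(1, math.ceil(needed_bytes / usable_per_node))
    return raw_bytes / 1024**3, nodes


# 100M vectors of 128 dimensions on 32 GB Query Nodes, as in the configuration above
raw_gb, nodes = estimate_query_nodes(num_vectors=100_000_000, dim=128, node_memory_gb=32)
print(f"raw vectors: {raw_gb:.1f} GiB, query nodes needed: {nodes}")
# raw vectors: 47.7 GiB, query nodes needed: 4
```

            Actual memory use depends on the index type and its parameters, so the result is a starting point for load testing rather than a sizing guarantee.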

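9.2 Docker Compose

01.Compose configuration
    a.Service definitions
        a.Description
            Docker Compose simplifies Milvus deployment by defining all services in a single YAML file, including the etcd, MinIO, and Pulsar dependencies, together with networks, volumes, and environment variables. The whole stack starts and stops with one command. It suits development, testing, and small production environments, makes resource settings easy to adjust, and handles service orchestration and dependencies.
        b.Code example
            ---
            # Complete docker-compose.yml example (Milvus standalone with etcd, MinIO, and Pulsar)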
            
            version: '3.5'
            
            services:
              etcd:
                container_name: milvus-etcd
                image: quay.io/coreos/etcd:v3.5.5
                environment:
                  - ETCD_AUTO_COMPACTION_MODE=revision
                  - ETCD_AUTO_COMPACTION_RETENTION=1000
                  - ETCD_QUOTA_BACKEND_BYTES=4294967296
                  - ETCD_SNAPSHOT_COUNT=50000
                volumes:
                  - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
                command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
                networks:
                  - milvus
            
              minio:
                container_name: milvus-minio
                image: minio/minio:RELEASE.2023-03-20T20-16-18Z
                environment:
                  MINIO_ACCESS_KEY: minioadmin
                  MINIO_SECRET_KEY: minioadmin
                volumes:
                  - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
                command: minio server /minio_data --console-address ":9001"
                ports:
                  - "9000:9000"
                  - "9001:9001"
                healthcheck:
                  test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
                  interval: 30s
                  timeout: 20s
                  retries: 3
                networks:
                  - milvus
            
              pulsar:
                container_name: milvus-pulsar
                image: apachepulsar/pulsar:2.8.2
                volumes:
                  - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/pulsar:/pulsar/data
                environment:
                  # Mapping form: in list form the double quotes would become part of the value
                  PULSAR_MEM: "-Xms512m -Xmx512m -XX:MaxDirectMemorySize=1g"
                command: |
                  bash -c "bin/apply-config-from-env.py conf/standalone.conf && bin/pulsar standalone"
                networks:
                  - milvus
            
              standalone:
                container_name: milvus-standalone
                image: milvusdb/milvus:v2.3.0
                command: ["milvus", "run", "standalone"]
                environment:
                  ETCD_ENDPOINTS: etcd:2379
                  MINIO_ADDRESS: minio:9000
                  PULSAR_ADDRESS: pulsar://pulsar:6650
                volumes:
                  - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
                ports:
                  - "19530:19530"
                  - "9091:9091"
                depends_on:
                  - "etcd"
                  - "minio"
                  - "pulsar"
                networks:
                  - milvus
            
            networks:
              milvus:
                name: milvus
            
            volumes:
              etcd:
              minio:
              pulsar:
              milvus:
            
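            # Usage:
            # 1. Start all services: docker-compose up -d
            # 2. Check service status: docker-compose ps
            # 3. Follow the Milvus logs: docker-compose logs -f standalone
            # 4. Stop the services: docker-compose down
            # 5. Stop and delete data: docker-compose down -v removes only named volumes;
            #    the services above use bind mounts, so also delete ./volumes by hand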
            ---
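    b.Resource configuration
        a.Description
            Compose can set resource limits for each service. CPU and memory limits keep services from competing for resources. Health checks detect failed services, and a restart policy restarts containers that exit. `depends_on` defines the start order of containers (it does not wait for readiness). Replica counts can provide simple high availability. Environment variables can override default settings. Volumes persist configuration files and data.
        b.Code example
            ---
            # docker-compose.yml extended with resource settings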
            
            version: '3.5'
            
            services:
              standalone:
                container_name: milvus-standalone
                image: milvusdb/milvus:v2.3.0
                command: ["milvus", "run", "standalone"]
                environment:
                  ETCD_ENDPOINTS: etcd:2379
                  MINIO_ADDRESS: minio:9000
                  PULSAR_ADDRESS: pulsar://pulsar:6650
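                  # Performance tuning parameters (variable names as given in this example;
                  # verify them against the milvus.yaml of your Milvus version)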
                  QUERY_NODE_GRACEFUL_STOP_TIMEOUT: 30
                  QUERY_NODE_SEARCH_TIMEOUT: 30
                  DATA_NODE_FLUSH_INSERT_BUFFER_SIZE: 16777216
                volumes:
                  - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
                  - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus/logs:/var/log/milvus
                ports:
                  - "19530:19530"
                  - "9091:9091"
                depends_on:
                  - "etcd"
                  - "minio"
                  - "pulsar"
                deploy:
                  resources:
                    limits:
                      cpus: '4.0'
                      memory: 8G
                    reservations:
                      cpus: '2.0'
                      memory: 4G
                restart: always
                healthcheck:
                  test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
                  interval: 30s
                  timeout: 10s
                  retries: 3
                  start_period: 40s
                logging:
                  driver: "json-file"
                  options:
                    max-size: "100m"
                    max-file: "3"
                networks:
                  - milvus
            
              minio:
                container_name: milvus-minio
                image: minio/minio:RELEASE.2023-03-20T20-16-18Z
                environment:
                  MINIO_ACCESS_KEY: minioadmin
                  MINIO_SECRET_KEY: minioadmin
                volumes:
                  - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
                command: minio server /minio_data --console-address ":9001"
                ports:
                  - "9000:9000"
                  - "9001:9001"
                deploy:
                  resources:
                    limits:
                      cpus: '2.0'
                      memory: 4G
                restart: always
                healthcheck:
                  test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
                  interval: 30s
                  timeout: 20s
                  retries: 3
                networks:
                  - milvus
            
            networks:
              milvus:
                name: milvus
                driver: bridge
            
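            # Notes on the resource settings:
            # - limits: the maximum resources a container may use
            # - reservations: the resources guaranteed to a container
            # - restart: always restarts the container whenever it exits
            # - healthcheck: health check settings
            # - logging: log settings that cap the log file size
            # - deploy.resources is applied by Docker Compose v2 and by Swarm;
            #   docker-compose v1 needs the --compatibility flag
            # - etcd and pulsar are omitted here; define them as in the previous example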
            ---
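
The relationship between `limits` and `reservations` can be checked before a deployment. The following is a minimal sketch, assuming the Compose file has already been loaded into a Python dictionary (for example with a YAML parser); `parse_memory` and `check_resources` are hypothetical helper names, not part of Docker or Milvus.

```python
# Hypothetical helpers: check that each service sets limits and that
# its reservations do not exceed them.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_memory(text):
    """Convert a Compose memory string such as '8G' or '512m' to bytes."""
    text = text.strip().upper().rstrip("B")
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def check_resources(services):
    """Return a list of problems found in the deploy.resources sections."""
    problems = []
    for name, svc in services.items():
        res = svc.get("deploy", {}).get("resources", {})
        limits = res.get("limits", {})
        reserved = res.get("reservations", {})
        if not limits:
            problems.append(f"{name}: no resource limits set")
            continue
        if "cpus" in reserved and "cpus" in limits:
            if float(reserved["cpus"]) > float(limits["cpus"]):
                problems.append(f"{name}: CPU reservation exceeds limit")
        if "memory" in reserved and "memory" in limits:
            if parse_memory(reserved["memory"]) > parse_memory(limits["memory"]):
                problems.append(f"{name}: memory reservation exceeds limit")
    return problems


# Values copied from the example above; "etcd" has no deploy section.
services = {
    "standalone": {"deploy": {"resources": {
        "limits": {"cpus": "4.0", "memory": "8G"},
        "reservations": {"cpus": "2.0", "memory": "4G"}}}},
    "minio": {"deploy": {"resources": {"limits": {"cpus": "2.0", "memory": "4G"}}}},
    "etcd": {},
}
for problem in check_resources(services):
    print(problem)
```

A check like this fits in a CI step, so that a service without limits is caught before it can starve Milvus of memory.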

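02.Deployment practice
    a.Quick deployment
        a.Description
            The official docker-compose.yml is the fastest way to deploy Milvus: download the file and start every service with one command. Compose pulls the required images and creates the network and volumes. This setup suits quick trials and functional testing. The defaults cover basic needs and can be adjusted as required. Data is persisted under ./volumes, so it survives restarts.
        b.Code example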
            ---
            #!/bin/bash
            # Quick deployment script for Milvus
            
            set -e
            
            echo "=========================================="
            echo "Milvus quick deployment script"
            echo "=========================================="
            
            # Check for Docker and Docker Compose
            echo "Checking the environment..."
            if ! command -v docker &> /dev/null; then
                echo "Error: Docker is not installed"
                exit 1
            fi
            
            if ! command -v docker-compose &> /dev/null; then
                echo "Error: Docker Compose is not installed"
                exit 1
            fi
            
            # Download the docker-compose configuration file
            echo "Downloading the docker-compose configuration file..."
            wget https://github.com/milvus-io/milvus/releases/download/v2.3.0/milvus-standalone-docker-compose.yml -O docker-compose.yml
            
            # Create the data directories
            echo "Creating data directories..."
            mkdir -p volumes/etcd volumes/minio volumes/pulsar volumes/milvus
            
            # Start Milvus
            echo "Starting the Milvus services..."
            docker-compose up -d
            
            # Wait for the services to start
            echo "Waiting for the services to start (about 30 seconds)..."
            sleep 30
            
            # Check service status
            echo ""
            echo "Service status:"
            docker-compose ps
            
            # Check Milvus health; give up after 10 attempts
            echo ""
            echo "Checking Milvus health..."
            healthy=false
            for i in {1..10}; do
                if curl -s http://localhost:9091/healthz | grep -q "OK"; then
                    echo "✓ Milvus is healthy"
                    healthy=true
                    break
                else
                    echo "Waiting for Milvus to become ready... ($i/10)"
                    sleep 5
                fi
            done
            if [ "$healthy" != true ]; then
                echo "✗ Milvus did not become healthy; check: docker-compose logs standalone"
                exit 1
            fi
            
            echo ""
            echo "=========================================="
            echo "Milvus deployment complete!"
            echo "=========================================="
            echo "Connection details:"
            echo "  - Milvus address: localhost:19530"
            echo "  - Milvus health and metrics port: http://localhost:9091/healthz"
            echo "  - MinIO console: http://localhost:9001"
            echo "    Username: minioadmin"
            echo "    Password: minioadmin"
            echo ""
            echo "Common commands:"
            echo "  - View logs: docker-compose logs -f standalone"
            echo "  - Stop services: docker-compose down"
            echo "  - Restart services: docker-compose restart"
            echo "=========================================="
            
            # Test the connection (requires the pymilvus package)
            echo ""
            echo "Testing the connection..."
            python3 << 'PYTHON'
            from pymilvus import connections, utility
            import time
            
            max_retries = 5
            for i in range(max_retries):
                try:
                    connections.connect(host="localhost", port="19530")
                    print(f"✓ Connected. Milvus version: {utility.get_server_version()}")
                    connections.disconnect("default")
                    break
                except Exception as e:
                    if i < max_retries - 1:
                        print(f"Connection failed, retrying... ({i+1}/{max_retries})")
                        time.sleep(5)
                    else:
                        print(f"✗ Connection failed: {e}")
            PYTHON
            ---
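
The fixed `sleep` calls in the script above can be replaced by polling with exponential backoff. The sketch below is one possible way to do that in Python, assuming the same `/healthz` endpoint on port 9091; `http_probe` and `wait_until_healthy` are hypothetical names, and the `sleep` parameter exists only so the logic can be exercised without waiting.

```python
import time
import urllib.request


def http_probe(url, timeout=3.0):
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return False


def wait_until_healthy(probe, attempts=10, first_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call probe() until it succeeds, doubling the wait after each failure.

    Returns the attempt number that succeeded, or None if all attempts failed.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return None


# Demonstration with a fake probe that succeeds on the third call.
calls = iter([False, False, True])
waits = []
print(wait_until_healthy(lambda: next(calls), sleep=waits.append), waits)
```

Against a real deployment the call would be `wait_until_healthy(lambda: http_probe("http://localhost:9091/healthz"))`.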
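    b.Production deployment
        a.Description
            Production environments need more thorough configuration and monitoring: set resource limits and health checks; collect and persist logs; define a backup and restore strategy; keep data on external storage so it is not lost with the host; set up monitoring and alerting to catch problems early; restrict network access; and update and maintain the deployment regularly.
        b.Code example
            ---
            # docker-compose.yml for a production environment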
            
            version: '3.5'
            
            services:
              etcd:
                container_name: milvus-etcd
                image: quay.io/coreos/etcd:v3.5.5
                environment:
                  - ETCD_AUTO_COMPACTION_MODE=revision
                  - ETCD_AUTO_COMPACTION_RETENTION=1000
                  - ETCD_QUOTA_BACKEND_BYTES=4294967296
                  - ETCD_SNAPSHOT_COUNT=50000
                  - ETCD_HEARTBEAT_INTERVAL=500
                  - ETCD_ELECTION_TIMEOUT=2500
                volumes:
                  - /data/milvus/etcd:/etcd
                command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
                deploy:
                  resources:
                    limits:
                      cpus: '2.0'
                      memory: 4G
                restart: always
                logging:
                  driver: "json-file"
                  options:
                    max-size: "200m"
                    max-file: "5"
                networks:
                  - milvus
            
              minio:
                container_name: milvus-minio
                image: minio/minio:RELEASE.2023-03-20T20-16-18Z
                environment:
                  MINIO_ACCESS_KEY: ${MINIO_ACCESS_KEY:-minioadmin}
                  MINIO_SECRET_KEY: ${MINIO_SECRET_KEY:-minioadmin}
                  MINIO_PROMETHEUS_AUTH_TYPE: public
                volumes:
                  - /data/milvus/minio:/minio_data
                command: minio server /minio_data --console-address ":9001"
                ports:
                  - "9000:9000"
                  - "9001:9001"
                deploy:
                  resources:
                    limits:
                      cpus: '4.0'
                      memory: 8G
                restart: always
                healthcheck:
                  test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
                  interval: 30s
                  timeout: 20s
                  retries: 3
                logging:
                  driver: "json-file"
                  options:
                    max-size: "200m"
                    max-file: "5"
                networks:
                  - milvus
            
              standalone:
                container_name: milvus-standalone
                image: milvusdb/milvus:v2.3.0
                command: ["milvus", "run", "standalone"]
                environment:
                  ETCD_ENDPOINTS: etcd:2379
                  MINIO_ADDRESS: minio:9000
                  MINIO_ACCESS_KEY_ID: ${MINIO_ACCESS_KEY:-minioadmin}
                  MINIO_SECRET_ACCESS_KEY: ${MINIO_SECRET_KEY:-minioadmin}
                  PULSAR_ADDRESS: pulsar://pulsar:6650
                  # Performance tuning (verify these variable names against your Milvus version)
                  QUERY_NODE_GRACEFUL_STOP_TIMEOUT: 30
                  QUERY_NODE_SEARCH_TIMEOUT: 30
                  DATA_NODE_FLUSH_INSERT_BUFFER_SIZE: 16777216
                  # Log level
                  LOG_LEVEL: info
                volumes:
                  - /data/milvus/data:/var/lib/milvus
                  - /data/milvus/logs:/var/log/milvus
                  - /data/milvus/config:/milvus/configs
                ports:
                  - "19530:19530"
                  - "9091:9091"
                depends_on:
                  - "etcd"
                  - "minio"
                  - "pulsar"
                deploy:
                  resources:
                    limits:
                      cpus: '8.0'
                      memory: 16G
                    reservations:
                      cpus: '4.0'
                      memory: 8G
                restart: always
                healthcheck:
                  test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
                  interval: 30s
                  timeout: 10s
                  retries: 3
                  start_period: 60s
                logging:
                  driver: "json-file"
                  options:
                    max-size: "200m"
                    max-file: "10"
                networks:
                  - milvus
            
            networks:
              milvus:
                name: milvus
                driver: bridge
            
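            # Production deployment script
            # #!/bin/bash
            # 
            # # Set environment variables
            # export MINIO_ACCESS_KEY="your-access-key"
            # export MINIO_SECRET_KEY="your-secret-key"
            # 
            # # Create the data directories
            # # (config is mounted over /milvus/configs, so copy milvus.yaml into it before starting)
            # mkdir -p /data/milvus/{etcd,minio,pulsar,data,logs,config}
            # 
            # # Set permissions
            # chmod 755 /data/milvus
            # 
            # # Start the services
            # docker-compose up -d
            # 
            # # Example backup script
            # mkdir -p /opt/scripts
            # cat > /opt/scripts/backup-milvus.sh << 'EOF'
            # #!/bin/bash
            # BACKUP_DIR="/backup/milvus/$(date +%Y%m%d)"
            # mkdir -p "$BACKUP_DIR"
            # 
            # # Back up the data (for a consistent copy, stop the services or pause writes first)
            # tar -czf "$BACKUP_DIR/milvus-data.tar.gz" /data/milvus/data
            # tar -czf "$BACKUP_DIR/milvus-etcd.tar.gz" /data/milvus/etcd
            # tar -czf "$BACKUP_DIR/milvus-minio.tar.gz" /data/milvus/minio
            # 
            # # Keep the backups from the last 7 days
            # find /backup/milvus -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +
            # EOF
            # 
            # chmod +x /opt/scripts/backup-milvus.sh
            # 
            # # Schedule the backup daily at 02:00 without overwriting existing crontab entries
            # (crontab -l 2>/dev/null; echo "0 2 * * * /opt/scripts/backup-milvus.sh") | crontab -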
            ---
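
The 7-day retention rule in the backup script can also be expressed on the directory names themselves, which is easier to test than file modification times. This is a minimal sketch, assuming backup directories are named `YYYYMMDD` as in the script; `expired_backups` is a hypothetical helper, and unlike `find -mtime` it decides by name rather than by timestamp.

```python
from datetime import date, datetime, timedelta


def expired_backups(names, today, keep_days=7):
    """Return the backup directory names (YYYYMMDD) older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        try:
            taken = datetime.strptime(name, "%Y%m%d").date()
        except ValueError:
            continue  # skip entries that are not dated backup directories
        if taken < cutoff:
            expired.append(name)
    return sorted(expired)


# Example directory listing; the names are made up for illustration.
names = ["20240101", "20240108", "20240110", "20240115", "lost+found"]
print(expired_backups(names, today=date(2024, 1, 15)))
```

Passing `today` explicitly keeps the function deterministic; a real cleanup job would pass `date.today()` and then delete the returned directories.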

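9.3 Kubernetes deployment

01.Helm deployment
    a.Helm Chart
        a.Description
            A Helm chart simplifies deployment on Kubernetes. The official Milvus chart is complete and customizable: a single command deploys a Milvus cluster with all its dependencies, and the chart supports rolling upgrades and rollbacks. Replica counts and resources are easy to adjust, and values files can be kept under version control. This approach suits large production deployments.
        b.Code example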
            ---
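            # Deploy Milvus to Kubernetes with Helm
            
            # 1. Add the Milvus Helm repository
            helm repo add milvus https://milvus-io.github.io/milvus-helm/
            helm repo update
            
            # 2. List the available versions
            helm search repo milvus
            
            # 3. Create the namespace
            kubectl create namespace milvus
            
            # 4. Deploy Milvus with the default configuration
            helm install milvus-release milvus/milvus --namespace milvus
            
            # 5. Or deploy with a custom configuration (run this instead of step 4;
            #    if the release already exists, use the helm upgrade command in step 8)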
            cat > values-custom.yaml <<EOF
            cluster:
              enabled: true
            
            image:
              all:
                repository: milvusdb/milvus
                tag: v2.3.0
                pullPolicy: IfNotPresent
            
            queryNode:
              replicas: 3
              resources:
                limits:
                  cpu: "4"
                  memory: "16Gi"
                requests:
                  cpu: "2"
                  memory: "8Gi"
            
            dataNode:
              replicas: 2
              resources:
                limits:
                  cpu: "2"
                  memory: "8Gi"
                requests:
                  cpu: "1"
                  memory: "4Gi"
            
            indexNode:
              replicas: 2
              resources:
                limits:
                  cpu: "4"
                  memory: "8Gi"
                requests:
                  cpu: "2"
                  memory: "4Gi"
            
            minio:
              mode: distributed
              replicas: 4
              resources:
                limits:
                  cpu: "2"
                  memory: "4Gi"
            
            pulsar:
              enabled: true
              broker:
                replicaCount: 3
            
            etcd:
              replicaCount: 3
              resources:
                limits:
                  cpu: "1"
                  memory: "2Gi"
            
            service:
              type: LoadBalancer
              port: 19530
            EOF
            
            helm install milvus-release milvus/milvus -f values-custom.yaml --namespace milvus
            
            # 6. Check the deployment status
            kubectl get pods -n milvus
            kubectl get svc -n milvus
            
            # 7. Inspect a Pod in detail
            kubectl describe pod <pod-name> -n milvus
            
            # 8. Upgrade the release
            helm upgrade milvus-release milvus/milvus -f values-custom.yaml --namespace milvus
            
            # 9. Roll back to the previous revision
            helm rollback milvus-release --namespace milvus
            
            # 10. View the release history
            helm history milvus-release --namespace milvus
            
            # 11. Uninstall the release
            helm uninstall milvus-release --namespace milvus
            
            # 12. Delete the namespace
            kubectl delete namespace milvus
            ---
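
Before installing with a custom values file, it helps to know how much CPU and memory the cluster must be able to schedule. The sketch below sums `replicas × requests` for the worker components, using the numbers from `values-custom.yaml` above; it assumes the values are already loaded into a dictionary, and `parse_quantity`, `parse_cpu` and `total_requests` are hypothetical helpers that handle only the quantity formats used here.

```python
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_quantity(text):
    """Convert a Kubernetes memory quantity such as '8Gi' to bytes."""
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    return int(text)


def parse_cpu(text):
    """Convert a CPU quantity such as '2' or '500m' to cores."""
    return float(text[:-1]) / 1000 if text.endswith("m") else float(text)


def total_requests(values):
    """Sum replicas x requests over every component that declares requests."""
    cpu, memory = 0.0, 0
    for component in values.values():
        requests = component.get("resources", {}).get("requests")
        if not requests:
            continue
        replicas = component.get("replicas", 1)
        cpu += replicas * parse_cpu(requests["cpu"])
        memory += replicas * parse_quantity(requests["memory"])
    return cpu, memory


# Requests copied from values-custom.yaml above.
values = {
    "queryNode": {"replicas": 3, "resources": {"requests": {"cpu": "2", "memory": "8Gi"}}},
    "dataNode": {"replicas": 2, "resources": {"requests": {"cpu": "1", "memory": "4Gi"}}},
    "indexNode": {"replicas": 2, "resources": {"requests": {"cpu": "2", "memory": "4Gi"}}},
}
cpu, memory = total_requests(values)
print(f"{cpu:g} cores, {memory / 1024**3:g} GiB")
```

The total covers only components that declare requests; MinIO, Pulsar and etcd in the example set limits only and would need to be budgeted separately.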
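    b.Configuration tuning
        a.Description
            Tune the Kubernetes configuration to the workload: set Pod resource requests and limits; use node affinity and Pod anti-affinity to control placement; attach persistent volumes to protect data; enable autoscaling with the Horizontal Pod Autoscaler (HPA); choose a quality-of-service (QoS) class through the requests and limits; manage configuration with ConfigMaps and Secrets; and define a rolling update strategy.
        b.Code example
            ---
            # Advanced Kubernetes configuration example (values.yaml)
            # Key names differ between chart versions; compare with `helm show values milvus/milvus`
            
            # Query Node settings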
            queryNode:
              replicas: 3
              resources:
                requests:
                  cpu: "2"
                  memory: "8Gi"
                limits:
                  cpu: "4"
                  memory: "16Gi"
              
              # Pod anti-affinity: spread the Pods across different nodes
              affinity:
                podAntiAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                  - labelSelector:
                      matchExpressions:
                      - key: app.kubernetes.io/name
                        operator: In
                        values:
                        - milvus
                      - key: app.kubernetes.io/component
                        operator: In
                        values:
                        - querynode
                    topologyKey: kubernetes.io/hostname
              
                # Node affinity (nested under affinity): prefer high-performance nodes
                nodeAffinity:
                  preferredDuringSchedulingIgnoredDuringExecution:
                  - weight: 100
                    preference:
                      matchExpressions:
                      - key: node-type
                        operator: In
                        values:
                        - high-performance
              
              # Tolerations: allow scheduling onto nodes that carry the matching taint
              tolerations:
              - key: "milvus"
                operator: "Equal"
                value: "querynode"
                effect: "NoSchedule"
              
              # Update strategy
              strategy:
                type: RollingUpdate
                rollingUpdate:
                  maxSurge: 1
                  maxUnavailable: 0
              
              # Health checks
              livenessProbe:
                httpGet:
                  path: /healthz
                  port: 9091
                initialDelaySeconds: 60
                periodSeconds: 30
                timeoutSeconds: 10
                failureThreshold: 3
              
              readinessProbe:
                httpGet:
                  path: /healthz
                  port: 9091
                initialDelaySeconds: 30
                periodSeconds: 10
                timeoutSeconds: 5
                failureThreshold: 3
            
            # HPA autoscaling
            autoscaling:
              enabled: true
              minReplicas: 2
              maxReplicas: 10
              targetCPUUtilizationPercentage: 70
              targetMemoryUtilizationPercentage: 80
            
            # Persistent storage
            persistence:
              enabled: true
              storageClass: "fast-ssd"
              accessMode: ReadWriteOnce
              size: 500Gi
            
            # Monitoring
            metrics:
              enabled: true
              serviceMonitor:
                enabled: true
                interval: 30s
            
            # Logging
            log:
              level: info
              format: json
              persistence:
                enabled: true
                size: 100Gi
            
            # Security
            securityContext:
              runAsNonRoot: true
              runAsUser: 1000
              fsGroup: 1000
            ---
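
The HPA settings above (`minReplicas`, `maxReplicas`, target utilization) feed a simple scaling rule: the desired replica count is `ceil(currentReplicas × currentMetric / targetMetric)`, clamped to the configured bounds, and no scaling happens while the ratio stays within a tolerance of 1 (0.1 by default). The sketch below models that rule for a single metric; `desired_replicas` is a hypothetical function name, and when several metrics are configured the HPA uses the largest result.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Replica count from the HPA rule ceil(current * currentMetric / targetMetric)."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: do not scale
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


# Target of 70% CPU with 2..10 replicas, as in the values above.
for replicas, cpu in [(3, 95), (3, 30), (3, 72), (8, 140)]:
    print(replicas, cpu, "->", desired_replicas(replicas, cpu, 70, 2, 10))
```

The four sample cases cover scale-out, scale-in, the tolerance band, and the `maxReplicas` cap.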

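02.Operations
    a.Rolling updates
        a.Description
            Kubernetes rolling updates allow upgrades with no downtime by replacing Pods incrementally while the service stays available. The update strategy controls how fast Pods are replaced. Readiness checks stop a rollout from proceeding when the new Pods fail; Kubernetes does not revert a Deployment on its own, so a failed update is rolled back explicitly. A rollout can be paused and resumed. Canary (gray) releases are possible with traffic splitting. Monitor the rollout to catch problems early.
        b.Code example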
            ---
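            # Rolling update guide
            # Deployment names depend on the Helm release name; list them in step 1
            # and substitute the actual name for milvus-querynode below.
            
            # 1. Check the current version
            kubectl get deployment -n milvus
            kubectl describe deployment milvus-querynode -n milvus | grep Image
            
            # 2. Upgrade to the new version (--reuse-values keeps the values of the current release)
            helm upgrade milvus-release milvus/milvus \
              --reuse-values \
              --set image.all.tag=v2.3.1 \
              --namespace milvus
            
            # 3. Watch the rollout
            kubectl rollout status deployment/milvus-querynode -n milvus
            
            # 4. View the rollout history
            kubectl rollout history deployment/milvus-querynode -n milvus
            
            # 5. Pause the rollout
            kubectl rollout pause deployment/milvus-querynode -n milvus
            
            # 6. Resume the rollout
            kubectl rollout resume deployment/milvus-querynode -n milvus
            
            # 7. Roll back to the previous revision
            kubectl rollout undo deployment/milvus-querynode -n milvus
            
            # 8. Roll back to a specific revision
            kubectl rollout undo deployment/milvus-querynode -n milvus --to-revision=2
            
            # 9. Watch Pod status
            kubectl get pods -n milvus -w
            
            # 10. View events
            kubectl get events -n milvus --sort-by='.lastTimestamp'
            
            # Canary release example (using Istio)
            # Create a VirtualService that splits traffic; subsets v1 and v2 must be
            # defined in a matching DestinationRule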
            cat <<EOF | kubectl apply -f -
            apiVersion: networking.istio.io/v1beta1
            kind: VirtualService
            metadata:
              name: milvus-canary
              namespace: milvus
            spec:
              hosts:
              - milvus
              http:
              - match:
                - headers:
                    canary:
                      exact: "true"
                route:
                - destination:
                    host: milvus
                    subset: v2
                  weight: 100
              - route:
                - destination:
                    host: milvus
                    subset: v1
                  weight: 90
                - destination:
                    host: milvus
                    subset: v2
                  weight: 10
            EOF
            ---
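
The effect of `maxSurge` and `maxUnavailable` is easier to see in a small simulation. The following is a simplified model, not the actual Deployment controller: it assumes every new Pod becomes Ready immediately, and `simulate_rolling_update` is a hypothetical name. It records the number of old and new Pods after each step.

```python
def simulate_rolling_update(desired, max_surge=1, max_unavailable=0):
    """Return (old, new) Pod counts after each step of a simplified rollout."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = desired, 0
    steps = [(old, new)]
    while old > 0 or new < desired:
        # Start new Pods without exceeding desired + maxSurge Pods in total.
        grow = min(desired - new, desired + max_surge - (old + new))
        new += max(grow, 0)
        # Stop old Pods while keeping at least desired - maxUnavailable Pods.
        shrink = min(old, (old + new) - (desired - max_unavailable))
        old -= max(shrink, 0)
        steps.append((old, new))
    return steps


# maxSurge=1, maxUnavailable=0: capacity never drops below 3 Pods.
print(simulate_rolling_update(3, max_surge=1, max_unavailable=0))
# maxSurge=0, maxUnavailable=1: no extra Pods, but capacity dips to 2.
print(simulate_rolling_update(3, max_surge=0, max_unavailable=1))
```

With `maxUnavailable: 0` the rollout needs spare capacity for one extra Pod, which is the trade-off behind the strategy shown in the Query Node values earlier.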
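    b.Failure recovery
        a.Description
            Kubernetes recovers from many failures automatically. Failed containers are restarted to keep the service available, and when a node fails its Pods are rescheduled onto healthy nodes. Health checks detect problems early. Restart policies and back-off delays prevent rapid restart loops. Running multiple replicas improves availability. Monitor cluster state and handle anomalies promptly.
        b.Code example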
            ---
            # Kubernetes failure recovery guide
            
            # 1. Check Pod status
            kubectl get pods -n milvus
            kubectl get pods -n milvus -o wide
            
            # 2. List Pods that are not in the Running phase
            kubectl get pods -n milvus --field-selector=status.phase!=Running
            
            # 3. View Pod logs
            kubectl logs <pod-name> -n milvus
            kubectl logs <pod-name> -n milvus --previous  # logs from the previous run
            kubectl logs <pod-name> -n milvus --tail=100 -f  # follow, starting at the last 100 lines
            
            # 4. Inspect a Pod in detail
            kubectl describe pod <pod-name> -n milvus
            
            # 5. View the events of a Pod
            kubectl get events -n milvus --field-selector involvedObject.name=<pod-name>
            
            # 6. Open a shell in a Pod for debugging
            kubectl exec -it <pod-name> -n milvus -- /bin/bash
            
            # 7. Force-delete a Pod (its controller recreates it)
            kubectl delete pod <pod-name> -n milvus --force --grace-period=0
            
            # 8. Restart a Deployment
            kubectl rollout restart deployment/milvus-querynode -n milvus
            
            # 9. Check node status
            kubectl get nodes
            kubectl describe node <node-name>
            
            # 10. Drain the Pods from a node (for node maintenance)
            kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
            
            # 11. Make the node schedulable again
            kubectl uncordon <node-name>
            
            # 12. Check resource usage
            kubectl top nodes
            kubectl top pods -n milvus
            
            # Troubleshooting script
            cat > troubleshoot.sh <<'EOF'
            #!/bin/bash
            
            NAMESPACE="milvus"
            
            echo "========== Pod status =========="
            kubectl get pods -n "$NAMESPACE"
            
            echo ""
            echo "========== Pods not Running =========="
            kubectl get pods -n "$NAMESPACE" --field-selector=status.phase!=Running
            
            echo ""
            echo "========== Recent events =========="
            kubectl get events -n "$NAMESPACE" --sort-by='.lastTimestamp' | tail -20
            
            echo ""
            echo "========== Resource usage =========="
            kubectl top pods -n "$NAMESPACE"
            
            echo ""
            echo "========== Node status =========="
            kubectl get nodes
            
            echo ""
            echo "========== PVC status =========="
            kubectl get pvc -n "$NAMESPACE"
            
            echo ""
            echo "========== Service status =========="
            kubectl get svc -n "$NAMESPACE"
            EOF
            
            chmod +x troubleshoot.sh
            ./troubleshoot.sh
            ---
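
The phase filter in the commands above misses Pods that are Running but restart repeatedly. One way to catch both cases is to post-process the output of `kubectl get pods -n milvus -o json`. The sketch below assumes that JSON is available as a string; `unhealthy_pods` and the Pod names in the sample are made up for illustration, while `status.phase` and `containerStatuses[].restartCount` are fields of the Kubernetes Pod object.

```python
import json


def unhealthy_pods(pod_list_json, max_restarts=3):
    """Return (name, reason) pairs for Pods that are not Running or restart too often."""
    findings = []
    for pod in json.loads(pod_list_json).get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        restarts = sum(c.get("restartCount", 0)
                       for c in status.get("containerStatuses", []))
        if phase not in ("Running", "Succeeded"):
            findings.append((name, f"phase={phase}"))
        elif restarts > max_restarts:
            findings.append((name, f"restarts={restarts}"))
    return findings


# Trimmed sample of `kubectl get pods -o json` output (hypothetical Pod names).
sample = json.dumps({"items": [
    {"metadata": {"name": "querynode-a"},
     "status": {"phase": "Running", "containerStatuses": [{"restartCount": 0}]}},
    {"metadata": {"name": "querynode-b"},
     "status": {"phase": "Running", "containerStatuses": [{"restartCount": 7}]}},
    {"metadata": {"name": "datanode-a"},
     "status": {"phase": "Pending"}},
]})
for name, reason in unhealthy_pods(sample):
    print(name, reason)
```

In practice the JSON would come from `subprocess.run(["kubectl", "get", "pods", "-n", "milvus", "-o", "json"], capture_output=True, text=True).stdout`.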

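9.4 High availability configuration

01.Component high availability
    a.Coordinator high availability
        a.Description
            Coordinators use an active-standby model for high availability, with leader election through etcd. When the active instance fails, a standby takes over automatically, typically within seconds. This requires more than one instance of each coordinator; the example deploys three. An odd instance count matters for the quorum-based etcd cluster rather than for active-standby coordinators, where any standby can take over. Monitor which instance is active to catch problems early.
        b.Code example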
            ---
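            # Coordinator high availability (Kubernetes Helm values.yaml)
            # Section names follow the official chart (rootCoordinator, dataCoordinator, ...);
            # confirm them with `helm show values milvus/milvus` for your chart version.
            
            rootCoordinator:
              replicas: 3  # three instances: one active, the others standby
              activeStandby:
                enabled: true  # required when replicas is greater than 1
              resources:
                limits:
                  cpu: "2"
                  memory: "4Gi"
                requests:
                  cpu: "1"
                  memory: "2Gi"
              affinity:
                podAntiAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                  - labelSelector:
                      matchLabels:
                        component: rootcoord
                    topologyKey: kubernetes.io/hostname
            
            dataCoordinator:
              replicas: 3
              activeStandby:
                enabled: true
              resources:
                limits:
                  cpu: "2"
                  memory: "4Gi"
            
            queryCoordinator:
              replicas: 3
              activeStandby:
                enabled: true
              resources:
                limits:
                  cpu: "2"
                  memory: "4Gi"
            
            indexCoordinator:
              replicas: 3
              activeStandby:
                enabled: true
              resources:
                limits:
                  cpu: "2"
                  memory: "4Gi"
            
            # etcd high availability
            etcd:
              replicaCount: 3  # three-node cluster
              resources:
                limits:
                  cpu: "1"
                  memory: "2Gi"
              persistence:
                enabled: true
                size: 10Gi
            
            # Monitor the coordinators
            # kubectl get pods -n milvus | grep coord
            # kubectl logs -f <rootcoord-pod> -n milvus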
            ---
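
The active-standby behavior can be pictured as a lease with a time-to-live: the active coordinator keeps renewing the lease, and a standby can take it only after it expires. The sketch below is a toy in-memory model of that idea, not Milvus's implementation and not the etcd API; `LeaseStore`, `run_election` and the instance names are hypothetical.

```python
class LeaseStore:
    """In-memory stand-in for a key with a TTL, as an etcd lease would provide."""

    def __init__(self):
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate, now, ttl):
        """Grant or renew the lease if it is free, expired, or held by candidate."""
        if self.holder is None or now >= self.expires_at or self.holder == candidate:
            self.holder = candidate
            self.expires_at = now + ttl
            return True
        return False


def run_election(store, heartbeats, ttl=10.0):
    """Replay (time, candidate) heartbeats and return the leader after each one."""
    leaders = []
    for now, candidate in heartbeats:
        store.try_acquire(candidate, now, ttl)
        leaders.append(store.holder)
    return leaders


heartbeats = [
    (0.0, "rootcoord-0"),   # first instance becomes active
    (1.0, "rootcoord-1"),   # standby cannot take a live lease
    (5.0, "rootcoord-0"),   # active instance renews; lease now expires at 15.0
    (8.0, "rootcoord-1"),   # still standby
    (16.0, "rootcoord-1"),  # rootcoord-0 stopped renewing, so the standby takes over
]
print(run_election(LeaseStore(), heartbeats))
```

The failover delay in this model is bounded by the lease TTL, which is why switchover takes seconds rather than being instantaneous.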
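    b.Worker high availability
        a.Description
            Worker nodes achieve high availability through multiple replicas. Run several instances of each worker type so that the failure of one instance does not interrupt the service. Query Coord automatically assigns tasks to healthy nodes. Workers can be scaled out and in with load. Load balancing avoids hot spots. Monitor worker health; failed nodes are removed from scheduling automatically.
        b.Code example
            ---
            # Worker high availability
            
            queryNode:
              replicas: 5  # multiple replicas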
              resources:
                limits:
                  cpu: "4"
                  memory: "16Gi"
              # Pod anti-affinity: spread the Pods across different nodes
              affinity:
                podAntiAffinity:
                  preferredDuringSchedulingIgnoredDuringExecution:
                  - weight: 100
                    podAffinityTerm:
                      labelSelector:
                        matchLabels:
                          component: querynode
                      topologyKey: kubernetes.io/hostname
              # Health checks
              livenessProbe:
                httpGet:
                  path: /healthz
                  port: 9091
                initialDelaySeconds: 60
                periodSeconds: 30
                failureThreshold: 3
              readinessProbe:
                httpGet:
                  path: /healthz
                  port: 9091
                initialDelaySeconds: 30
                periodSeconds: 10
                failureThreshold: 3
            
            dataNode:
              replicas: 3
              resources:
                limits:
                  cpu: "2"
                  memory: "8Gi"
            
            indexNode:
              replicas: 3
              resources:
                limits:
                  cpu: "4"
                  memory: "8Gi"
            
            # Test failover
            # 1. Delete one QueryNode Pod
            # kubectl delete pod <querynode-pod> -n milvus
            # 
            # 2. Watch it being recreated
            # kubectl get pods -n milvus -w
            # 
            # 3. Verify that the service is still available
            # python3 test_connection.py
            ---
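
The benefit of running several worker replicas can be estimated with a small probability calculation. The sketch below assumes replicas fail independently and each is up with probability `p`, which is an idealization (a shared node or storage failure breaks independence); `availability_at_least` is a hypothetical helper.

```python
from math import comb


def availability_at_least(n, k, p):
    """Probability that at least k of n independent replicas are up,
    where each replica is up with probability p (binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance that at least one Query Node is up, for 1, 3 and 5 replicas at p = 0.99.
for replicas in (1, 3, 5):
    print(replicas, f"{availability_at_least(replicas, 1, 0.99):.10f}")
```

Setting `k` above 1 answers the stricter question of whether enough replicas remain to carry the load, for example `availability_at_least(5, 3, 0.99)`.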

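02.Data high availability
    a.Storage high availability
        a.Description
            Distributed storage keeps the data highly available. MinIO runs in distributed mode and spreads erasure-coded data across nodes. etcd runs as a three-node cluster and uses the Raft protocol for consistency. Pulsar replicates messages so that they are not lost. Persistent volumes keep data across Pod restarts. Regular backups protect against data loss. Monitor the health of the storage layer.
        b.Code example
            ---
            # Storage high availability
            
            # MinIO in distributed mode
            minio:
              mode: distributed
              replicas: 4  # four-node distributed deployment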
              drivesPerNode: 1
              resources:
                limits:
                  cpu: "2"
                  memory: "4Gi"
              persistence:
                enabled: true
                storageClass: "fast-ssd"
                size: 500Gi
              # Erasure coding
              erasureCodingParity: 2  # data stays readable with up to 2 nodes down
            
            # etcd cluster
            etcd:
              replicaCount: 3
              persistence:
                enabled: true
                storageClass: "fast-ssd"
                size: 10Gi
              resources:
                limits:
                  cpu: "1"
                  memory: "2Gi"
              # Automatic compaction of the revision history
              autoCompactionMode: revision
              autoCompactionRetention: "1000"
            
            # Pulsar cluster
            pulsar:
              enabled: true
              broker:
                replicaCount: 3
                resources:
                  limits:
                    cpu: "2"
                    memory: "4Gi"
              bookkeeper:
                replicaCount: 3
                persistence:
                  enabled: true
                  size: 100Gi
              zookeeper:
                replicaCount: 3
            
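            # Backup settings (illustrative keys; if your chart has no backup section,
            # schedule the script below with cron or a Kubernetes CronJob)
            backup:
              enabled: true
              schedule: "0 2 * * *"  # every day at 02:00
              retention: 7  # keep 7 days
              destination: "s3://backup-bucket/milvus"
            
            # Example backup script
            cat > backup.sh <<'EOF'
            #!/bin/bash
            
            BACKUP_DIR="/backup/milvus/$(date +%Y%m%d)"
            mkdir -p "$BACKUP_DIR"
            
            # Back up etcd (replace etcd-0 with the etcd Pod name of your release)
            kubectl exec -n milvus etcd-0 -- etcdctl snapshot save /tmp/snapshot.db
            kubectl cp milvus/etcd-0:/tmp/snapshot.db "$BACKUP_DIR/etcd-snapshot.db"
            
            # Back up MinIO with the mc client
            mc mirror milvus-minio/milvus-bucket "$BACKUP_DIR/minio-data"
            
            # Upload to S3
            aws s3 sync "$BACKUP_DIR" "s3://backup-bucket/milvus/$(date +%Y%m%d)"
            
            # Remove local backups older than 7 days
            find /backup/milvus -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +
            EOF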
            ---
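
The replica counts above follow from two simple rules. A Raft cluster such as etcd needs a strict majority of members to commit a write, and erasure coding trades raw capacity for fault tolerance. The sketch below works through both; the function names are hypothetical, and the capacity formula is the general data-to-total shard ratio rather than a statement about a particular MinIO release.

```python
def quorum(members):
    """Votes a Raft cluster needs to commit a write: a strict majority."""
    return members // 2 + 1


def tolerated_failures(members):
    """Members that can fail while a majority remains."""
    return members - quorum(members)


def erasure_usable_fraction(drives, parity):
    """Fraction of raw capacity left for data when `parity` of every
    `drives` shards hold parity."""
    if not 0 < parity <= drives // 2:
        raise ValueError("parity must be between 1 and drives // 2")
    return (drives - parity) / drives


# Members, quorum, tolerated failures: 4 members tolerate no more than 3 do.
for n in (1, 2, 3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))

# Four drives with parity 2, as in the MinIO example above.
print(erasure_usable_fraction(4, 2))
```

The table printed by the loop shows why etcd clusters use odd sizes: going from 3 to 4 members raises the quorum without tolerating any additional failure.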
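    b.Disaster recovery
        a.Description
            Prepare a disaster recovery plan for extreme failures: back up data and configuration regularly; test the restore procedure to confirm that it works; replicate across regions to survive a regional outage; set up monitoring and alerting to catch problems early; document the recovery steps for a fast response; and run regular drills.
        b.Code example
            ---
            # Disaster recovery guide
            
            # 1. Data restore procedure
            
            # Step 1: stop the Milvus service
            # Note: helm uninstall also removes the etcd and MinIO Pods bundled with the release.
            # Steps 2 and 3 need those services running, so they apply as written only when
            # etcd and MinIO run outside this release.
            helm uninstall milvus-release -n milvus
            
            # Step 2: restore the etcd data
            # Copy the snapshot into the Pod, then restore it into a new data directory.
            # etcd must afterwards be restarted with --data-dir pointing at that directory.
            kubectl cp /backup/etcd-snapshot.db milvus/etcd-0:/tmp/etcd-snapshot.db
            kubectl exec -n milvus etcd-0 -- etcdctl snapshot restore /tmp/etcd-snapshot.db \
              --data-dir=/var/lib/etcd-restore
            
            # Step 3: restore the MinIO data
            # Download the backup from S3, then mirror it into MinIO with mc
            aws s3 sync s3://backup-bucket/milvus/20240115/minio-data /restore/minio-data
            mc mirror /restore/minio-data milvus-minio/milvus-bucket
            
            # Step 4: restore the Pulsar data
            # Pulsar data usually does not need restoring because it is a message queue
            
            # Step 5: redeploy Milvus
            helm install milvus-release milvus/milvus -f values.yaml -n milvus
            
            # Step 6: verify data integrity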
            python3 <<'PYTHON'
            from pymilvus import connections, Collection, utility
            
            connections.connect(host="milvus.example.com", port="19530")
            
            # List the collections
            collections = utility.list_collections()
            print(f"Collections: {collections}")
            
            # Check the entity count of each collection
            for coll_name in collections:
                collection = Collection(coll_name)
                count = collection.num_entities
                print(f"{coll_name}: {count} entities")
            
            connections.disconnect("default")
            PYTHON
            
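            # 2. Cross-region disaster recovery configuration
            # The keys below (global.region, bucketReplication, readOnly) are illustrative;
            # confirm what your chart version supports with `helm show values milvus/milvus`.
            
            # Primary region (values-primary.yaml)
            global:
              region: us-east-1
            
            minio:
              mode: distributed
              replicas: 4
              # Cross-region bucket replication
              bucketReplication:
                enabled: true
                destination: "s3://milvus-backup-us-west-1"
            
            # Secondary region (values-secondary.yaml)
            global:
              region: us-west-1
            
            # Run read-only and sync data from the primary region
            readOnly: true
            
            # 3. Failover procedure
            
            # Detect the primary-region failure
            # Switch DNS to the secondary region
            # Promote the secondary region to read-write
            # Verify service availability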
            
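            # 4. Recovery checklist
            cat > recovery-checklist.md <<'EOF'
            # Milvus disaster recovery checklist
            
            ## Before recovery
            - [ ] Confirm the backup is usable
            - [ ] Assess the extent of data loss
            - [ ] Notify stakeholders
            - [ ] Prepare the recovery environment
            
            ## During recovery
            - [ ] Stop the running services
            - [ ] Restore etcd data
            - [ ] Restore MinIO data
            - [ ] Redeploy Milvus
            - [ ] Verify component status
            
            ## After recovery
            - [ ] Verify data integrity
            - [ ] Test queries
            - [ ] Test writes
            - [ ] Monitor system status
            - [ ] Announce that recovery is complete
            - [ ] Write the incident report
            
            ## RTO/RPO targets
            - RTO (recovery time objective): 2 hours
            - RPO (recovery point objective): 24 hours
            EOF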
            ---
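The RPO and retention figures above can be checked mechanically. The following is a minimal sketch, assuming backup completion times are available as datetime values; `rpo_satisfied` and `expired_backups` are hypothetical helper names, not part of Milvus or of any backup tool.

```python
from datetime import datetime, timedelta


def rpo_satisfied(backup_times, now, rpo=timedelta(hours=24)):
    """Return True if the newest backup is no older than the RPO window."""
    if not backup_times:
        return False
    return now - max(backup_times) <= rpo


def expired_backups(backup_times, now, retention=timedelta(days=7)):
    """Return the backups older than the retention period, oldest first."""
    return sorted(t for t in backup_times if now - t > retention)


if __name__ == "__main__":
    # Hypothetical backup history: daily 02:00 backups on some January days
    now = datetime(2024, 1, 15, 12, 0)
    backups = [datetime(2024, 1, day, 2, 0) for day in (5, 10, 14, 15)]
    print("RPO met:", rpo_satisfied(backups, now))
    print("Expired:", expired_backups(backups, now))
```

With one backup per day, the worst-case data loss is just under 24 hours plus the time the backup itself takes, so a 24-hour RPO leaves little margin; a single failed backup run would already violate it.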

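9.5 Scaling out and in

01.Manual scaling
    a.Scaling out workers
        a.Description
            Scale the number of worker nodes manually according to load. Adding Query Nodes raises query concurrency, adding Data Nodes raises write throughput, and adding Index Nodes speeds up index builds. Adjust the replica count with Helm or kubectl. New pods join the cluster automatically, with no restart of existing components. Monitor resource usage so that you scale out before nodes saturate.
        b.Code example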
            ---
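            # Manually scaling out worker nodes
            
            # Option 1: helm upgrade (--reuse-values keeps the values of the previous release)
            helm upgrade milvus-release milvus/milvus \
              --reuse-values \
              --set queryNode.replicas=5 \
              --set dataNode.replicas=3 \
              --set indexNode.replicas=3 \
              --namespace milvus
            
            # Option 2: kubectl scale
            # Deployment names depend on the Helm release name; list them first with
            # `kubectl get deployment -n milvus` and substitute the real names below.
            kubectl scale deployment milvus-querynode --replicas=5 -n milvus
            kubectl scale deployment milvus-datanode --replicas=3 -n milvus
            kubectl scale deployment milvus-indexnode --replicas=3 -n milvus
            
            # Option 3: edit values.yaml and upgrade the release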
            cat > values-scale.yaml <<EOF
            queryNode:
              replicas: 5
              resources:
                limits:
                  cpu: "4"
                  memory: "16Gi"
            
            dataNode:
              replicas: 3
              resources:
                limits:
                  cpu: "2"
                  memory: "8Gi"
            
            indexNode:
              replicas: 3
              resources:
                limits:
                  cpu: "4"
                  memory: "8Gi"
            EOF
            
            helm upgrade milvus-release milvus/milvus -f values-scale.yaml -n milvus
            
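            # Verify the result
            kubectl get pods -n milvus | grep -E "querynode|datanode|indexnode"
            
            # Watch the new pods start
            kubectl get pods -n milvus -w
            
            # Check how load is distributed
            kubectl top pods -n milvus
            
            # Sizing guidance (rough figures; real capacity depends on index type, vector dimension, and hardware):
            # - Query Node: scale with QPS demand; roughly 1000-5000 QPS per node
            # - Data Node: scale with write throughput; roughly 10000-50000 vectors/s per node
            # - Index Node: scale with the index build backlog; more nodes build in parallel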
            ---
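The sizing guidance above can be turned into a quick replica estimate. This is a minimal sketch: the per-node capacity is whatever you measured (or the rough ranges quoted above), and the 0.7 headroom factor and the helper name `estimate_replicas` are assumptions made for illustration.

```python
import math


def estimate_replicas(target_load, per_node_capacity, headroom=0.7, min_replicas=2):
    """Estimate how many nodes are needed to serve target_load.

    per_node_capacity is the load one node sustains at 100% utilization;
    headroom is the fraction of that capacity you are willing to plan for.
    """
    if per_node_capacity <= 0:
        raise ValueError("per_node_capacity must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    needed = math.ceil(target_load / (per_node_capacity * headroom))
    return max(min_replicas, needed)


if __name__ == "__main__":
    # 8000 QPS with nodes that sustain about 1000 QPS each
    print(estimate_replicas(target_load=8000, per_node_capacity=1000))
```

Planning for less than full utilization leaves room for traffic spikes and for the loss of a single node; the right headroom depends on how bursty the workload is.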
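    b.Scaling in workers
        a.Description
            When load drops, reduce the number of worker nodes to save resources. Before scaling in, confirm that the remaining nodes have enough capacity. Kubernetes terminates pods gracefully: a Query Node stops accepting new requests, finishes the ones in flight, and then exits. Monitor system load after scaling in, and avoid scaling in so far that performance degrades.
        b.Code example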
            ---
            # Manually scaling in worker nodes
            
            # Check current load before scaling in
            kubectl top pods -n milvus
            
            # Check current replica counts
            kubectl get deployment -n milvus
            
            # Scale in Query Nodes
            kubectl scale deployment milvus-querynode --replicas=3 -n milvus
            
            # Scale in Data Nodes
            kubectl scale deployment milvus-datanode --replicas=2 -n milvus
            
            # Scale in Index Nodes
            kubectl scale deployment milvus-indexnode --replicas=2 -n milvus
            
            # Or use Helm (--reuse-values keeps the values of the previous release)
            helm upgrade milvus-release milvus/milvus \
              --reuse-values \
              --set queryNode.replicas=3 \
              --set dataNode.replicas=2 \
              --set indexNode.replicas=2 \
              --namespace milvus
            
            # Watch the scale-in
            kubectl get pods -n milvus -w
            
            # Verify service availability
            python3 <<'PYTHON'
            from pymilvus import connections, Collection
            import numpy as np
            import time
            
            connections.connect(host="milvus.example.com", port="19530")
            
            collection = Collection("test_collection")
            collection.load()  # a collection must be loaded before it can be searched
            
            # Run test queries and print the latency of each
            query_vector = [[np.random.random() for _ in range(128)]]
            
            for i in range(10):
                start = time.time()
                results = collection.search(
                    data=query_vector,
                    anns_field="embedding",
                    param={"metric_type": "L2", "params": {"nprobe": 16}},
                    limit=10
                )
                latency = time.time() - start
                print(f"query {i+1}: {latency*1000:.2f}ms")
            
            connections.disconnect("default")
            PYTHON
            
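            # Scale-in notes:
            # - Make sure the remaining capacity is sufficient
            # - Monitor performance after scaling in
            # - Avoid scaling in frequently
            # - Keep a minimum replica count (at least 2)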
            ---
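The verification script above prints individual latencies; comparing a high percentile before and after the scale-in makes a regression easier to spot than eyeballing single values. A minimal sketch using the nearest-rank percentile definition follows; `percentile` and `regressed` are hypothetical helpers, and the 1.2 tolerance is an assumed threshold.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def regressed(before_ms, after_ms, p=95, tolerance=1.2):
    """Return True if the p-th percentile latency grew by more than the tolerance factor."""
    return percentile(after_ms, p) > percentile(before_ms, p) * tolerance


if __name__ == "__main__":
    before = [12.0, 11.5, 13.1, 12.4, 11.9, 12.2, 12.8, 12.1, 11.7, 12.6]
    after = [12.3, 12.0, 13.5, 12.9, 30.2, 12.5, 13.0, 12.4, 12.1, 12.8]
    print("p95 before:", percentile(before, 95))
    print("p95 after:", percentile(after, 95))
    print("regressed:", regressed(before, after))
```

Ten samples are enough for a smoke test but too few for a stable p95; collect a few hundred queries per measurement before acting on the comparison.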

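02.Autoscaling
    a.HPA configuration
        a.Description
            The Horizontal Pod Autoscaler (HPA) adjusts the replica count automatically from metrics such as CPU, memory, or custom metrics. You set the minimum and maximum replica counts and a target utilization, and the HPA scales between those bounds without manual intervention. It suits workloads with large load swings. CPU and memory metrics require metrics-server to be installed.
        b.Code example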
            ---
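            # HPA autoscaling configuration
            
            # 1. Make sure metrics-server is installed
            kubectl get deployment metrics-server -n kube-system
            
            # 2. Enable the HPA in values.yaml
            # (the `autoscaling` keys below are illustrative; key names differ between chart
            # versions, so check `helm show values milvus/milvus` first)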
            queryNode:
              replicas: 3
              resources:
                requests:
                  cpu: "2"
                  memory: "8Gi"
                limits:
                  cpu: "4"
                  memory: "16Gi"
              autoscaling:
                enabled: true
                minReplicas: 2
                maxReplicas: 10
                targetCPUUtilizationPercentage: 70
                targetMemoryUtilizationPercentage: 80
            
            dataNode:
              replicas: 2
              autoscaling:
                enabled: true
                minReplicas: 2
                maxReplicas: 5
                targetCPUUtilizationPercentage: 70
            
            indexNode:
              replicas: 2
              autoscaling:
                enabled: true
                minReplicas: 2
                maxReplicas: 5
                targetCPUUtilizationPercentage: 80
            
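            # 3. Deploy or upgrade
            helm upgrade milvus-release milvus/milvus -f values.yaml -n milvus
            
            # 4. Check HPA status
            kubectl get hpa -n milvus
            
            # 5. Inspect one HPA in detail (use a name from the output of step 4)
            kubectl describe hpa milvus-querynode -n milvus
            
            # 6. Create the HPA manually (if the Helm chart does not support it)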
            cat <<EOF | kubectl apply -f -
            apiVersion: autoscaling/v2
            kind: HorizontalPodAutoscaler
            metadata:
              name: milvus-querynode-hpa
              namespace: milvus
            spec:
              scaleTargetRef:
                apiVersion: apps/v1
                kind: Deployment
                name: milvus-querynode
              minReplicas: 2
              maxReplicas: 10
              metrics:
              - type: Resource
                resource:
                  name: cpu
                  target:
                    type: Utilization
                    averageUtilization: 70
              - type: Resource
                resource:
                  name: memory
                  target:
                    type: Utilization
                    averageUtilization: 80
              behavior:
                scaleDown:
                  stabilizationWindowSeconds: 300
                  policies:
                  - type: Percent
                    value: 50
                    periodSeconds: 60
                scaleUp:
                  stabilizationWindowSeconds: 60
                  policies:
                  - type: Percent
                    value: 100
                    periodSeconds: 60
                  - type: Pods
                    value: 2
                    periodSeconds: 60
                  selectPolicy: Max
            EOF
            
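            # 7. Watch HPA behavior
            kubectl get hpa -n milvus -w
            
            # 8. List scaling events
            kubectl get events -n milvus | grep -i "scaled"
            
            # HPA settings:
            # - minReplicas: minimum replica count
            # - maxReplicas: maximum replica count
            # - targetCPUUtilizationPercentage: target CPU utilization (percent of the CPU request)
            # - targetMemoryUtilizationPercentage: target memory utilization (percent of the memory request)
            # - stabilizationWindowSeconds: stabilization window that damps rapid back-and-forth scaling
            # - scaleDown/scaleUp policies: limits on how fast replicas are removed or added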
            ---
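To reason about what an HPA like the one above will do, it helps to reproduce its core calculation. The Kubernetes documentation gives it as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default), with the largest result winning when several metrics are configured. The sketch below implements only that formula; it deliberately ignores stabilization windows, scaling policies, and pod readiness, and the function names are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Core HPA formula for a single metric, clamped to [min_replicas, max_replicas]."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


def desired_for_metrics(current_replicas, metrics, min_replicas, max_replicas):
    """With several metrics, the HPA uses the largest proposed replica count.

    metrics is a list of (current_value, target_value) pairs.
    """
    return max(
        desired_replicas(current_replicas, current, target, min_replicas, max_replicas)
        for current, target in metrics
    )


if __name__ == "__main__":
    # 3 Query Nodes at 90% CPU (target 70%) and 40% memory (target 80%)
    print(desired_for_metrics(3, [(90, 70), (40, 80)], min_replicas=2, max_replicas=10))
```

Because the largest proposal wins, a memory target rarely triggers a scale-in on its own while CPU is busy, which is worth remembering for Query Nodes that hold loaded segments in memory.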
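    b.Custom metrics
        a.Description
            Besides CPU and memory, the HPA can scale on custom metrics such as QPS, query latency, or queue length. This requires Prometheus and Prometheus Adapter, plus rules that define how each custom metric is computed. Because these metrics track the workload directly, scaling decisions follow business demand more closely.
        b.Code example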
            ---
            # HPA based on custom metrics
            
            # 1. Install Prometheus and Prometheus Adapter (both charts live in the prometheus-community repo)
            helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
            helm repo update
            helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace
            helm install prometheus-adapter prometheus-community/prometheus-adapter -n monitoring
            
            # 2. Configure custom metric rules for Prometheus Adapter
            # milvus_query_qps and milvus_query_latency_ms are placeholder series names;
            # replace them with the metrics your Milvus version actually exports.
            # Note that rate() is only meaningful if the underlying series is a counter.
            cat > prometheus-adapter-values.yaml <<EOF
            rules:
              custom:
              - seriesQuery: 'milvus_query_qps{namespace="milvus"}'
                resources:
                  overrides:
                    namespace: {resource: "namespace"}
                    pod: {resource: "pod"}
                name:
                  matches: "^(.*)_qps"
                  as: "milvus_query_qps"
                metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
              
              - seriesQuery: 'milvus_query_latency_ms{namespace="milvus"}'
                resources:
                  overrides:
                    namespace: {resource: "namespace"}
                    pod: {resource: "pod"}
                name:
                  matches: "^(.*)_latency_ms"
                  as: "milvus_query_latency"
                metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
            EOF
            
            helm upgrade prometheus-adapter prometheus-community/prometheus-adapter \
              -f prometheus-adapter-values.yaml -n monitoring
            
            # 3. Verify that the custom metrics are exposed
            kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
            
            # 4. Create an HPA driven by the custom metrics
            cat <<EOF | kubectl apply -f -
            apiVersion: autoscaling/v2
            kind: HorizontalPodAutoscaler
            metadata:
              name: milvus-querynode-custom-hpa
              namespace: milvus
            spec:
              scaleTargetRef:
                apiVersion: apps/v1
                kind: Deployment
                name: milvus-querynode
              minReplicas: 2
              maxReplicas: 10
              metrics:
              # Scale on QPS
              - type: Pods
                pods:
                  metric:
                    name: milvus_query_qps
                  target:
                    type: AverageValue
                    averageValue: "1000"  # target of 1000 QPS per pod
              # Scale on query latency
              - type: Pods
                pods:
                  metric:
                    name: milvus_query_latency
                  target:
                    type: AverageValue
                    averageValue: "50"  # target average latency of 50 ms
              behavior:
                scaleDown:
                  stabilizationWindowSeconds: 300
                scaleUp:
                  stabilizationWindowSeconds: 60
            EOF
            
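            # 5. Watch the custom-metric HPA
            kubectl get hpa milvus-querynode-custom-hpa -n milvus -w
            
            # 6. Read the metric values
            kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/milvus/pods/*/milvus_query_qps" | jq .
            
            # Examples of custom metrics:
            # - QPS: queries per second
            # - Query latency: average query latency
            # - Queue length: number of requests waiting to be processed
            # - Error rate: fraction of failed queries
            # - Resource utilization: for example GPU utilization
            
            # Test autoscaling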
            python3 <<'PYTHON'
            from pymilvus import connections, Collection
            import numpy as np
            import time
            import threading
            
            connections.connect(host="milvus.example.com", port="19530")
            collection = Collection("test_collection")
            
            def query_worker():
                """Query in a loop to generate load and trigger a scale-out."""
                query_vector = [[np.random.random() for _ in range(128)]]
                
                while True:
                    try:
                        collection.search(
                            data=query_vector,
                            anns_field="embedding",
                            param={"metric_type": "L2", "params": {"nprobe": 16}},
                            limit=10
                        )
                    except Exception as exc:
                        # Report failures instead of silently swallowing them
                        print(f"search failed: {exc}")
                    time.sleep(0.001)  # high request rate
            
            # Start several threads to simulate heavy load
            threads = []
            for i in range(50):
                t = threading.Thread(target=query_worker, daemon=True)
                t.start()
                threads.append(t)
            
            print("Load test running; watch the HPA scale out with:")
            print("kubectl get hpa -n milvus -w")
            
            time.sleep(300)  # run for 5 minutes
            PYTHON
            ---

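10 AI framework integration

10.1 LangChain integration

01.Basic integration
    a.Installation and setup
        a.Description
            LangChain is a popular framework for building LLM applications, and Milvus integrates with it as a vector store backend. The integration covers the full pipeline of document loading, splitting, embedding, and retrieval. The Milvus class in langchain_community.vectorstores wraps the Milvus operations and supports similarity search and MMR retrieval, so it can be combined with an LLM to build RAG applications. Install langchain, langchain-community, and pymilvus to use it.
        b.Code example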
            ---
            # Install dependencies
            # pip install langchain langchain-community pymilvus openai
            
            from langchain_community.vectorstores import Milvus
            from langchain_community.embeddings import OpenAIEmbeddings
            from langchain.text_splitter import RecursiveCharacterTextSplitter
            from langchain_community.document_loaders import TextLoader
            
            # 1. Load the document
            loader = TextLoader("document.txt")
            documents = loader.load()
            
            # 2. Split the document into chunks
            text_splitter = RecursiveCharacterTextSplitter(
                chunk_size=1000,
                chunk_overlap=200
            )
            docs = text_splitter.split_documents(documents)
            
            # 3. Create the embedding model (reads the OPENAI_API_KEY environment variable)
            embeddings = OpenAIEmbeddings()
            
            # 4. Create the Milvus vector store
            vector_store = Milvus.from_documents(
                docs,
                embeddings,
                connection_args={
                    "host": "localhost",
                    "port": "19530"
                },
                collection_name="langchain_docs",
                drop_old=True
            )
            
            # 5. Similarity search
            query = "What is machine learning?"
            results = vector_store.similarity_search(query, k=3)
            
            for i, doc in enumerate(results):
                print(f"\nResult {i+1}:")
                print(f"Content: {doc.page_content[:200]}...")
                print(f"Metadata: {doc.metadata}")
            
            # 6. Search with scores (the score is the distance under the collection's metric,
            # so with an L2 metric a lower value means a closer match)
            results_with_scores = vector_store.similarity_search_with_score(query, k=3)
            
            for doc, score in results_with_scores:
                print(f"\nScore: {score}")
                print(f"Content: {doc.page_content[:200]}...")
            
            # 7. MMR (maximal marginal relevance) retrieval
            mmr_results = vector_store.max_marginal_relevance_search(
                query,
                k=3,
                fetch_k=10
            )
            
            print(f"\nMMR results: {len(mmr_results)}")
            ---
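MMR retrieval trades relevance against redundancy: it first fetches fetch_k candidates, then greedily picks k of them, each time preferring the candidate that is relevant to the query but dissimilar to what was already picked. The following is a minimal pure-Python sketch of that greedy selection, written for illustration; it is not LangChain's implementation, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(query, candidates, k, lambda_mult=0.5):
    """Return the indices of up to k candidates chosen by maximal marginal relevance.

    lambda_mult=1.0 ranks purely by relevance; lower values favor diversity.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_index, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Redundancy is the highest similarity to anything already selected
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
        remaining.remove(best_index)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    # Candidates 0 and 1 are near-duplicates; candidate 2 is less relevant but different
    candidates = [[1.0, 0.1], [1.0, 0.12], [0.5, -0.5]]
    print("balanced:", mmr_select(query, candidates, k=2, lambda_mult=0.5))
    print("relevance only:", mmr_select(query, candidates, k=2, lambda_mult=1.0))
```

In the example, the balanced setting skips the near-duplicate in favor of the more distinct candidate, while lambda_mult=1.0 degenerates to a plain similarity ranking.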
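    b.Retriever configuration
        a.Description
            LangChain provides a Retriever abstraction that unifies the retrieval interface, and a Milvus vector store can be converted into one. Retrievers support several search modes (similarity, MMR, and score-threshold filtering) and parameters such as top-k and the score threshold. A Retriever can be chained with an LLM to build question answering or summarization applications, and you can implement custom retrieval logic.
        b.Code example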
            ---
            from langchain_community.vectorstores import Milvus
            from langchain_community.embeddings import OpenAIEmbeddings
            from langchain.chains import RetrievalQA
            from langchain_community.llms import OpenAI
            
            # Connect to the existing vector store
            embeddings = OpenAIEmbeddings()
            vector_store = Milvus(
                embeddings,
                connection_args={"host": "localhost", "port": "19530"},
                collection_name="langchain_docs"
            )
            
            # 1. Convert to a Retriever (similarity mode)
            retriever = vector_store.as_retriever(
                search_type="similarity",
                search_kwargs={"k": 3}
            )
            
            # Test the retriever (invoke is the current entry point; get_relevant_documents is deprecated)
            docs = retriever.invoke("What is deep learning?")
            print(f"Retrieved {len(docs)} documents")
            
            # 2. Retriever in MMR mode
            mmr_retriever = vector_store.as_retriever(
                search_type="mmr",
                search_kwargs={
                    "k": 3,
                    "fetch_k": 10,
                    "lambda_mult": 0.5
                }
            )
            
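            # 3. Score-threshold retriever (relies on normalized relevance scores;
            # confirm that your version of the Milvus integration implements them)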
            threshold_retriever = vector_store.as_retriever(
                search_type="similarity_score_threshold",
                search_kwargs={
                    "score_threshold": 0.8,
                    "k": 5
                }
            )
            
            # 4. Combine with an LLM
            llm = OpenAI(temperature=0)
            
            qa_chain = RetrievalQA.from_chain_type(
                llm=llm,
                chain_type="stuff",
                retriever=retriever,
                return_source_documents=True
            )
            
            # Run a question
            query = "What are the main types of machine learning?"
            result = qa_chain.invoke({"query": query})
            
            print(f"\nQuestion: {query}")
            print(f"\nAnswer: {result['result']}")
            print(f"\nSource documents: {len(result['source_documents'])}")
            
            for i, doc in enumerate(result['source_documents']):
                print(f"\nSource {i+1}:")
                print(doc.page_content[:200])
            ---

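02.RAG applications
    a.Question answering
        a.Description
            Build a question answering system with retrieval-augmented generation (RAG). Milvus stores the knowledge-base vectors; when a user asks a question, the relevant documents are retrieved and passed to the LLM as context for generating the answer. Several chain types are available: stuff, map_reduce, and refine. You can customize the prompt template, return the source documents to make answers more trustworthy, and stream the output.
        b.Code example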
            ---
            from langchain_community.vectorstores import Milvus
            from langchain_community.embeddings import OpenAIEmbeddings
            from langchain.chains import RetrievalQA
            from langchain_community.llms import OpenAI
            from langchain.prompts import PromptTemplate
            
            # Connect to the vector store
            embeddings = OpenAIEmbeddings()
            vector_store = Milvus(
                embeddings,
                connection_args={"host": "localhost", "port": "19530"},
                collection_name="knowledge_base"
            )
            
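            # Custom prompt template
            prompt_template = """Use the following context to answer the question. If you don't know the answer, say that you don't know; do not make one up.
            
            Context:
            {context}
            
            Question: {question}
            
            Answer:"""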
            
            PROMPT = PromptTemplate(
                template=prompt_template,
                input_variables=["context", "question"]
            )
            
            # Create the QA chain
            llm = OpenAI(temperature=0)
            qa_chain = RetrievalQA.from_chain_type(
                llm=llm,
                chain_type="stuff",
                retriever=vector_store.as_retriever(search_kwargs={"k": 3}),
                return_source_documents=True,
                chain_type_kwargs={"prompt": PROMPT}
            )
            
            # Example questions
            questions = [
                "What is the capital of France?",
                "Explain quantum computing in simple terms.",
                "What are the benefits of exercise?"
            ]
            
            for query in questions:
                result = qa_chain.invoke({"query": query})
                
                print(f"\n{'='*60}")
                print(f"Question: {query}")
                print(f"\nAnswer: {result['result']}")
                
                print(f"\nSources:")
                for i, doc in enumerate(result['source_documents']):
                    print(f"\n[{i+1}] {doc.metadata.get('source', 'Unknown')}")
                    print(f"    {doc.page_content[:150]}...")
            
            # Use map_reduce for long documents
            qa_chain_mr = RetrievalQA.from_chain_type(
                llm=llm,
                chain_type="map_reduce",
                retriever=vector_store.as_retriever(search_kwargs={"k": 5}),
                return_source_documents=True
            )
            
            result = qa_chain_mr.invoke({"query": "Summarize the main points about AI safety."})
            print(f"\nSummary: {result['result']}")
            ---
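The "stuff" chain type is the simplest of the three: it concatenates the retrieved documents into the {context} slot of the prompt and makes a single LLM call. The sketch below shows that assembly step in plain Python, with a character budget standing in for the model's context limit; the function name and the budget value are assumptions for illustration, not LangChain internals.

```python
def build_stuff_prompt(question, documents, template,
                       max_context_chars=4000, separator="\n\n"):
    """Join retrieved documents, in rank order, into one prompt within a size budget."""
    kept = []
    used = 0
    for doc in documents:
        extra = len(doc) + (len(separator) if kept else 0)
        if used + extra > max_context_chars:
            break  # lower-ranked documents are dropped once the budget is spent
        kept.append(doc)
        used += extra
    return template.format(context=separator.join(kept), question=question)


if __name__ == "__main__":
    template = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    docs = [
        "Milvus stores vectors and serves similarity search.",
        "RAG passes retrieved text to the LLM as context.",
    ]
    print(build_stuff_prompt("What does Milvus store?", docs, template))
```

When the retrieved text does not fit in one prompt, map_reduce and refine trade extra LLM calls for the ability to process every document, which is why the example above switches to map_reduce for long inputs.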
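    b.Conversational system
        a.Description
            Build a conversational system with memory. ConversationalRetrievalChain implements multi-turn dialogue: Milvus stores the knowledge base and the LLM generates the replies. The chain manages the conversation history and uses it to rewrite the retrieval query, which makes answers context-aware. Streaming responses and integration with a chat UI are also possible.
        b.Code example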
            ---
            from langchain_community.vectorstores import Milvus
            from langchain_community.embeddings import OpenAIEmbeddings
            from langchain.chains import ConversationalRetrievalChain
            from langchain_community.llms import OpenAI
            from langchain.memory import ConversationBufferMemory
            
            # Connect to the vector store
            embeddings = OpenAIEmbeddings()
            vector_store = Milvus(
                embeddings,
                connection_args={"host": "localhost", "port": "19530"},
                collection_name="chat_knowledge"
            )
            
            # Create the conversation memory
            memory = ConversationBufferMemory(
                memory_key="chat_history",
                return_messages=True,
                output_key="answer"
            )
            
            # Create the conversational chain
            llm = OpenAI(temperature=0.7)
            conversation_chain = ConversationalRetrievalChain.from_llm(
                llm=llm,
                retriever=vector_store.as_retriever(search_kwargs={"k": 3}),
                memory=memory,
                return_source_documents=True
            )
            
            # Interactive multi-turn conversation
            print("Chat started (type 'quit' to exit)\n")
            
            while True:
                query = input("User: ")
                if query.lower() == 'quit':
                    break
                
                result = conversation_chain.invoke({"question": query})
                
                print(f"\nAssistant: {result['answer']}\n")
                
                if result.get('source_documents'):
                    print("Sources:")
                    for i, doc in enumerate(result['source_documents'][:2]):
                        print(f"  [{i+1}] {doc.page_content[:100]}...")
                    print()
            
            # Scripted demo conversation
            demo_questions = [
                "What is machine learning?",
                "Can you give me an example?",
                "How does it differ from traditional programming?",
                "What are some applications?"
            ]
            
            print("\nDemo conversation:\n")
            for query in demo_questions:
                result = conversation_chain.invoke({"question": query})
                print(f"User: {query}")
                print(f"Assistant: {result['answer']}\n")
            
            # Inspect the conversation history
            print("\nConversation history:")
            print(memory.load_memory_variables({}))
            ---

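10.2 LlamaIndex integration

01.Building indexes
    a.Vector index
        a.Description
            LlamaIndex (formerly GPT Index) is a data framework for LLM applications, and Milvus integrates with it as a vector store backend. The MilvusVectorStore class wraps the Milvus operations; with it you can build a vector index that stores document embeddings and run the full load, index, and query pipeline. It works with a range of LLMs for document retrieval and question answering.
        b.Code example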
            ---
            # Install dependencies
            # pip install llama-index llama-index-vector-stores-milvus pymilvus
            
            from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
            from llama_index.vector_stores.milvus import MilvusVectorStore
            from llama_index.core import Settings
            from llama_index.embeddings.openai import OpenAIEmbedding
            from llama_index.llms.openai import OpenAI
            
            # 1. Configure global settings
            Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
            Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
            
            # 2. Load documents
            documents = SimpleDirectoryReader("./data").load_data()
            print(f"Loaded {len(documents)} documents")
            
            # 3. Create the Milvus vector store
            vector_store = MilvusVectorStore(
                uri="http://localhost:19530",  # recent releases of the integration take a uri instead of host/port
                dim=1536,  # dimension of the OpenAI embedding
                collection_name="llamaindex_docs",
                overwrite=True
            )
            
            # 4. Create the storage context
            storage_context = StorageContext.from_defaults(
                vector_store=vector_store
            )
            
            # 5. Build the index
            index = VectorStoreIndex.from_documents(
                documents,
                storage_context=storage_context,
                show_progress=True
            )
            
            print("Index built.")
            
            # 6. Query the index
            query_engine = index.as_query_engine(
                similarity_top_k=3
            )
            
            response = query_engine.query("What is the main topic of these documents?")
            print(f"\nQuery: What is the main topic of these documents?")
            print(f"Answer: {response}")
            
            # 7. Streaming query (the engine must be created with streaming=True)
            streaming_engine = index.as_query_engine(streaming=True, similarity_top_k=3)
            streaming_response = streaming_engine.query("Explain the key concepts.")
            for text in streaming_response.response_gen:
                print(text, end="", flush=True)
            print()
            
            # 8. Load an existing index
            # Later runs can reuse the collection without rebuilding it
            vector_store_existing = MilvusVectorStore(
                uri="http://localhost:19530",
                collection_name="llamaindex_docs"
            )
            
            # from_vector_store builds its own storage context from the vector store
            index_loaded = VectorStoreIndex.from_vector_store(vector_store_existing)
            
            query_engine_loaded = index_loaded.as_query_engine()
            response = query_engine_loaded.query("Summarize the content.")
            print(f"\nQuery against the existing index: {response}")
            ---
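    b.Hybrid indexes
        a.Description
            LlamaIndex can combine several index types. For example, a vector index can be combined with a keyword index for hybrid retrieval, which can improve accuracy. You can define custom retrieval strategies and weight the indexes differently. Multimodal retrieval and more advanced structures such as graph and tree indexes are also supported.
        b.Code example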
            ---
            from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
            from llama_index.vector_stores.milvus import MilvusVectorStore
            from llama_index.core import SummaryIndex
            from llama_index.core.tools import QueryEngineTool
            from llama_index.core.query_engine import RouterQueryEngine
            from llama_index.core.selectors import LLMSingleSelector
            
            # Load documents
            documents = SimpleDirectoryReader("./data").load_data()
            
            # 1. Create the vector index
            vector_store = MilvusVectorStore(
                uri="http://localhost:19530",  # recent releases of the integration take a uri instead of host/port
                collection_name="hybrid_index",
                dim=1536
            )
            
            storage_context = StorageContext.from_defaults(
                vector_store=vector_store
            )
            
            vector_index = VectorStoreIndex.from_documents(
                documents,
                storage_context=storage_context
            )
            
            # 2. Create the summary index
            summary_index = SummaryIndex.from_documents(documents)
            
            # 3. Wrap each query engine as a tool
            vector_tool = QueryEngineTool.from_defaults(
                query_engine=vector_index.as_query_engine(),
                description="Useful for questions about specific details in the documents"
            )
            
            summary_tool = QueryEngineTool.from_defaults(
                query_engine=summary_index.as_query_engine(),
                description="Useful for questions that need an overall understanding of the documents"
            )
            
            # 4. Create the router query engine
            router_query_engine = RouterQueryEngine(
                selector=LLMSingleSelector.from_defaults(),
                query_engine_tools=[vector_tool, summary_tool]
            )
            
            # 5. Query through the router
            response1 = router_query_engine.query(
                "What is the specific definition of machine learning mentioned in the document?"
            )
            print(f"Detail question: {response1}")
            
            response2 = router_query_engine.query(
                "What is the overall theme of these documents?"
            )
            print(f"Overview question: {response2}")
            
            # 6. Custom retriever combined with tree_summarize synthesis
            from llama_index.core.retrievers import VectorIndexRetriever
            from llama_index.core.query_engine import RetrieverQueryEngine
            
            retriever = VectorIndexRetriever(
                index=vector_index,
                similarity_top_k=5
            )
            
            query_engine = RetrieverQueryEngine.from_args(
                retriever=retriever,
                response_mode="tree_summarize"
            )
            
            response = query_engine.query("Explain the main concepts.")
            print(f"Custom retriever result: {response}")
            ---
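The example above routes each question to one index. When results from several indexes should instead be merged with different weights, a common generic technique is weighted reciprocal rank fusion (RRF), which combines ranked lists without needing comparable scores. The sketch below is a plain-Python illustration of that technique under assumed inputs (lists of document ids, best first); it is not a LlamaIndex API, and the function name and the constant k=60 are conventional choices rather than anything the source prescribes.

```python
def weighted_rrf(rankings, weights=None, k=60):
    """Fuse several ranked lists of document ids with weighted reciprocal rank fusion.

    Each document scores sum(weight / (k + rank)) over the lists that contain it,
    where rank starts at 1. Ties are broken by document id for determinism.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("weights and rankings must have the same length")

    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]   # hypothetical ids from a vector index
    keyword_hits = ["b", "d", "a"]  # hypothetical ids from a keyword index
    print(weighted_rrf([vector_hits, keyword_hits]))
    print(weighted_rrf([vector_hits, keyword_hits], weights=[1.0, 0.0]))
```

Documents that appear near the top of both lists rise to the front, and raising one weight shifts the fused order toward that index.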

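02.Query optimization
    a.Advanced queries
        a.Description
            LlamaIndex offers several advanced query modes. Sub-question querying decomposes a complex question, and multi-step reasoning solves it step by step. Hypothetical Document Embeddings (HyDE) are supported, the response synthesis mode is configurable, and citations can be tracked back to their sources. Streaming responses and custom query transformations are also available.
        b.Code example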
            ---
            from llama_index.core import VectorStoreIndex
            from llama_index.vector_stores.milvus import MilvusVectorStore
            from llama_index.core.query_engine import SubQuestionQueryEngine
            from llama_index.core.tools import QueryEngineTool, ToolMetadata
            from llama_index.core.response.notebook_utils import display_response
            
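            # Load the index (recent llama-index-vector-stores-milvus releases take a `uri`
            # rather than separate host/port arguments)
            vector_store = MilvusVectorStore(
                uri="http://localhost:19530",
                collection_name="advanced_query"
            )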
            
            index = VectorStoreIndex.from_vector_store(vector_store)
            
            # 1. 子问题查询引擎
            # 将复杂问题分解为多个子问题
            query_engine_tools = [
                QueryEngineTool(
                    query_engine=index.as_query_engine(),
                    metadata=ToolMetadata(
                        name="document_index",
                        description="包含文档的详细信息"
                    )
                )
            ]
            
            sub_question_engine = SubQuestionQueryEngine.from_defaults(
                query_engine_tools=query_engine_tools
            )
            
            response = sub_question_engine.query(
                "Compare and contrast the advantages and disadvantages of different machine learning approaches."
            )
            print(f"子问题查询: {response}")
            
            # 2. 配置响应模式
            # compact: 紧凑模式,合并文本块
            query_engine_compact = index.as_query_engine(
                response_mode="compact",
                similarity_top_k=5
            )
            
            # tree_summarize: 树形摘要,层次化处理
            query_engine_tree = index.as_query_engine(
                response_mode="tree_summarize",
                similarity_top_k=5
            )
            
            # refine: 精炼模式,迭代优化答案
            query_engine_refine = index.as_query_engine(
                response_mode="refine",
                similarity_top_k=5
            )
            
            query = "What are the key principles of effective learning?"
            
            response_compact = query_engine_compact.query(query)
            response_tree = query_engine_tree.query(query)
            response_refine = query_engine_refine.query(query)
            
            print(f"\nCompact模式: {response_compact}")
            print(f"\nTree模式: {response_tree}")
            print(f"\nRefine模式: {response_refine}")
            
            # 3. 流式响应
            streaming_engine = index.as_query_engine(
                streaming=True
            )
            
            streaming_response = streaming_engine.query("Explain neural networks.")
            print("\n流式响应:")
            for text in streaming_response.response_gen:
                print(text, end="", flush=True)
            print()
            
            # 4. 带元数据过滤的查询
            from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter
            
            filters = MetadataFilters(
                filters=[
                    ExactMatchFilter(key="category", value="machine_learning")
                ]
            )
            
            filtered_engine = index.as_query_engine(
                filters=filters,
                similarity_top_k=3
            )
            
            response = filtered_engine.query("What is supervised learning?")
            print(f"\n过滤查询: {response}")
            
            # 5. 查看来源节点
            response_with_sources = index.as_query_engine(
                response_mode="compact"
            ).query("What is deep learning?")
            
            print(f"\n回答: {response_with_sources}")
            print(f"\n来源节点:")
            for i, node in enumerate(response_with_sources.source_nodes):
                print(f"\n[{i+1}] 分数: {node.score:.4f}")
                print(f"    内容: {node.text[:200]}...")
                print(f"    元数据: {node.metadata}")
            ---
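            The response modes above differ mainly in how many LLM calls they make and how retrieved chunks are combined. The sketch below is a simplified model of that difference, not LlamaIndex's implementation: `make_counting_llm`, `refine`, `compact`, and `tree_summarize` are hypothetical helpers, the "LLM" is a stub that only counts calls, and the character budget stands in for a token-based context window.

```python
from typing import Callable, Dict, List, Tuple


def make_counting_llm() -> Tuple[Callable[[str], str], Dict[str, int]]:
    """Return a stand-in LLM (it only truncates the prompt) and a call counter."""
    stats = {"calls": 0}

    def llm(prompt: str) -> str:
        stats["calls"] += 1
        return prompt[:60]  # placeholder for a real completion

    return llm, stats


def refine(chunks: List[str], question: str, llm: Callable[[str], str]) -> str:
    """One call per chunk: each call revises the answer produced so far."""
    answer = ""
    for chunk in chunks:
        answer = llm(f"Q: {question}\nExisting answer: {answer}\nContext: {chunk}")
    return answer


def compact(chunks: List[str], question: str, llm: Callable[[str], str],
            max_chars: int = 25) -> str:
    """Pack chunks into as few prompts as fit the budget, then refine over them."""
    packed: List[str] = []
    current: List[str] = []
    for chunk in chunks:
        if current and len(" ".join(current + [chunk])) > max_chars:
            packed.append(" ".join(current))
            current = []
        current.append(chunk)
    if current:
        packed.append(" ".join(current))
    return refine(packed, question, llm)


def tree_summarize(chunks: List[str], question: str, llm: Callable[[str], str],
                   fan_out: int = 2) -> str:
    """Summarize groups of `fan_out` texts, then summarize the summaries."""
    if not chunks:
        return ""
    level = list(chunks)
    while True:
        groups = [level[i:i + fan_out] for i in range(0, len(level), fan_out)]
        level = [llm(f"Q: {question}\nContext: {' '.join(g)}") for g in groups]
        if len(level) == 1:
            return level[0]


if __name__ == "__main__":
    chunks = ["a" * 10, "b" * 10, "c" * 10, "d" * 10]
    for mode in (refine, compact, tree_summarize):
        llm, stats = make_counting_llm()
        mode(chunks, "What is deep learning?", llm)
        print(f"{mode.__name__}: {stats['calls']} LLM calls")
```

            In this model, refine scales linearly with the number of chunks, compact reduces the call count by packing chunks together, and tree_summarize trades extra calls for a hierarchical merge.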
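    b.Agent applications
        a.Description
            LlamaIndex supports building agent applications. An agent can call multiple tools to complete a task, with Milvus serving as one knowledge-base tool among them. For each question the agent picks a suitable tool, and it can plan and reason over several steps, combining tools where needed. Tools and strategies can be customized to build more complex AI applications.
        b.Code example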
            ---
            from llama_index.core.agent import ReActAgent
            from llama_index.core.tools import QueryEngineTool, ToolMetadata, FunctionTool
            from llama_index.core import VectorStoreIndex
            from llama_index.vector_stores.milvus import MilvusVectorStore
            from llama_index.llms.openai import OpenAI
            
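            # 1. Create the knowledge-base tool (recent llama-index-vector-stores-milvus
            # releases take a `uri` rather than separate host/port arguments)
            vector_store = MilvusVectorStore(
                uri="http://localhost:19530",
                collection_name="agent_knowledge"
            )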
            
            index = VectorStoreIndex.from_vector_store(vector_store)
            
            knowledge_tool = QueryEngineTool(
                query_engine=index.as_query_engine(),
                metadata=ToolMetadata(
                    name="knowledge_base",
                    description="包含公司文档、产品信息、技术文档的知识库"
                )
            )
            
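            # 2. Create custom function tools
            def calculate(expression: str) -> str:
                """Evaluate a math expression."""
                # WARNING: eval() executes arbitrary Python. The agent passes LLM-generated
                # text to this function, so use a restricted parser outside of demos.
                try:
                    result = eval(expression)
                    return f"Result: {result}"
                except Exception:
                    return "Calculation error"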
            
            calc_tool = FunctionTool.from_defaults(fn=calculate)
            
            def search_web(query: str) -> str:
                """搜索网络信息"""
                # 实际应用中调用搜索API
                return f"网络搜索结果: {query}"
            
            web_tool = FunctionTool.from_defaults(fn=search_web)
            
            # 3. 创建ReAct Agent
            llm = OpenAI(model="gpt-4", temperature=0)
            
            agent = ReActAgent.from_tools(
                tools=[knowledge_tool, calc_tool, web_tool],
                llm=llm,
                verbose=True
            )
            
            # 4. 使用Agent
            response1 = agent.chat("What is our company's return policy?")
            print(f"Agent回答: {response1}")
            
            response2 = agent.chat("Calculate 15% discount on $299")
            print(f"Agent回答: {response2}")
            
            response3 = agent.chat(
                "Find information about the latest AI trends and compare with our product features"
            )
            print(f"Agent回答: {response3}")
            
            # 5. 多轮对话
            print("\nAgent对话模式(输入'quit'退出):")
            
            while True:
                user_input = input("\n用户: ")
                if user_input.lower() == 'quit':
                    break
                
                response = agent.chat(user_input)
                print(f"Agent: {response}")
            
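            # 6. Inspect the agent's conversation
            # The Thought/Action/Observation steps are printed as they happen because
            # verbose=True; chat_history holds only the exchanged messages.
            response_with_reasoning = agent.chat(
                "What are the key features of our product and how much would it cost with a 20% discount?"
            )
            
            print(f"\nFinal answer: {response_with_reasoning}")
            print("\nChat history:")
            for message in agent.chat_history:
                print(f"  - {message}")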
            ---
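            A calculator tool receives text written by the LLM, so evaluating it with `eval()` lets a crafted prompt run arbitrary code. One possible alternative is sketched below, assuming only basic arithmetic is needed: it parses the expression with the standard `ast` module and allows only numeric literals and a fixed set of operators. `safe_calculate` is a hypothetical helper name, not part of LlamaIndex; it could be wrapped with `FunctionTool.from_defaults` in the same way as `calculate`.

```python
import ast
import operator

# Whitelisted operators; exponentiation is left out on purpose because
# inputs like 9**9**9 can exhaust CPU and memory.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> float:
    """Evaluate an arithmetic expression without executing arbitrary code."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, and everything else are rejected.
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


if __name__ == "__main__":
    print(safe_calculate("299 * (1 - 0.15)"))
    try:
        safe_calculate("__import__('os').system('ls')")
    except ValueError as exc:
        print(f"Rejected: {exc}")
```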

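10.3 Haystack integration

01.Building a pipeline
    a.Document processing
        a.Description
            Haystack is an end-to-end NLP framework for building search and question-answering systems. Milvus integrates with Haystack as a document store backend through the MilvusDocumentStore class, covering the full flow of indexing, retrieval, and question answering. Components are combined in a Pipeline, and the store works with a range of Retriever and Reader nodes, which makes the combination suitable for production NLP applications.
        b.Code example
            ---
            # Install dependencies (this example uses the Haystack 1.x API from the farm-haystack package)
            # pip install farm-haystack[milvus] pymilvus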
            
            from haystack.document_stores import MilvusDocumentStore
            from haystack.nodes import PreProcessor, EmbeddingRetriever
            from haystack.utils import convert_files_to_docs
            
            # 1. 创建Milvus文档存储
            document_store = MilvusDocumentStore(
                host="localhost",
                port=19530,
                collection_name="haystack_docs",
                embedding_dim=384,  # all-MiniLM-L6-v2 produces 384-dimensional embeddings
                similarity="cosine",
                recreate_index=True
            )
            
            # 2. 加载文档
            docs = convert_files_to_docs(
                dir_path="./data",
                clean_func=None,
                split_paragraphs=True
            )
            
            print(f"加载了 {len(docs)} 个文档")
            
            # 3. 预处理文档
            preprocessor = PreProcessor(
                clean_empty_lines=True,
                clean_whitespace=True,
                clean_header_footer=True,
                split_by="word",
                split_length=200,
                split_overlap=20,
                split_respect_sentence_boundary=True
            )
            
            processed_docs = preprocessor.process(docs)
            print(f"预处理后: {len(processed_docs)} 个文档片段")
            
            # 4. 写入文档存储
            document_store.write_documents(processed_docs)
            print("文档已写入Milvus")
            
            # 5. 创建嵌入检索器
            retriever = EmbeddingRetriever(
                document_store=document_store,
                embedding_model="sentence-transformers/all-MiniLM-L6-v2",
                model_format="sentence_transformers"
            )
            
            # 6. 更新文档嵌入
            document_store.update_embeddings(retriever)
            print("文档嵌入已更新")
            
            # 7. 检索文档
            query = "What is machine learning?"
            retrieved_docs = retriever.retrieve(
                query=query,
                top_k=3
            )
            
            print(f"\n查询: {query}")
            print(f"检索到 {len(retrieved_docs)} 个文档:\n")
            
            for i, doc in enumerate(retrieved_docs):
                print(f"[{i+1}] 分数: {doc.score:.4f}")
                print(f"    内容: {doc.content[:200]}...")
                print(f"    元数据: {doc.meta}\n")
            ---
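            The `split_length=200` and `split_overlap=20` settings describe a sliding window over words: each chunk holds up to 200 words and repeats the last 20 words of the previous chunk. The sketch below models only that windowing. `split_words` is a hypothetical helper, not Haystack's PreProcessor, and it ignores sentence boundaries, which `split_respect_sentence_boundary=True` would take into account.

```python
from typing import List


def split_words(text: str, split_length: int = 200, split_overlap: int = 20) -> List[str]:
    """Split text into word windows of `split_length` that overlap by `split_overlap`."""
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be in [0, split_length)")

    words = text.split()
    step = split_length - split_overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(450))
    for i, chunk in enumerate(split_words(sample)):
        tokens = chunk.split()
        print(f"chunk {i}: {len(tokens)} words, {tokens[0]} .. {tokens[-1]}")
```

            A larger overlap keeps more context across chunk boundaries at the cost of storing and embedding more duplicated text.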
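    b.Pipeline assembly
        a.Description
            Haystack assembles NLP applications as pipelines. A Pipeline is a graph of nodes through which data flows; nodes can retrieve, read, or generate, and custom nodes and connections are supported. Pipelines can include parallel and conditional branches and can be saved and loaded.
        b.Code example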
            ---
            from haystack import Pipeline
            from haystack.document_stores import MilvusDocumentStore
            from haystack.nodes import EmbeddingRetriever, FARMReader, PromptNode
            from haystack.nodes import AnswerParser, PromptTemplate
            
            # 1. 创建文档存储
            document_store = MilvusDocumentStore(
                host="localhost",
                port=19530,
                collection_name="haystack_pipeline",
                embedding_dim=384  # all-MiniLM-L6-v2 produces 384-dimensional embeddings
            )
            
            # 2. 创建检索器
            retriever = EmbeddingRetriever(
                document_store=document_store,
                embedding_model="sentence-transformers/all-MiniLM-L6-v2"
            )
            
            # 3. 创建阅读器
            reader = FARMReader(
                model_name_or_path="deepset/roberta-base-squad2",
                use_gpu=True
            )
            
            # 4. 构建检索式问答Pipeline
            retrieval_qa_pipeline = Pipeline()
            retrieval_qa_pipeline.add_node(
                component=retriever,
                name="Retriever",
                inputs=["Query"]
            )
            retrieval_qa_pipeline.add_node(
                component=reader,
                name="Reader",
                inputs=["Retriever"]
            )
            
            # 5. 运行Pipeline
            query = "What are the main types of machine learning?"
            
            result = retrieval_qa_pipeline.run(
                query=query,
                params={
                    "Retriever": {"top_k": 5},
                    "Reader": {"top_k": 3}
                }
            )
            
            print(f"问题: {query}\n")
            print("答案:")
            for i, answer in enumerate(result["answers"]):
                print(f"\n[{i+1}] 答案: {answer.answer}")
                print(f"    分数: {answer.score:.4f}")
                print(f"    上下文: {answer.context[:200]}...")
            
            # 6. 构建生成式问答Pipeline(使用LLM)
            prompt_template = PromptTemplate(
                prompt="""根据以下上下文回答问题。
                
                上下文: {join(documents)}
                
                问题: {query}
                
                答案:""",
                output_parser=AnswerParser()
            )
            
            prompt_node = PromptNode(
                model_name_or_path="gpt-3.5-turbo",
                api_key="your-api-key",
                default_prompt_template=prompt_template
            )
            
            generative_qa_pipeline = Pipeline()
            generative_qa_pipeline.add_node(
                component=retriever,
                name="Retriever",
                inputs=["Query"]
            )
            generative_qa_pipeline.add_node(
                component=prompt_node,
                name="PromptNode",
                inputs=["Retriever"]
            )
            
            # 7. 运行生成式Pipeline
            result_gen = generative_qa_pipeline.run(
                query=query,
                params={"Retriever": {"top_k": 3}}
            )
            
            print(f"\n生成式答案: {result_gen['answers'][0].answer}")
            
            # 8. 保存和加载Pipeline
            retrieval_qa_pipeline.save_to_yaml("qa_pipeline.yaml")
            
            # 加载Pipeline
            loaded_pipeline = Pipeline.load_from_yaml("qa_pipeline.yaml")
            
            # 9. 批量查询
            queries = [
                "What is supervised learning?",
                "Explain neural networks.",
                "What is the difference between AI and ML?"
            ]
            
            for q in queries:
                result = retrieval_qa_pipeline.run(
                    query=q,
                    params={"Retriever": {"top_k": 3}, "Reader": {"top_k": 1}}
                )
                print(f"\n问题: {q}")
                print(f"答案: {result['answers'][0].answer if result['answers'] else '未找到答案'}")
            ---

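02.Advanced applications
    a.Multimodal retrieval
        a.Description
            Haystack can process documents in multiple modalities, including text, tables, and images, with Milvus storing the resulting embeddings for cross-modal retrieval. Content can be extracted from PDF, Word, and similar files, optionally with OCR and image understanding, to support document question answering and enterprise document search.
        b.Code example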
            ---
            from haystack.document_stores import MilvusDocumentStore
            from haystack.nodes import (
                PDFToTextConverter,
                PreProcessor,
                EmbeddingRetriever,
                TableTextRetriever
            )
            from haystack import Pipeline
            
            # 1. 创建文档存储
            document_store = MilvusDocumentStore(
                host="localhost",
                port=19530,
                collection_name="multimodal_docs",
                embedding_dim=384  # must match the retriever model; all-MiniLM-L6-v2 produces 384 dimensions
            )
            
            # 2. 创建PDF转换器
            pdf_converter = PDFToTextConverter(
                remove_numeric_tables=False,
                valid_languages=["en", "zh"]
            )
            
            # 3. 转换PDF文档
            pdf_docs = pdf_converter.convert(
                file_path="document.pdf",
                meta={"source": "document.pdf"}
            )
            
            # 4. 预处理
            preprocessor = PreProcessor(
                split_by="word",
                split_length=200,
                split_overlap=20
            )
            
            processed_docs = preprocessor.process(pdf_docs)
            
            # 5. 写入文档存储
            document_store.write_documents(processed_docs)
            
            # 6. 创建检索器
            retriever = EmbeddingRetriever(
                document_store=document_store,
                embedding_model="sentence-transformers/all-MiniLM-L6-v2"
            )
            
            document_store.update_embeddings(retriever)
            
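            # 7. Table retrieval
            # TableTextRetriever takes separate query, passage, and table encoders.
            # These encoders emit 512-dimensional vectors, so in practice this retriever
            # needs a document store whose embedding_dim matches.
            table_retriever = TableTextRetriever(
                document_store=document_store,
                query_embedding_model="deepset/bert-small-mm_retrieval-question_encoder",
                passage_embedding_model="deepset/bert-small-mm_retrieval-passage_encoder",
                table_embedding_model="deepset/bert-small-mm_retrieval-table_encoder"
            )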
            
            # 8. 构建多模态检索Pipeline
            multimodal_pipeline = Pipeline()
            multimodal_pipeline.add_node(
                component=retriever,
                name="TextRetriever",
                inputs=["Query"]
            )
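            multimodal_pipeline.add_node(
                component=table_retriever,
                name="TableRetriever",
                inputs=["Query"]
            )
            # Merge both branches so the pipeline returns a single "documents" list
            from haystack.nodes import JoinDocuments
            multimodal_pipeline.add_node(
                component=JoinDocuments(join_mode="concatenate"),
                name="JoinResults",
                inputs=["TextRetriever", "TableRetriever"]
            )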
            
            # 9. 查询
            query = "What are the sales figures for Q3?"
            
            result = multimodal_pipeline.run(
                query=query,
                params={
                    "TextRetriever": {"top_k": 3},
                    "TableRetriever": {"top_k": 2}
                }
            )
            
            print(f"查询: {query}\n")
            print("文本结果:")
            for doc in result.get("documents", []):
                if doc.content_type == "text":
                    print(f"  - {doc.content[:200]}...")
            
            print("\n表格结果:")
            for doc in result.get("documents", []):
                if doc.content_type == "table":
                    print(f"  - {doc.content}")
            ---
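    b.Semantic search
        a.Description
            Milvus and Haystack together can back a semantic search system. It accepts natural-language queries and returns results that match the query's meaning rather than its exact keywords, which helps with synonyms and multilingual queries. Filtering, sorting, and personalization can be layered on top, and the search can be exposed through an API for integration into websites or applications.
        b.Code example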
            ---
            from haystack.document_stores import MilvusDocumentStore
            from haystack.nodes import EmbeddingRetriever, BM25Retriever
            from haystack import Pipeline
            from haystack.nodes import JoinDocuments
            from flask import Flask, request, jsonify
            
            # 1. 创建文档存储
            document_store = MilvusDocumentStore(
                host="localhost",
                port=19530,
                collection_name="semantic_search",
                embedding_dim=384,  # all-MiniLM-L6-v2 produces 384-dimensional embeddings
                similarity="cosine"
            )
            
            # 2. 创建混合检索器
            # 语义检索
            embedding_retriever = EmbeddingRetriever(
                document_store=document_store,
                embedding_model="sentence-transformers/all-MiniLM-L6-v2"
            )
            
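            # Keyword retrieval
            # Note: BM25Retriever needs a document store with keyword (BM25) search, such as
            # Elasticsearch or InMemoryDocumentStore(use_bm25=True). If your MilvusDocumentStore
            # version does not provide it, back this retriever with a separate keyword store.
            bm25_retriever = BM25Retriever(document_store=document_store)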
            
            # 3. 构建混合检索Pipeline
            join_documents = JoinDocuments(
                join_mode="concatenate"
            )
            
            hybrid_pipeline = Pipeline()
            hybrid_pipeline.add_node(
                component=embedding_retriever,
                name="EmbeddingRetriever",
                inputs=["Query"]
            )
            hybrid_pipeline.add_node(
                component=bm25_retriever,
                name="BM25Retriever",
                inputs=["Query"]
            )
            hybrid_pipeline.add_node(
                component=join_documents,
                name="JoinDocuments",
                inputs=["EmbeddingRetriever", "BM25Retriever"]
            )
            
            # 4. 创建搜索API
            app = Flask(__name__)
            
            @app.route("/search", methods=["POST"])
            def search():
                data = request.json
                query = data.get("query", "")
                top_k = data.get("top_k", 5)
                filters = data.get("filters", {})
                
                result = hybrid_pipeline.run(
                    query=query,
                    params={
                        "EmbeddingRetriever": {
                            "top_k": top_k,
                            "filters": filters
                        },
                        "BM25Retriever": {
                            "top_k": top_k,
                            "filters": filters
                        }
                    }
                )
                
                documents = result.get("documents", [])
                
                response = {
                    "query": query,
                    "total": len(documents),
                    "results": [
                        {
                            "id": doc.id,
                            "content": doc.content,
                            "score": doc.score,
                            "meta": doc.meta
                        }
                        for doc in documents[:top_k]
                    ]
                }
                
                return jsonify(response)
            
            @app.route("/index", methods=["POST"])
            def index_documents():
                data = request.json
                documents = data.get("documents", [])
                
                document_store.write_documents(documents)
                document_store.update_embeddings(embedding_retriever)
                
                return jsonify({
                    "status": "success",
                    "indexed": len(documents)
                })
            
            # 5. 启动API服务
            # app.run(host="0.0.0.0", port=8000)
            
            # 6. 测试搜索
            test_queries = [
                "machine learning algorithms",
                "deep neural networks",
                "natural language processing"
            ]
            
            for query in test_queries:
                result = hybrid_pipeline.run(
                    query=query,
                    params={
                        "EmbeddingRetriever": {"top_k": 3},
                        "BM25Retriever": {"top_k": 3}
                    }
                )
                
                print(f"\n查询: {query}")
                print(f"结果数: {len(result['documents'])}")
                
                for i, doc in enumerate(result["documents"][:3]):
                    print(f"\n[{i+1}] 分数: {doc.score:.4f}")
                    print(f"    内容: {doc.content[:150]}...")
            
            # 7. 带过滤的搜索
            filtered_result = hybrid_pipeline.run(
                query="machine learning",
                params={
                    "EmbeddingRetriever": {
                        "top_k": 5,
                        "filters": {"category": ["AI", "ML"]}
                    },
                    "BM25Retriever": {
                        "top_k": 5,
                        "filters": {"category": ["AI", "ML"]}
                    }
                }
            )
            
            print(f"\n过滤搜索结果: {len(filtered_result['documents'])} 个文档")
            ---
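            Concatenating the two result lists leaves embedding scores and BM25 scores side by side even though they are on different scales. A common alternative is to fuse by rank instead of score. The sketch below shows reciprocal rank fusion (RRF) in plain Python; `reciprocal_rank_fusion` and the document ids are hypothetical, and `k=60` is the constant conventionally used with RRF, assumed here rather than taken from the source.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse ranked id lists: each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to keep the order stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    embedding_hits = ["d1", "d2", "d3"]  # ids ordered by vector similarity
    bm25_hits = ["d3", "d1", "d4"]       # ids ordered by keyword score
    for doc_id, score in reciprocal_rank_fusion([embedding_hits, bm25_hits]):
        print(f"{doc_id}: {score:.5f}")
```

            Documents returned by both retrievers rise to the top, and no score normalization between the retrievers is required.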

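11 Operations and monitoring

11.1 Monitoring metrics

01.System metrics
    a.Performance metrics
        a.Description
            Milvus exposes a range of performance metrics: core indicators such as QPS, latency, and throughput; CPU, memory, disk, and network usage; and query performance and index build progress. Metrics are exported in Prometheus format and can be visualized in Grafana, which allows real-time health monitoring and threshold-based alerts.
        b.Code example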
            ---
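            # Milvus performance metrics monitoring
            
            # 1. Enable Prometheus metrics export
            # Configured in milvus.yaml (key names differ between Milvus versions,
            # so check the file shipped with your release):
            # metrics:
            #   enabled: true
            #   port: 9091
            #   path: /metrics
            
            # 2. Query the metrics endpoint
            # curl http://localhost:9091/metrics
            
            # 3. Main performance metrics
            # The metric names below are illustrative; confirm the names your deployment
            # actually exports at /metrics before using them in dashboards or alerts.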
            performance_metrics = {
                "查询性能": {
                    "milvus_query_qps": "每秒查询数",
                    "milvus_query_latency_ms": "查询延迟(毫秒)",
                    "milvus_query_success_rate": "查询成功率",
                    "milvus_query_timeout_count": "查询超时次数"
                },
                "写入性能": {
                    "milvus_insert_qps": "每秒插入数",
                    "milvus_insert_latency_ms": "插入延迟(毫秒)",
                    "milvus_insert_success_rate": "插入成功率",
                    "milvus_flush_duration_ms": "刷盘耗时"
                },
                "索引性能": {
                    "milvus_index_build_duration_ms": "索引构建耗时",
                    "milvus_index_build_progress": "索引构建进度",
                    "milvus_index_size_bytes": "索引大小(字节)"
                },
                "系统资源": {
                    "milvus_cpu_usage_percent": "CPU使用率",
                    "milvus_memory_usage_bytes": "内存使用量",
                    "milvus_disk_usage_bytes": "磁盘使用量",
                    "milvus_network_io_bytes": "网络IO"
                }
            }
            
            # 4. Prometheus配置
            prometheus_config = """
            global:
              scrape_interval: 15s
              evaluation_interval: 15s
            
            scrape_configs:
              - job_name: 'milvus'
                static_configs:
                  - targets: ['localhost:9091']
                    labels:
                      instance: 'milvus-standalone'
                      cluster: 'production'
            """
            
            # 5. 使用Python查询指标
            import requests
            
            def get_milvus_metrics():
                response = requests.get("http://localhost:9091/metrics")
                metrics = {}
                
                for line in response.text.split('\n'):
                    if line.startswith('milvus_') and not line.startswith('#'):
                        parts = line.split()
                        if len(parts) >= 2:
                            metric_name = parts[0].split('{')[0]
                            metric_value = float(parts[-1])
                            metrics[metric_name] = metric_value
                
                return metrics
            
            # 获取当前指标
            metrics = get_milvus_metrics()
            
            print("Milvus性能指标:")
            print(f"  QPS: {metrics.get('milvus_query_qps', 0):.2f}")
            print(f"  平均延迟: {metrics.get('milvus_query_latency_ms', 0):.2f}ms")
            print(f"  CPU使用率: {metrics.get('milvus_cpu_usage_percent', 0):.2f}%")
            print(f"  内存使用: {metrics.get('milvus_memory_usage_bytes', 0) / 1024**3:.2f}GB")
            
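            # 6. PromQL query examples
            promql_queries = {
                "QPS (5-minute rate)": "sum(rate(milvus_query_total[5m]))",
                "P99 latency": "histogram_quantile(0.99, sum by (le) (rate(milvus_query_latency_ms_bucket[5m])))",
                "Error rate": "sum(rate(milvus_query_errors_total[5m])) / sum(rate(milvus_query_total[5m]))",
                # rate() is meant for counters; memory usage is a gauge, so use deriv()
                "Memory growth (bytes/s)": "deriv(milvus_memory_usage_bytes[5m])"
            }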
            
            print("\nPromQL查询示例:")
            for name, query in promql_queries.items():
                print(f"  {name}: {query}")
            ---
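            Splitting each metrics line on whitespace breaks when a label value contains a space, and keeping one value per metric name silently drops all but the last label set. The sketch below is one possible parser for the Prometheus text format that handles both cases by summing samples per metric name. `parse_prometheus_text` and the `milvus_demo_*` names are hypothetical, and summing across label sets is meaningful for counters but not necessarily for gauges.

```python
import re
from collections import defaultdict
from typing import Dict

# metric_name{optional labels} value [optional timestamp]
_SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+-?\d+)?$'
)


def parse_prometheus_text(text: str, prefix: str = "milvus_") -> Dict[str, float]:
    """Sum sample values per metric name for metrics starting with `prefix`."""
    totals: Dict[str, float] = defaultdict(float)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, _labels, value = match.groups()
        if not name.startswith(prefix):
            continue
        try:
            totals[name] += float(value)
        except ValueError:
            continue  # ignore samples whose value is not numeric
    return dict(totals)


if __name__ == "__main__":
    sample = """\
# HELP milvus_demo_requests_total Hypothetical request counter
# TYPE milvus_demo_requests_total counter
milvus_demo_requests_total{node="proxy 1",status="ok"} 12
milvus_demo_requests_total{node="proxy 1",status="fail"} 3 1700000000000
milvus_demo_memory_bytes 1.5e9
process_cpu_seconds_total 4.2
"""
    for name, value in parse_prometheus_text(sample).items():
        print(f"{name} = {value}")
```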
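    b.Business metrics
        a.Description
            Beyond system metrics, business-level metrics also need monitoring: the number of collections and their data volume, vector dimensions and data growth trends, frequent and slow queries, usage patterns, and data quality and accuracy. Custom business metrics can be added to support monitoring and analysis.
        b.Code example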
            ---
            from pymilvus import connections, utility, Collection, DataType
            import time
            from datetime import datetime
            
            connections.connect(host="localhost", port="19530")
            
            # 1. Collection级别指标
            def get_collection_metrics(collection_name):
                collection = Collection(collection_name)
                collection.load()
                
                metrics = {
                    "name": collection_name,
                    "entity_count": collection.num_entities,
                    "schema": {
                        "fields": len(collection.schema.fields),
                        "description": collection.schema.description
                    },
                    "indexes": []
                }
                
                # Collect index information for every indexed field
                for idx in collection.indexes:
                    metrics["indexes"].append({
                        "field": idx.field_name,
                        "type": idx.params.get("index_type"),
                        "params": idx.params.get("params")
                    })
                
                return metrics
            
            # 2. 数据增长监控
            def monitor_data_growth(collection_name, interval=60):
                """监控数据增长趋势"""
                collection = Collection(collection_name)
                previous_count = 0
                
                while True:
                    current_count = collection.num_entities
                    growth = current_count - previous_count
                    growth_rate = (growth / previous_count * 100) if previous_count > 0 else 0
                    
                    print(f"[{datetime.now()}] 数据量: {current_count}, "
                          f"增长: +{growth}, 增长率: {growth_rate:.2f}%")
                    
                    previous_count = current_count
                    time.sleep(interval)
            
            # 3. 查询性能统计
            class QueryMonitor:
                def __init__(self):
                    self.query_count = 0
                    self.total_latency = 0
                    self.slow_queries = []
                    self.error_count = 0
                
                def record_query(self, query, latency, success=True):
                    self.query_count += 1
                    
                    if success:
                        self.total_latency += latency
                        
                        # 记录慢查询(>100ms)
                        if latency > 100:
                            self.slow_queries.append({
                                "query": query,
                                "latency": latency,
                                "timestamp": datetime.now()
                            })
                    else:
                        self.error_count += 1
                
                def get_stats(self):
                    # Latency is recorded only for successful queries, so average over those
                    success_count = self.query_count - self.error_count
                    avg_latency = self.total_latency / success_count if success_count > 0 else 0
                    error_rate = self.error_count / self.query_count if self.query_count > 0 else 0
                    
                    return {
                        "total_queries": self.query_count,
                        "avg_latency_ms": avg_latency,
                        "slow_queries": len(self.slow_queries),
                        "error_rate": error_rate * 100
                    }
            
            # 4. 使用监控器
            monitor = QueryMonitor()
            collection = Collection("test_collection")
            
            # 模拟查询
            import numpy as np
            
            for i in range(100):
                query_vector = [[np.random.random() for _ in range(128)]]
                
                start = time.time()
                try:
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param={"metric_type": "L2", "params": {"nprobe": 16}},
                        limit=10
                    )
                    latency = (time.time() - start) * 1000
                    monitor.record_query(f"query_{i}", latency, success=True)
                except Exception as e:
                    monitor.record_query(f"query_{i}", 0, success=False)
            
            # 5. Print statistics
            stats = monitor.get_stats()
            print("\nQuery performance statistics:")
            print(f"  Total queries: {stats['total_queries']}")
            print(f"  Average latency: {stats['avg_latency_ms']:.2f}ms")
            print(f"  Slow queries: {stats['slow_queries']}")
            print(f"  Error rate: {stats['error_rate']:.2f}%")
            
            # 6. Export metrics to Prometheus
            from prometheus_client import start_http_server, Gauge, Counter
            
            # Define metrics. A Gauge only carries the latest value; rules that call
            # histogram_quantile need a Histogram, which exposes *_bucket series.
            query_latency = Gauge('milvus_custom_query_latency_ms', 'Query latency in milliseconds')
            query_count = Counter('milvus_custom_query_total', 'Total number of queries')
            slow_query_count = Counter('milvus_custom_slow_query_total', 'Total number of slow queries')
            
            # Start the HTTP server that exposes /metrics
            # start_http_server(8000)
            
            # Update metrics
            # query_latency.set(stats['avg_latency_ms'])
            # query_count.inc(stats['total_queries'])
            # slow_query_count.inc(stats['slow_queries'])
            ---

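02.Alert Configuration
    a.Alert Rules
        a.Description
            Alert rules surface system problems early. The setup here uses Prometheus rule files for evaluation and Alertmanager for routing and notification. Each rule compares a metric expression against a threshold and fires once the condition has held for the `for` duration. Alertmanager can notify through email, Slack, or a webhook (for example to DingTalk), and handles severity-based routing, grouping, and inhibition. Review the rules periodically to confirm they still match the metrics the deployment exposes.
        b.Code Example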
            ---
            # Prometheus alert rule configuration
            
            # alert_rules.yml
            alert_rules = """
            # Metric names such as milvus_query_total are placeholders in this example.
            # Replace them with the metrics your Milvus deployment actually exposes.
            groups:
              - name: milvus_alerts
                interval: 30s
                rules:
                  # High QPS
                  - alert: HighQueryRate
                    expr: rate(milvus_query_total[5m]) > 10000
                    for: 5m
                    labels:
                      severity: warning
                    annotations:
                      summary: "Milvus query QPS is too high"
                      description: "Current QPS: {{ $value }}, above the threshold of 10000"
                  
                  # High latency
                  - alert: HighQueryLatency
                    expr: histogram_quantile(0.99, rate(milvus_query_latency_ms_bucket[5m])) > 100
                    for: 5m
                    labels:
                      severity: warning
                    annotations:
                      summary: "Milvus query latency is too high"
                      description: "P99 latency: {{ $value }}ms, above the threshold of 100ms"
                  
                  # High error rate
                  - alert: HighErrorRate
                    expr: rate(milvus_query_errors_total[5m]) / rate(milvus_query_total[5m]) > 0.05
                    for: 5m
                    labels:
                      severity: critical
                    annotations:
                      summary: "Milvus error rate is too high"
                      description: "Error rate: {{ $value | humanizePercentage }}, above the threshold of 5%"
                  
                  # High memory usage
                  - alert: HighMemoryUsage
                    expr: milvus_memory_usage_bytes / milvus_memory_limit_bytes > 0.9
                    for: 5m
                    labels:
                      severity: warning
                    annotations:
                      summary: "Milvus memory usage is too high"
                      description: "Memory usage: {{ $value | humanizePercentage }}, above the threshold of 90%"
                  
                  # High disk usage
                  - alert: HighDiskUsage
                    expr: milvus_disk_usage_bytes / milvus_disk_limit_bytes > 0.85
                    for: 10m
                    labels:
                      severity: warning
                    annotations:
                      summary: "Milvus disk usage is too high"
                      description: "Disk usage: {{ $value | humanizePercentage }}, above the threshold of 85%"
                  
                  # Service unavailable
                  - alert: MilvusDown
                    expr: up{job="milvus"} == 0
                    for: 1m
                    labels:
                      severity: critical
                    annotations:
                      summary: "Milvus service is unavailable"
                      description: "Milvus instance {{ $labels.instance }} is unreachable"
                  
                  # Slow index building
                  - alert: SlowIndexBuilding
                    expr: milvus_index_build_duration_ms > 300000
                    for: 10m
                    labels:
                      severity: warning
                    annotations:
                      summary: "Index building is slow"
                      description: "Index build time: {{ $value }}ms, more than 5 minutes"
            """
            
            # Alertmanager configuration (matchers syntax requires Alertmanager 0.22 or later)
            alertmanager_config = """
            global:
              resolve_timeout: 5m
              smtp_smarthost: 'smtp.example.com:587'
              smtp_from: '[email protected]'
              smtp_auth_username: 'alertmanager'
              smtp_auth_password: 'password'
            
            route:
              group_by: ['alertname', 'cluster']
              group_wait: 10s
              group_interval: 10s
              repeat_interval: 12h
              receiver: 'default'
              routes:
                - matchers:
                    - severity="critical"
                  receiver: 'critical'
                  continue: true
                - matchers:
                    - severity="warning"
                  receiver: 'warning'
            
            receivers:
              - name: 'default'
                email_configs:
                  - to: '[email protected]'
              
              - name: 'critical'
                email_configs:
                  - to: '[email protected]'
                # Slack incoming webhooks expect Slack's own payload, so use slack_configs
                # rather than the generic webhook_configs. The channel name is a placeholder.
                slack_configs:
                  - api_url: 'https://hooks.slack.com/services/xxx'
                    channel: '#milvus-alerts'
              
              - name: 'warning'
                email_configs:
                  - to: '[email protected]'
            
            inhibit_rules:
              - source_matchers:
                  - severity="critical"
                target_matchers:
                  - severity="warning"
                equal: ['alertname', 'instance']
            """
            
            print("Alert rule configuration example:")
            print(alert_rules)
            print("\nAlertmanager configuration example:")
            print(alertmanager_config)
            ---
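            The HighQueryLatency rule above depends on `histogram_quantile`, which estimates a quantile from cumulative bucket counts rather than from raw samples. The sketch below shows the idea with linear interpolation inside the bucket that contains the target rank. It is a simplified illustration, not Prometheus's implementation: the function name `estimate_quantile` and the bucket values are made up for this example, and edge cases that Prometheus handles (such as NaN inputs or a missing `+Inf` bucket) are left out.

```python
def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by upper_bound,
    ending with an upper bound of float("inf").
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return float("nan")

    rank = q * total  # position of the requested quantile among all observations
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                # The rank falls in the overflow bucket: fall back to the
                # upper bound of the highest finite bucket.
                return prev_bound
            # Assume observations are spread evenly inside this bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical latency buckets in milliseconds: (le, cumulative count)
latency_buckets = [(10, 50), (50, 90), (100, 98), (250, 100), (float("inf"), 100)]
p50 = estimate_quantile(0.5, latency_buckets)
p99 = estimate_quantile(0.99, latency_buckets)
print(f"P50 ~ {p50:.1f}ms, P99 ~ {p99:.1f}ms")
```

            Because the estimate is interpolated, its accuracy depends on how finely the buckets are spaced around the alert threshold; a threshold of 100ms is best served by a bucket boundary at or near 100.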
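    b.Alert Notification
        a.Description
            Deliver alerts through several channels, such as email, SMS, phone calls, and instant messaging. Configure recipients and an on-call rotation, and escalate alerts that are not acknowledged in time. Record each alert together with its acknowledgement and resolution so the history can be analyzed, and use those statistics to tune the rules and reduce false positives.
        b.Code Example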
            ---
            # Custom alert notification implementation
            
            import requests
            from datetime import datetime
            
            class AlertNotifier:
                def __init__(self):
                    self.alert_history = []
                
                def send_email(self, to, subject, body):
                    """Send an email alert (stub)."""
                    # A real implementation would send the message through SMTP
                    print(f"[Email] To: {to}")
                    print(f"  Subject: {subject}")
                    print(f"  Body: {body}")
                
                def _post_json(self, channel, webhook_url, payload):
                    """POST a JSON payload to a webhook and report the outcome."""
                    try:
                        # A timeout keeps a slow webhook from blocking alert processing
                        response = requests.post(webhook_url, json=payload, timeout=5)
                        # Treat 4xx/5xx responses as failures instead of reporting success
                        response.raise_for_status()
                        print(f"[{channel}] Sent: {response.status_code}")
                    except requests.RequestException as e:
                        print(f"[{channel}] Failed: {e}")
                
                def send_dingtalk(self, webhook_url, message):
                    """Send a DingTalk alert as a markdown message."""
                    data = {
                        "msgtype": "markdown",
                        "markdown": {
                            "title": "Milvus Alert",
                            "text": message
                        }
                    }
                    self._post_json("DingTalk", webhook_url, data)
                
                def send_slack(self, webhook_url, message):
                    """Send a Slack alert through an incoming webhook."""
                    data = {
                        "text": message,
                        "username": "Milvus Alert",
                        "icon_emoji": ":warning:"
                    }
                    self._post_json("Slack", webhook_url, data)
                
                def process_alert(self, alert):
                    """Record an alert and notify according to its severity."""
                    alert_info = {
                        "name": alert["labels"]["alertname"],
                        "severity": alert["labels"]["severity"],
                        "summary": alert["annotations"]["summary"],
                        "description": alert["annotations"]["description"],
                        "timestamp": datetime.now()
                    }
                    
                    self.alert_history.append(alert_info)
                    
                    # Choose the notification channels by severity
                    if alert_info["severity"] == "critical":
                        # Critical: notify through several channels
                        self.send_email(
                            to="[email protected]",
                            subject=f"[Critical] {alert_info['summary']}",
                            body=alert_info["description"]
                        )
                        self.send_dingtalk(
                            webhook_url="https://oapi.dingtalk.com/robot/send?access_token=xxx",
                            # Real newlines are needed for markdown paragraphs
                            message=f"## [Critical Alert]\n\n**{alert_info['summary']}**\n\n{alert_info['description']}"
                        )
                    elif alert_info["severity"] == "warning":
                        # Warning: email only
                        self.send_email(
                            to="[email protected]",
                            subject=f"[Warning] {alert_info['summary']}",
                            body=alert_info["description"]
                        )
                    
                    return alert_info
            
            # Use the notifier
            notifier = AlertNotifier()
            
            # Simulated alert
            sample_alert = {
                "labels": {
                    "alertname": "HighQueryLatency",
                    "severity": "warning",
                    "instance": "milvus-01"
                },
                "annotations": {
                    "summary": "Milvus query latency is too high",
                    "description": "P99 latency: 150ms, above the threshold of 100ms"
                }
            }
            
            alert_info = notifier.process_alert(sample_alert)
            print(f"\nAlert processed: {alert_info['name']}")
            
            # Alert statistics
            def get_alert_stats(notifier):
                stats = {
                    "total": len(notifier.alert_history),
                    "by_severity": {},
                    "by_name": {}
                }
                
                for alert in notifier.alert_history:
                    # Count by severity
                    severity = alert["severity"]
                    stats["by_severity"][severity] = stats["by_severity"].get(severity, 0) + 1
                    
                    # Count by alert name
                    name = alert["name"]
                    stats["by_name"][name] = stats["by_name"].get(name, 0) + 1
                
                return stats
            
            stats = get_alert_stats(notifier)
            print("\nAlert statistics:")
            print(f"  Total: {stats['total']}")
            print(f"  By severity: {stats['by_severity']}")
            print(f"  By name: {stats['by_name']}")
            ---
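            The description mentions escalation and acknowledgement, which the notifier above does not cover. A minimal sketch of that mechanism follows, assuming a single escalation step and an in-memory store. The class `EscalationTracker`, its method names, and the 15-minute window are hypothetical choices for illustration only.

```python
from datetime import datetime, timedelta


class EscalationTracker:
    """Track open alerts and report the ones that should be escalated."""

    def __init__(self, escalate_after=timedelta(minutes=15)):
        self.escalate_after = escalate_after
        self.open_alerts = {}  # alert key -> state dict

    def fire(self, key, now):
        # Keep the first-seen time if the alert is already open
        self.open_alerts.setdefault(
            key, {"first_seen": now, "acked": False, "level": 1}
        )

    def ack(self, key):
        # An acknowledged alert is being handled, so it is never escalated
        if key in self.open_alerts:
            self.open_alerts[key]["acked"] = True

    def resolve(self, key):
        self.open_alerts.pop(key, None)

    def due_for_escalation(self, now):
        """Return keys that were unacknowledged for too long; each escalates once."""
        escalated = []
        for key, state in self.open_alerts.items():
            waited = now - state["first_seen"]
            if not state["acked"] and state["level"] == 1 and waited >= self.escalate_after:
                state["level"] = 2
                escalated.append(key)
        return escalated


t0 = datetime(2024, 1, 15, 10, 0)
tracker = EscalationTracker()
tracker.fire("HighQueryLatency@milvus-01", t0)
tracker.fire("MilvusDown@milvus-02", t0)
tracker.ack("MilvusDown@milvus-02")

early = tracker.due_for_escalation(t0 + timedelta(minutes=5))
late = tracker.due_for_escalation(t0 + timedelta(minutes=20))
again = tracker.due_for_escalation(t0 + timedelta(minutes=40))
print(early, late, again)
```

            Passing `now` explicitly keeps the logic deterministic and easy to test. A production version would persist the state and route escalated keys to a second-tier contact.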

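11.2 Log Management

01.Log Configuration
    a.Log Levels
        a.Description
            Milvus supports five log levels: debug, info, warn, error, and fatal. Use debug in development and info or warn in production. The level is set in the configuration file or through environment variables. Depending on the Milvus version, it can also be changed at runtime without a restart, and components can be given different levels. Choose the level to balance detail against logging overhead.
        b.Code Example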
            ---
            # Milvus log configuration (milvus.yaml)
            
            log_config = """
            log:
              level: info  # debug, info, warn, error, fatal
              file:
                rootPath: /var/log/milvus
                maxSize: 300  # MB
                maxAge: 10    # days
                maxBackups: 20
              format: json  # text or json
              stdout: true
            """
            
            # Set through environment variables
            # export LOG_LEVEL=debug
            # export LOG_FORMAT=json
            # export LOG_FILE_MAXSIZE=500
            
            # Docker Compose configuration
            docker_compose_log = """
            services:
              milvus:
                environment:
                  - LOG_LEVEL=info
                  - LOG_FORMAT=json
                  - LOG_FILE_MAXSIZE=300
                  - LOG_FILE_MAXAGE=10
                  - LOG_FILE_MAXBACKUPS=20
                volumes:
                  - /var/log/milvus:/var/log/milvus
            """
            
            # Kubernetes ConfigMap. The flat keys are illustrative; how they reach
            # Milvus depends on how the deployment mounts or merges configuration.
            k8s_log_config = """
            apiVersion: v1
            kind: ConfigMap
            metadata:
              name: milvus-log-config
              namespace: milvus
            data:
              log.level: "info"
              log.format: "json"
              log.file.maxSize: "300"
              log.file.maxAge: "10"
              log.file.maxBackups: "20"
            """
            
            print("Log configuration example:")
            print(log_config)
            print("\nDocker Compose log configuration:")
            print(docker_compose_log)
            print("\nKubernetes log configuration:")
            print(k8s_log_config)
            
            # What each log level means
            log_levels = {
                "debug": "Detailed debugging information covering every operation",
                "info": "General information about important operations and state changes",
                "warn": "Warnings about possible problems that do not stop the service",
                "error": "Errors where an operation failed but the service keeps running",
                "fatal": "Fatal errors after which the service cannot continue"
            }
            
            print("\nLog levels:")
            for level, desc in log_levels.items():
                print(f"  {level}: {desc}")
            
            # Recommended settings per environment
            env_configs = {
                "development": {
                    "level": "debug",
                    "format": "text",
                    "stdout": True
                },
                "testing": {
                    "level": "info",
                    "format": "json",
                    "stdout": True
                },
                "production": {
                    "level": "warn",
                    "format": "json",
                    "stdout": False
                }
            }
            
            print("\nRecommended settings per environment:")
            for env, config in env_configs.items():
                print(f"  {env}: {config}")
            ---
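            The five levels form an ordered scale: a configured level lets through messages at that level and every more severe one. The sketch below illustrates that filtering rule only. `LEVEL_ORDER`, `should_emit`, and `emitted_levels` are hypothetical names for this example, not part of Milvus.

```python
# Severity order, from least to most severe
LEVEL_ORDER = {"debug": 0, "info": 1, "warn": 2, "error": 3, "fatal": 4}


def should_emit(message_level, configured_level):
    """Return True if a message at message_level passes the configured threshold."""
    return LEVEL_ORDER[message_level] >= LEVEL_ORDER[configured_level]


def emitted_levels(configured_level):
    """List the levels that are written when the logger is set to configured_level."""
    return [lvl for lvl in LEVEL_ORDER if should_emit(lvl, configured_level)]


for configured in ("debug", "info", "warn"):
    print(f"{configured}: {emitted_levels(configured)}")
```

            This is why switching production from info to warn reduces log volume: info-level messages about routine operations are dropped, while warnings and errors are still written.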
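    b.Log Rotation
        a.Description
            Log rotation keeps log files from growing without bound. Set the maximum size of a single file, the number of days to retain files, and the maximum number of backup files. Rotation can be triggered by size or by time. Old files can be compressed automatically, expired files cleaned up on a schedule, and archives copied to backup storage.
        b.Code Example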
            ---
            # Log rotation configuration
            
            # 1. Built-in Milvus log rotation
            milvus_log_rotation = """
            log:
              file:
                rootPath: /var/log/milvus
                maxSize: 300      # at most 300MB per file
                maxAge: 10        # keep for 10 days
                maxBackups: 20    # at most 20 backup files
            """
            
            # 2. logrotate (Linux). Use this or the built-in rotation, not both on the same files.
            logrotate_config = """
            # /etc/logrotate.d/milvus
            # Comments sit on their own lines; inline comments are not portable across logrotate versions.
            
            /var/log/milvus/*.log {
                # rotate daily and keep 7 rotated files
                daily
                rotate 7
                # compress rotated logs, leaving the most recent one uncompressed
                compress
                delaycompress
                # ignore missing files and skip empty ones
                missingok
                notifempty
                # copy and truncate in place so the running process keeps its open file
                # (assumes Milvus does not reopen its log file on a signal)
                copytruncate
            }
            """
            
            # 3. Docker log rotation
            docker_log_config = """
            # docker-compose.yml
            services:
              milvus:
                logging:
                  driver: "json-file"
                  options:
                    max-size: "100m"    # at most 100MB per file
                    max-file: "10"      # at most 10 files
                    compress: "true"    # compress rotated logs
            """
            
            # 4. Kubernetes: container stdout logs are rotated by the kubelet,
            #    so the usual task is shipping logs out with Fluentd or Filebeat
            k8s_log_rotation = """
            apiVersion: v1
            kind: ConfigMap
            metadata:
              name: fluentd-config
            data:
              fluent.conf: |
                <source>
                  @type tail
                  path /var/log/milvus/*.log
                  pos_file /var/log/fluentd/milvus.log.pos
                  tag milvus.*
                  <parse>
                    @type json
                  </parse>
                </source>
                
                <match milvus.**>
                  @type elasticsearch
                  host elasticsearch.logging.svc.cluster.local
                  port 9200
                  logstash_format true
                  logstash_prefix milvus
                  <buffer>
                    @type file
                    path /var/log/fluentd/buffer
                    flush_interval 10s
                  </buffer>
                </match>
            """
            
            print("Log rotation configuration:")
            print("\n1. Milvus built-in:")
            print(milvus_log_rotation)
            print("\n2. logrotate:")
            print(logrotate_config)
            print("\n3. Docker:")
            print(docker_log_config)
            print("\n4. Kubernetes:")
            print(k8s_log_rotation)
            
            # 5. Python script that removes old logs
            import os
            import time
            
            def cleanup_old_logs(log_dir, days=7):
                """Delete log files last modified more than `days` days ago."""
                cutoff_time = time.time() - (days * 86400)
                cleaned_count = 0
                cleaned_size = 0
                
                for filename in os.listdir(log_dir):
                    filepath = os.path.join(log_dir, filename)
                    
                    # Match plain and gzip-compressed rotated logs
                    if os.path.isfile(filepath) and filename.endswith(('.log', '.log.gz')):
                        file_mtime = os.path.getmtime(filepath)
                        
                        if file_mtime < cutoff_time:
                            file_size = os.path.getsize(filepath)
                            try:
                                os.remove(filepath)
                            except OSError as e:
                                # Skip files that cannot be removed (permissions, races)
                                print(f"Skipped {filename}: {e}")
                                continue
                            cleaned_count += 1
                            cleaned_size += file_size
                            print(f"Deleted: {filename}")
                
                print(f"\nCleanup finished: deleted {cleaned_count} files, freed {cleaned_size/1024/1024:.2f}MB")
            
            # cleanup_old_logs("/var/log/milvus", days=7)
            ---
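            Size-based rotation with a cap on backup files, as configured by `maxSize` and `maxBackups` above, behaves much like Python's standard `RotatingFileHandler`. The sketch below uses that handler purely as an analogy for the semantics. It is not how Milvus implements rotation, and the helper name `write_with_rotation` and the tiny size limit are chosen only to make the effect visible.

```python
import logging
import logging.handlers
import os
import tempfile


def write_with_rotation(log_dir, max_bytes=1024, backup_count=3, lines=200):
    """Write `lines` log lines and return the file names left in log_dir."""
    path = os.path.join(log_dir, "demo.log")
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backup_count
    )
    logger = logging.getLogger("rotation_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep demo output out of the root logger
    logger.addHandler(handler)
    try:
        for i in range(lines):
            logger.info("demo line %04d with some padding text", i)
    finally:
        logger.removeHandler(handler)
        handler.close()
    return sorted(os.listdir(log_dir))


with tempfile.TemporaryDirectory() as tmp:
    files = write_with_rotation(tmp)
    print(files)
```

            Although far more than four files' worth of data is written, only the active file and `backup_count` backups remain; the oldest backup is discarded on each rollover. The same trade-off applies to `maxBackups`: disk usage is bounded by roughly `maxSize * (maxBackups + 1)`, at the cost of losing the oldest logs.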

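02.Log Analysis
    a.Log Collection
        a.Description
            Collect Milvus logs centrally so they can be searched and analyzed in one place. Common choices are the ELK stack (Elasticsearch, Logstash, Kibana) or the EFK stack (with Fluentd), using Filebeat, Fluentd, or Logstash as the collector. Once the logs are aggregated and indexed, they support full-text search and filtering, visualization, and log-based alerting.
        b.Code Example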
            ---
            # Log collection options
            
            # 1. Ship logs to Elasticsearch with Filebeat
            filebeat_config = """
            # filebeat.yml
            
            filebeat.inputs:
              - type: log
                enabled: true
                paths:
                  - /var/log/milvus/*.log
                fields:
                  service: milvus
                  environment: production
                # Put the custom fields at the top level so queries can match "service" directly
                fields_under_root: true
                json.keys_under_root: true
                json.add_error_key: true
            
            processors:
              - add_host_metadata: ~
              - add_cloud_metadata: ~
              - add_docker_metadata: ~
              - add_kubernetes_metadata: ~
            
            output.elasticsearch:
              hosts: ["elasticsearch:9200"]
              index: "milvus-logs-%{+yyyy.MM.dd}"
              username: "elastic"
              password: "changeme"
            
            setup.kibana:
              host: "kibana:5601"
            
            # A custom index name needs a matching template name and pattern,
            # and is ignored while ILM is enabled, so ILM is turned off here
            setup.template.name: "milvus-logs"
            setup.template.pattern: "milvus-logs-*"
            setup.ilm.enabled: false
            """
            
            # 2. Collect logs with Fluentd
            fluentd_config = """
            # fluent.conf
            
            <source>
              @type tail
              path /var/log/milvus/*.log
              pos_file /var/log/fluentd/milvus.log.pos
              tag milvus.log
              <parse>
                @type json
                time_key time
                time_format %Y-%m-%dT%H:%M:%S.%NZ
              </parse>
            </source>
            
            <filter milvus.log>
              @type record_transformer
              <record>
                hostname "#{Socket.gethostname}"
                service "milvus"
                environment "production"
              </record>
            </filter>
            
            <match milvus.log>
              @type elasticsearch
              host elasticsearch
              port 9200
              logstash_format true
              logstash_prefix milvus
              <buffer>
                @type file
                path /var/log/fluentd/buffer
                flush_interval 10s
                retry_max_times 3
              </buffer>
            </match>
            """
            
            # 3. Deploy ELK with Docker Compose (development setup)
            elk_docker_compose = """
            version: '3'
            
            services:
              elasticsearch:
                image: docker.elastic.co/elasticsearch/elasticsearch:8.5.0
                environment:
                  - discovery.type=single-node
                  # 8.x enables TLS and authentication by default; disabled here so the
                  # plain-HTTP URLs below work. Do not do this in production.
                  - xpack.security.enabled=false
                  - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
                volumes:
                  - es_data:/usr/share/elasticsearch/data
                ports:
                  - "9200:9200"
              
              kibana:
                image: docker.elastic.co/kibana/kibana:8.5.0
                environment:
                  - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
                ports:
                  - "5601:5601"
                depends_on:
                  - elasticsearch
              
              filebeat:
                image: docker.elastic.co/beats/filebeat:8.5.0
                user: root
                volumes:
                  - ./filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
                  - /var/log/milvus:/var/log/milvus:ro
                  - filebeat_data:/usr/share/filebeat/data
                depends_on:
                  - elasticsearch
            
            volumes:
              es_data:
              filebeat_data:
            """
            
            print("Log collection configuration:")
            print("\n1. Filebeat:")
            print(filebeat_config)
            print("\n2. Fluentd:")
            print(fluentd_config)
            print("\n3. ELK Docker Compose:")
            print(elk_docker_compose)
            
            # 4. Query logs in Elasticsearch from Python
            from elasticsearch import Elasticsearch
            
            def query_milvus_logs(es_host="http://localhost:9200", hours=1):
                """Query Milvus logs from the last N hours."""
                # elasticsearch-py 8.x requires the scheme in the host URL
                es = Elasticsearch(es_host)
                
                # Build the query; assumes "service" is indexed as a top-level field
                query = {
                    "bool": {
                        "must": [
                            {"match": {"service": "milvus"}},
                            {"range": {
                                "@timestamp": {
                                    "gte": f"now-{hours}h",
                                    "lte": "now"
                                }
                            }}
                        ]
                    }
                }
                
                # Run the query (keyword arguments replace the deprecated body= parameter)
                result = es.search(
                    index="milvus-logs-*",
                    query=query,
                    sort=[{"@timestamp": {"order": "desc"}}],
                    size=100
                )
                
                print(f"Found {result['hits']['total']['value']} log entries:\n")
                
                for hit in result['hits']['hits']:
                    log = hit['_source']
                    print(f"[{log.get('@timestamp')}] {log.get('level', 'INFO')}: {log.get('message', '')}")
                
                return result
            
            # query_milvus_logs(hours=1)
            
            # 5. Query error logs
            def query_error_logs(es_host="http://localhost:9200", hours=24):
                """Aggregate error and fatal logs by message."""
                es = Elasticsearch(es_host)
                
                query = {
                    "bool": {
                        "must": [
                            {"match": {"service": "milvus"}},
                            {"terms": {"level": ["error", "fatal"]}},
                            {"range": {
                                "@timestamp": {
                                    "gte": f"now-{hours}h"
                                }
                            }}
                        ]
                    }
                }
                aggs = {
                    "error_types": {
                        "terms": {
                            "field": "message.keyword",
                            "size": 10
                        }
                    }
                }
                
                # size=0 skips the hits; only the aggregation buckets are needed
                result = es.search(index="milvus-logs-*", query=query, aggs=aggs, size=0)
                
                print("Error log statistics:")
                for bucket in result['aggregations']['error_types']['buckets']:
                    print(f"  {bucket['key']}: {bucket['doc_count']} times")
            
            # query_error_logs(hours=24)
            ---
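    b.Log Analysis
        a.Description
            Analyze Milvus logs to find problems and optimization opportunities: count error types and their frequency, locate slow queries and performance bottlenecks, identify abnormal patterns and trends, and produce reports and visualizations. The results can feed alerts and notifications, and custom rules or a query API can be layered on top.
        b.Code Example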
            ---
            # Log analysis tool
            
            import json
            from collections import Counter
            
            class LogAnalyzer:
                def __init__(self, log_file):
                    self.log_file = log_file
                    self.logs = []
                    self.load_logs()
                
                def load_logs(self):
                    """Load a log file with one JSON object per line."""
                    with open(self.log_file, 'r', encoding='utf-8') as f:
                        for line in f:
                            try:
                                log = json.loads(line)
                            except json.JSONDecodeError:
                                # Skip lines that are not valid JSON
                                continue
                            if isinstance(log, dict):
                                self.logs.append(log)
                
                def count_by_level(self):
                    """Count logs by level."""
                    levels = [log.get('level', 'UNKNOWN') for log in self.logs]
                    return Counter(levels)
                
                def find_errors(self):
                    """Return error and fatal logs, ignoring the case of the level."""
                    errors = [log for log in self.logs
                              if str(log.get('level', '')).lower() in ('error', 'fatal')]
                    return errors
                
                def find_slow_queries(self, threshold_ms=100):
                    """Return query logs slower than threshold_ms, slowest first."""
                    slow_queries = []
                    
                    for log in self.logs:
                        if 'query' in str(log.get('message', '')).lower():
                            latency = log.get('latency_ms', 0)
                            # Ignore entries whose latency is missing or not numeric
                            if isinstance(latency, (int, float)) and latency > threshold_ms:
                                slow_queries.append({
                                    'time': log.get('time'),
                                    'latency': latency,
                                    'message': log.get('message')
                                })
                    
                    return sorted(slow_queries, key=lambda x: x['latency'], reverse=True)
                
                def analyze_patterns(self):
                    """Return the 10 most frequent messages."""
                    messages = [log.get('message', '') for log in self.logs]
                    message_counts = Counter(messages)
                    
                    top_messages = message_counts.most_common(10)
                    
                    return top_messages
                
                def generate_report(self):
                    """Build the analysis report."""
                    report = {
                        'total_logs': len(self.logs),
                        'by_level': dict(self.count_by_level()),
                        'error_count': len(self.find_errors()),
                        'slow_query_count': len(self.find_slow_queries()),
                        'top_messages': self.analyze_patterns()
                    }
                    
                    return report
            
            # Use the log analyzer
            # analyzer = LogAnalyzer('/var/log/milvus/milvus.log')
            # report = analyzer.generate_report()
            
            # print("Log analysis report:")
            # print(f"  Total logs: {report['total_logs']}")
            # print(f"  By level: {report['by_level']}")
            # print(f"  Errors: {report['error_count']}")
            # print(f"  Slow queries: {report['slow_query_count']}")
            
            # Kibana query examples
            kibana_queries = {
                "Error logs": {
                    "query": 'level:"error" OR level:"fatal"',
                    "time_range": "Last 24 hours"
                },
                "Slow queries": {
                    "query": 'message:"query" AND latency_ms:>100',
                    "time_range": "Last 1 hour"
                },
                "High QPS": {
                    "query": 'message:"query"',
                    "aggregation": "count by 1 minute",
                    "threshold": "> 1000"
                },
                "Memory warnings": {
                    "query": 'message:"memory" AND level:"warn"',
                    "time_range": "Last 6 hours"
                }
            }
            
            print("\nKibana query examples:")
            for name, query in kibana_queries.items():
                print(f"\n{name}:")
                for key, value in query.items():
                    print(f"  {key}: {value}")
            ---
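            Counting exact message strings tends to undercount recurring problems, because messages that differ only in an ID or a number land in separate groups. One common remedy is to normalize the variable parts before counting. The sketch below shows that idea; the function names, the regular expressions, and the sample messages are illustrative assumptions, not actual Milvus log output.

```python
import re
from collections import Counter

# Long lowercase hex tokens (trace IDs and similar) and plain numbers
_HEX_ID = re.compile(r"\b[0-9a-f]{8,}\b")
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def normalize(message):
    """Replace IDs and numbers with placeholders so similar messages group together."""
    message = _HEX_ID.sub("<id>", message)
    return _NUMBER.sub("<n>", message)


def top_patterns(messages, limit=10):
    """Return the most frequent normalized message patterns."""
    return Counter(normalize(m) for m in messages).most_common(limit)


sample_messages = [
    "search failed on segment 4481: timeout after 30s",
    "search failed on segment 4502: timeout after 45s",
    "flush done for collection 7",
    "trace 9f86d081884c7d65 slow",
]
patterns = top_patterns(sample_messages)
for pattern, count in patterns:
    print(f"{count}x {pattern}")
```

            The two timeout messages collapse into one pattern with a count of 2, which is what an error-frequency report needs. The placeholder rules should be tuned to the log format in use, since overly broad patterns can merge messages that are in fact different problems.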

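11.3 Backup and Recovery

01.Backup Strategy
    a.Full Backup
        a.Description
            Take full backups on a schedule to protect data. A full backup covers vector data, metadata, and configuration files. Use the Milvus Backup tool or a manual procedure, and store the result on local disk or in object storage. Define a retention policy, verify backup integrity, record backup history and status, and automate the whole process.
        b.Code Example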
            ---
            # Milvus full backup
            
            # 1. Using the Milvus Backup tool
            backup_commands = """
            # Install Milvus Backup
            # (release asset names vary by version and platform; check the releases page)
            wget https://github.com/zilliztech/milvus-backup/releases/download/v0.3.0/milvus-backup
            chmod +x milvus-backup
            
            # Write backup.yaml
            # (key names vary between versions; compare with the backup.yaml shipped with your release)
            cat > backup.yaml <<EOF
            milvus:
              address: localhost
              port: 19530
              username: ""
              password: ""
            
            minio:
              address: localhost
              port: 9000
              accessKeyID: minioadmin
              secretAccessKey: minioadmin
              useSSL: false
              bucketName: milvus-bucket
            
            backup:
              backupPath: /backup/milvus
              maxBackupNum: 7
            EOF
            
            # Create a backup
            ./milvus-backup create -n backup_20240115
            
            # List backups
            ./milvus-backup list
            
            # Show backup details
            ./milvus-backup get -n backup_20240115
            
            # Delete a backup
            ./milvus-backup delete -n backup_20240115
            """
            
            # 2. Manual backup script
            backup_script = """
            #!/bin/bash
            # Manual Milvus backup script
            # Stop on the first failed step so a partial backup is never uploaded
            set -euo pipefail
            
            BACKUP_DIR="/backup/milvus/$(date +%Y%m%d_%H%M%S)"
            mkdir -p "$BACKUP_DIR"
            
            echo "Starting Milvus backup..."
            
            # Back up MinIO data (vector data)
            echo "Backing up MinIO data..."
            mc mirror milvus-minio/milvus-bucket "$BACKUP_DIR/minio-data"
            
            # Back up etcd data (metadata)
            echo "Backing up etcd data..."
            kubectl exec -n milvus etcd-0 -- etcdctl snapshot save /tmp/snapshot.db
            kubectl cp milvus/etcd-0:/tmp/snapshot.db "$BACKUP_DIR/etcd-snapshot.db"
            
            # Back up configuration (secrets.yaml holds credentials; restrict access to it)
            echo "Backing up configuration..."
            kubectl get configmap -n milvus -o yaml > "$BACKUP_DIR/configmaps.yaml"
            kubectl get secret -n milvus -o yaml > "$BACKUP_DIR/secrets.yaml"
            
            # Compress the backup
            echo "Compressing backup..."
            tar -czf "$BACKUP_DIR.tar.gz" -C "$(dirname "$BACKUP_DIR")" "$(basename "$BACKUP_DIR")"
            rm -rf "$BACKUP_DIR"
            
            # Upload to S3
            echo "Uploading to S3..."
            aws s3 cp "$BACKUP_DIR.tar.gz" s3://milvus-backups/
            
            # Remove local archives older than 7 days
            find /backup/milvus -name "*.tar.gz" -mtime +7 -delete
            
            echo "Backup finished: $BACKUP_DIR.tar.gz"
            """
            
            # 3. Python backup script
            import os
            import shutil
            import subprocess
            from datetime import datetime
            
            def backup_milvus(backup_dir="/backup/milvus"):
                """Back up MinIO and etcd data into a compressed archive."""
                timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
                backup_path = os.path.join(backup_dir, f"backup_{timestamp}")
                os.makedirs(backup_path, exist_ok=True)
                
                print(f"Starting backup to: {backup_path}")
                
                # check=True raises CalledProcessError when a command fails,
                # so a partial backup is never reported as complete
                
                # Back up MinIO (vector data)
                print("Backing up MinIO data...")
                subprocess.run([
                    "mc", "mirror",
                    "milvus-minio/milvus-bucket",
                    f"{backup_path}/minio-data"
                ], check=True)
                
                # Back up etcd (metadata)
                print("Backing up etcd data...")
                subprocess.run([
                    "kubectl", "exec", "-n", "milvus", "etcd-0", "--",
                    "etcdctl", "snapshot", "save", "/tmp/snapshot.db"
                ], check=True)
                subprocess.run([
                    "kubectl", "cp",
                    "milvus/etcd-0:/tmp/snapshot.db",
                    f"{backup_path}/etcd-snapshot.db"
                ], check=True)
                
                # Compress the backup
                print("Compressing backup...")
                subprocess.run([
                    "tar", "-czf", f"{backup_path}.tar.gz",
                    "-C", backup_dir,
                    f"backup_{timestamp}"
                ], check=True)
                
                # Remove the temporary directory
                shutil.rmtree(backup_path)
                
                print(f"Backup complete: {backup_path}.tar.gz")
                return f"{backup_path}.tar.gz"
            
            # backup_milvus()
            
            # 4. 定时备份(crontab)
            crontab_config = """
            # 每天凌晨2点执行备份
            0 2 * * * /opt/scripts/backup-milvus.sh >> /var/log/milvus-backup.log 2>&1
            
            # 每周日凌晨3点执行全量备份
            0 3 * * 0 /opt/scripts/backup-milvus-full.sh >> /var/log/milvus-backup.log 2>&1
            """
            
            print("备份命令:")
            print(backup_commands)
            print("\n备份脚本:")
            print(backup_script)
            print("\n定时备份配置:")
            print(crontab_config)
            ---
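    b.Incremental Backup
        a.Description
            An incremental backup copies only the data that changed since the previous backup, which saves storage space and shortens the backup window. Changes are identified by timestamp or version number, so a backup baseline must be recorded after every run. The approach suits collections that are updated frequently and is used together with periodic full backups: a restore applies the most recent full backup first, then every later incremental backup in order.
        b.Code Example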
            ---
            # Milvus增量备份实现
            
            from pymilvus import connections, Collection, utility
            from datetime import datetime
            import json
            import os  # needed by save_metadata (os.makedirs)
            
            class IncrementalBackup:
                def __init__(self, backup_dir="/backup/milvus/incremental"):
                    self.backup_dir = backup_dir
                    self.metadata_file = f"{backup_dir}/metadata.json"
                    self.load_metadata()
                
                def load_metadata(self):
                    """加载备份元数据"""
                    try:
                        with open(self.metadata_file, 'r') as f:
                            self.metadata = json.load(f)
                    except (FileNotFoundError, json.JSONDecodeError):
                        self.metadata = {
                            "last_backup_time": None,
                            "collections": {}
                        }
                
                def save_metadata(self):
                    """保存备份元数据"""
                    os.makedirs(self.backup_dir, exist_ok=True)
                    with open(self.metadata_file, 'w') as f:
                        json.dump(self.metadata, f, indent=2)
                
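                def backup_collection(self, collection_name):
                    """Incrementally back up one collection."""
                    collection = Collection(collection_name)
                    collection.load()  # queries require a loaded collection
                    
                    # Record the baseline before querying, so rows written while
                    # the query runs are picked up by the next backup
                    backup_started = datetime.now()
                    
                    # Time of the previous backup (None on the first run)
                    last_backup = self.metadata["collections"].get(collection_name, {}).get("last_backup_time")
                    
                    if last_backup:
                        # Assumes the schema has a numeric `timestamp` field in epoch seconds
                        expr = f"timestamp > {last_backup}"
                    else:
                        # First run: full export with no filter
                        expr = None
                    
                    # query() with an empty expression requires a limit, so page
                    # through the rows with query_iterator (assumes pymilvus 2.3+)
                    results = []
                    iterator = collection.query_iterator(batch_size=1000, expr=expr, output_fields=["*"])
                    while True:
                        batch = iterator.next()
                        if not batch:
                            iterator.close()
                            break
                        results.extend(batch)
                    
                    if not results:
                        print(f"{collection_name}: no new data")
                        return
                    
                    # Write the incremental data
                    os.makedirs(self.backup_dir, exist_ok=True)
                    timestamp = backup_started.strftime("%Y%m%d_%H%M%S")
                    backup_file = f"{self.backup_dir}/{collection_name}_{timestamp}.json"
                    
                    with open(backup_file, 'w') as f:
                        json.dump(results, f)
                    
                    # Update metadata with the baseline captured before the query
                    self.metadata["collections"][collection_name] = {
                        "last_backup_time": backup_started.timestamp(),
                        "last_backup_file": backup_file,
                        "record_count": len(results)
                    }
                    self.save_metadata()
                    
                    print(f"{collection_name}: backed up {len(results)} records to {backup_file}")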
                
                def backup_all(self):
                    """增量备份所有Collection"""
                    collections = utility.list_collections()
                    
                    for coll_name in collections:
                        self.backup_collection(coll_name)
                    
                    self.metadata["last_backup_time"] = datetime.now().isoformat()
                    self.save_metadata()
            
            # 使用增量备份
            # connections.connect(host="localhost", port="19530")
            # backup = IncrementalBackup()
            # backup.backup_all()
            
            # 增量备份脚本
            incremental_backup_script = """
            #!/bin/bash
            # 增量备份脚本
            
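            BACKUP_DIR="/backup/milvus/incremental"
            TIMESTAMP=$(date +%Y%m%d_%H%M%S)
            mkdir -p "$BACKUP_DIR"
            
            # Time of the previous backup in epoch seconds (0 on the first run)
            LAST_BACKUP=$(cat "$BACKUP_DIR/last_backup_time.txt" 2>/dev/null || echo "0")
            CURRENT_TIME=$(date +%s)
            
            # --newer-than expects an age (a duration), not an absolute timestamp,
            # so pass the number of seconds elapsed since the previous backup
            ELAPSED=$((CURRENT_TIME - LAST_BACKUP))
            mc mirror --newer-than "${ELAPSED}s" milvus-minio/milvus-bucket "$BACKUP_DIR/$TIMESTAMP/"
            
            # Record the time of this backup
            echo "$CURRENT_TIME" > "$BACKUP_DIR/last_backup_time.txt"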
            
            # 压缩备份
            tar -czf $BACKUP_DIR/incremental_$TIMESTAMP.tar.gz -C $BACKUP_DIR $TIMESTAMP
            rm -rf $BACKUP_DIR/$TIMESTAMP
            
            echo "增量备份完成: incremental_$TIMESTAMP.tar.gz"
            """
            
            print("增量备份脚本:")
            print(incremental_backup_script)
            ---
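Because a restore needs the latest full backup plus every later incremental backup, it helps to compute that chain explicitly before restoring. The following is a minimal sketch under stated assumptions: `BackupRecord` and `build_restore_chain` are hypothetical helpers for a backup catalog you maintain yourself, and they are not part of Milvus or pymilvus.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BackupRecord:
    """Hypothetical catalog entry describing one backup archive."""
    name: str
    kind: str  # "full" or "incremental"
    created_at: datetime


def build_restore_chain(records):
    """Return the latest full backup followed by all later incrementals, oldest first."""
    fulls = [r for r in records if r.kind == "full"]
    if not fulls:
        raise ValueError("no full backup available; incrementals cannot be restored alone")
    base = max(fulls, key=lambda r: r.created_at)
    later = sorted(
        (r for r in records if r.kind == "incremental" and r.created_at > base.created_at),
        key=lambda r: r.created_at,
    )
    return [base] + later


if __name__ == "__main__":
    catalog = [
        BackupRecord("full_0107", "full", datetime(2024, 1, 7, 3, 0)),
        BackupRecord("inc_0110", "incremental", datetime(2024, 1, 10, 2, 0)),
        BackupRecord("full_0114", "full", datetime(2024, 1, 14, 3, 0)),
        BackupRecord("inc_0116", "incremental", datetime(2024, 1, 16, 2, 0)),
        BackupRecord("inc_0115", "incremental", datetime(2024, 1, 15, 2, 0)),
    ]
    for record in build_restore_chain(catalog):
        print(record.kind, record.name)
```

Incrementals taken before the chosen full backup are skipped, since the full backup already contains their data.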

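02.Restore Process
    a.Data Restore
        a.Description
            Restore Milvus data from a backup; both full and incremental restores are supported. Stop the Milvus service before restoring, then restore the vector data (MinIO), the metadata (etcd), and the configuration. Afterwards, verify data integrity and test service availability, and record the process and its outcome. Keep a restore runbook and rehearse it regularly.
        b.Code Example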
            ---
            # Milvus数据恢复
            
            # 1. 使用Milvus Backup恢复
            restore_commands = """
            # 列出可用备份
            ./milvus-backup list
            
            # 恢复指定备份
            ./milvus-backup restore -n backup_20240115
            
            # 恢复到指定Collection
            ./milvus-backup restore -n backup_20240115 -c collection_name
            
            # 恢复并重命名Collection
            ./milvus-backup restore -n backup_20240115 -c old_name -t new_name
            """
            
            # 2. 手动恢复脚本
            restore_script = """
            #!/bin/bash
            # Milvus手动恢复脚本
            
            BACKUP_FILE=$1
            
            if [ -z "$BACKUP_FILE" ]; then
                echo "用法: $0 <backup_file.tar.gz>"
                exit 1
            fi
            
            echo "开始恢复Milvus数据..."
            
            # 停止Milvus服务
            echo "停止Milvus服务..."
            kubectl scale deployment milvus-standalone --replicas=0 -n milvus
            sleep 10
            
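            # Unpack the backup. The archive contains a single top-level directory,
            # so --strip-components=1 places its contents directly in $RESTORE_DIR
            echo "Unpacking backup archive..."
            RESTORE_DIR="/tmp/milvus_restore"
            mkdir -p "$RESTORE_DIR"
            tar -xzf "$BACKUP_FILE" -C "$RESTORE_DIR" --strip-components=1
            
            # Restore etcd data (metadata). The snapshot is restored into a new data
            # directory; etcd must then be pointed at that directory and restarted
            echo "Restoring etcd data..."
            kubectl cp "$RESTORE_DIR/etcd-snapshot.db" milvus/etcd-0:/tmp/snapshot.db
            kubectl exec -n milvus etcd-0 -- etcdctl snapshot restore /tmp/snapshot.db \\
                --data-dir=/var/lib/etcd-restore
            
            # Restore MinIO data (vector data)
            echo "Restoring MinIO data..."
            mc mirror "$RESTORE_DIR/minio-data" milvus-minio/milvus-bucket
            
            # Restore configuration
            echo "Restoring configuration..."
            kubectl apply -f "$RESTORE_DIR/configmaps.yaml"
            kubectl apply -f "$RESTORE_DIR/secrets.yaml"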
            
            # 启动Milvus服务
            echo "启动Milvus服务..."
            kubectl scale deployment milvus-standalone --replicas=1 -n milvus
            
            # 等待服务就绪
            echo "等待服务就绪..."
            kubectl wait --for=condition=ready pod -l app=milvus -n milvus --timeout=300s
            
            # 清理临时文件
            rm -rf $RESTORE_DIR
            
            echo "恢复完成!"
            """
            
            # 3. Python恢复脚本
            import subprocess
            import os
            import time
            
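            def restore_milvus(backup_file):
                """Restore Milvus data from a backup archive."""
                print(f"Starting restore from: {backup_file}")
                
                # check=True raises CalledProcessError when a command fails,
                # so the restore stops instead of continuing in a broken state
                
                # Stop the service
                print("Stopping Milvus...")
                subprocess.run([
                    "kubectl", "scale", "deployment", "milvus-standalone",
                    "--replicas=0", "-n", "milvus"
                ], check=True)
                time.sleep(10)
                
                # Unpack the backup (the archive holds one top-level directory)
                print("Unpacking backup...")
                restore_dir = "/tmp/milvus_restore"
                os.makedirs(restore_dir, exist_ok=True)
                subprocess.run([
                    "tar", "-xzf", backup_file,
                    "-C", restore_dir, "--strip-components=1"
                ], check=True)
                
                # Restore the data
                print("Restoring data...")
                # ... restore logic ...
                
                # Start the service
                print("Starting Milvus...")
                subprocess.run([
                    "kubectl", "scale", "deployment", "milvus-standalone",
                    "--replicas=1", "-n", "milvus"
                ], check=True)
                
                # Wait until the pod is ready
                print("Waiting for the service to become ready...")
                subprocess.run([
                    "kubectl", "wait", "--for=condition=ready",
                    "pod", "-l", "app=milvus",
                    "-n", "milvus", "--timeout=300s"
                ], check=True)
                
                print("Restore complete!")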
            
            # restore_milvus("/backup/milvus/backup_20240115.tar.gz")
            
            # 4. 验证恢复
            from pymilvus import connections, utility, Collection
            
            def verify_restore():
                """验证恢复后的数据"""
                connections.connect(host="localhost", port="19530")
                
                print("验证恢复结果:\n")
                
                # 检查Collections
                collections = utility.list_collections()
                print(f"Collections数量: {len(collections)}")
                
                for coll_name in collections:
                    collection = Collection(coll_name)
                    count = collection.num_entities
                    print(f"  {coll_name}: {count} entities")
                
                # 测试查询
                if collections:
                    collection = Collection(collections[0])
                    collection.load()
                    
                    import numpy as np
                    query_vector = [[np.random.random() for _ in range(128)]]
                    
                    results = collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param={"metric_type": "L2", "params": {"nprobe": 16}},
                        limit=10
                    )
                    
                    print(f"\n测试查询成功: 返回{len(results[0])}个结果")
                
                connections.disconnect("default")
            
            # verify_restore()
            
            print("恢复命令:")
            print(restore_commands)
            print("\n恢复脚本:")
            print(restore_script)
            ---
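Verifying data integrity is easier when the archive itself is checked before the restore starts. One possible approach, sketched here as an assumption and not as part of Milvus or milvus-backup, is to store a SHA-256 digest next to each archive at backup time and compare it before unpacking. The `.sha256` sidecar naming and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(backup_file):
    """Store the digest next to the archive as <archive>.sha256 (hypothetical layout)."""
    checksum_file = Path(str(backup_file) + ".sha256")
    checksum_file.write_text(sha256_of(backup_file) + "\n")
    return checksum_file


def verify_checksum(backup_file):
    """Return True if the archive still matches its stored digest."""
    checksum_file = Path(str(backup_file) + ".sha256")
    expected = checksum_file.read_text().strip()
    return sha256_of(backup_file) == expected
```

A restore script could call `verify_checksum(backup_file)` first and abort when it returns `False`, before any service is stopped.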
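    b.Disaster Recovery
        a.Description
            Prepare a disaster recovery plan for extreme scenarios. Define the RTO (recovery time objective) and RPO (recovery point objective), provision a standby environment and resources, document the recovery steps, and rehearse them regularly. Establish an incident response team, implement cross-region disaster recovery, and monitor recovery progress and status.
        b.Code Example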
            ---
            # 灾难恢复计划
            
            disaster_recovery_plan = """
            # Milvus灾难恢复计划
            
            ## 1. 恢复目标
            - RTO (恢复时间目标): 2小时
            - RPO (恢复点目标): 24小时
            
            ## 2. 恢复流程
            
            ### 2.1 评估阶段(15分钟)
            - [ ] 确认灾难类型和影响范围
            - [ ] 评估数据丢失程度
            - [ ] 确定恢复策略
            - [ ] 通知相关人员
            
            ### 2.2 准备阶段(30分钟)
            - [ ] 准备备用环境
            - [ ] 下载最新备份
            - [ ] 验证备份完整性
            - [ ] 准备恢复工具
            
            ### 2.3 恢复阶段(60分钟)
            - [ ] 部署Milvus集群
            - [ ] 恢复etcd数据
            - [ ] 恢复MinIO数据
            - [ ] 恢复配置文件
            - [ ] 启动服务
            
            ### 2.4 验证阶段(15分钟)
            - [ ] 验证数据完整性
            - [ ] 测试查询功能
            - [ ] 测试写入功能
            - [ ] 性能测试
            
            ## 3. 联系人
            - 技术负责人: xxx (电话: xxx)
            - 运维负责人: xxx (电话: xxx)
            - 业务负责人: xxx (电话: xxx)
            
            ## 4. 备用资源
            - 备用集群: xxx
            - 备份存储: s3://milvus-backups/
            - 监控地址: https://monitoring.example.com
            """
            
            # 灾难恢复脚本
            dr_script = """
            #!/bin/bash
            # 灾难恢复自动化脚本
            
            set -e
            
            echo "=========================================="
            echo "Milvus灾难恢复脚本"
            echo "=========================================="
            
            # 1. 评估阶段
            echo "1. 评估灾难影响..."
            BACKUP_LOCATION="s3://milvus-backups/"
            LATEST_BACKUP=$(aws s3 ls $BACKUP_LOCATION | sort | tail -n 1 | awk '{print $4}')
            
            echo "最新备份: $LATEST_BACKUP"
            
            # 2. 准备阶段
            echo "2. 准备恢复环境..."
            
            # 创建新的Kubernetes命名空间
            kubectl create namespace milvus-dr
            
            # 部署依赖服务
            helm install etcd bitnami/etcd -n milvus-dr
            helm install minio bitnami/minio -n milvus-dr
            helm install pulsar apache/pulsar -n milvus-dr
            
            # 3. 恢复阶段
            echo "3. 恢复数据..."
            
            # 下载备份
            aws s3 cp $BACKUP_LOCATION$LATEST_BACKUP /tmp/backup.tar.gz
            
            # 解压备份
            tar -xzf /tmp/backup.tar.gz -C /tmp/
            
            # 恢复数据
            # ... 恢复逻辑 ...
            
            # 部署Milvus
            helm install milvus-dr milvus/milvus -n milvus-dr
            
            # 4. 验证阶段
            echo "4. 验证恢复结果..."
            
            # 等待服务就绪
            kubectl wait --for=condition=ready pod -l app=milvus -n milvus-dr --timeout=300s
            
            # 运行验证脚本
            python3 verify_restore.py
            
            echo "=========================================="
            echo "灾难恢复完成!"
            echo "=========================================="
            """
            
            print("灾难恢复计划:")
            print(disaster_recovery_plan)
            print("\n灾难恢复脚本:")
            print(dr_script)
            ---
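The RTO and RPO targets in the plan can be checked mechanically. The sketch below uses the plan's own figures (RTO of 2 hours, RPO of 24 hours, phase budgets of 15, 30, 60, and 15 minutes); the function names are hypothetical and the check is only an illustration of the arithmetic.

```python
from datetime import datetime, timedelta

RTO = timedelta(hours=2)   # recovery time objective from the plan
RPO = timedelta(hours=24)  # recovery point objective from the plan

# Time budget per phase in minutes, taken from the plan
PHASE_MINUTES = {"assess": 15, "prepare": 30, "restore": 60, "verify": 15}


def planned_recovery_time(phase_minutes):
    """Sum the per-phase budgets into one duration."""
    return timedelta(minutes=sum(phase_minutes.values()))


def rto_met(phase_minutes, rto=RTO):
    """True if the planned phases fit inside the RTO."""
    return planned_recovery_time(phase_minutes) <= rto


def rpo_met(last_backup_at, now, rpo=RPO):
    """True if the newest backup is recent enough to satisfy the RPO."""
    return now - last_backup_at <= rpo


if __name__ == "__main__":
    now = datetime(2024, 1, 15, 14, 0)
    last_backup = datetime(2024, 1, 15, 2, 0)  # daily backup at 02:00
    print("Planned recovery time:", planned_recovery_time(PHASE_MINUTES))
    print("RTO met:", rto_met(PHASE_MINUTES))
    print("RPO met:", rpo_met(last_backup, now))
```

With these figures the phases add up to exactly the RTO, so any overrun in one phase has to be recovered in another.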

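11.4 Troubleshooting

01.Common Failures
    a.Connection Failures
        a.Description
            Connection failures are among the most common problems. Likely causes include network problems, a service that is not running, a wrong port configuration, or a firewall blocking traffic. To troubleshoot: check the Milvus service status and network connectivity, verify the connection parameters, review firewall and security group settings, check DNS resolution, test the port with telnet or curl, and read the Milvus logs for detailed error messages.
        b.Code Example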
            ---
            # 连接失败故障排查
            
            from pymilvus import connections
            import socket
            import subprocess
            
            def diagnose_connection(host="localhost", port="19530"):
                """诊断连接问题"""
                print(f"诊断Milvus连接: {host}:{port}\n")
                
                # 1. Check DNS resolution first; otherwise a resolution failure
                #    surfaces as a generic network error in the port check
                print("1. Checking DNS resolution...")
                try:
                    ip = socket.gethostbyname(host)
                    print(f"   ✓ DNS resolved: {host} -> {ip}")
                except socket.gaierror as e:
                    print(f"   ✗ DNS resolution failed: {e}")
                    return
                
                # 2. Check that the port is reachable
                print("\n2. Checking network connectivity...")
                try:
                    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                        sock.settimeout(5)
                        result = sock.connect_ex((ip, int(port)))
                    
                    if result == 0:
                        print("   ✓ Port reachable")
                    else:
                        print(f"   ✗ Port unreachable (error code: {result})")
                        return
                except OSError as e:
                    print(f"   ✗ Network error: {e}")
                    return
                
                # 3. 测试Milvus连接
                print("\n3. 测试Milvus连接...")
                try:
                    connections.connect(
                        alias="test",
                        host=host,
                        port=port,
                        timeout=10
                    )
                    print("   ✓ Milvus连接成功")
                    connections.disconnect("test")
                except Exception as e:
                    print(f"   ✗ Milvus连接失败: {e}")
                    
                    # 4. 检查服务状态
                    print("\n4. 检查服务状态...")
                    try:
                        result = subprocess.run(
                            ["kubectl", "get", "pods", "-n", "milvus"],
                            capture_output=True,
                            text=True
                        )
                        print(result.stdout)
                    except (OSError, subprocess.SubprocessError):
                        print("   无法检查Kubernetes状态")
                
                # 5. 检查防火墙
                print("\n5. 防火墙检查建议:")
                print("   - 检查iptables规则: sudo iptables -L")
                print("   - 检查firewalld: sudo firewall-cmd --list-all")
                print("   - 检查云安全组配置")
                
                # 6. 检查日志
                print("\n6. 查看日志:")
                print(f"   kubectl logs -n milvus <pod-name>")
                print(f"   或: docker logs milvus-standalone")
            
            # diagnose_connection("localhost", "19530")
            
            # 常见连接错误及解决方案
            connection_errors = {
                "connection refused": {
                    "原因": "服务未启动或端口未监听",
                    "解决方案": [
                        "检查Milvus服务状态",
                        "验证端口配置",
                        "查看服务日志"
                    ]
                },
                "timeout": {
                    "原因": "网络不通或服务响应慢",
                    "解决方案": [
                        "检查网络连通性",
                        "增加超时时间",
                        "检查服务负载"
                    ]
                },
                "authentication failed": {
                    "原因": "用户名或密码错误",
                    "解决方案": [
                        "验证认证信息",
                        "检查用户权限",
                        "重置密码"
                    ]
                },
                "DNS resolution failed": {
                    "原因": "域名无法解析",
                    "解决方案": [
                        "检查DNS配置",
                        "使用IP地址连接",
                        "检查hosts文件"
                    ]
                }
            }
            
            print("\n常见连接错误及解决方案:")
            for error, info in connection_errors.items():
                print(f"\n{error}:")
                print(f"  原因: {info['原因']}")
                print(f"  解决方案:")
                for solution in info['解决方案']:
                    print(f"    - {solution}")
            ---
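To look up the right entry in a table of common connection errors, the raw exception text has to be mapped to a category first. The sketch below does this with simple substring matching. The category names follow the table above, but the matched substrings are assumptions about typical socket and gRPC error messages, so treat them as a starting point and adjust them to the messages you actually see.

```python
def classify_connection_error(message):
    """Map an error message to a coarse category using substring matching.

    The substrings are illustrative assumptions, not an official list of
    Milvus or gRPC error strings.
    """
    text = message.lower()
    rules = [
        ("DNS resolution failed", ("name or service not known", "getaddrinfo", "dns")),
        ("authentication failed", ("auth", "permission denied")),
        ("connection refused", ("connection refused",)),
        ("timeout", ("timed out", "timeout", "deadline exceeded")),
    ]
    for category, needles in rules:
        if any(needle in text for needle in needles):
            return category
    return "unknown"


if __name__ == "__main__":
    samples = [
        "[Errno 111] Connection refused",
        "Deadline Exceeded",
        "[Errno -2] Name or service not known",
        "authentication failed for user root",
    ]
    for sample in samples:
        print(f"{sample!r} -> {classify_connection_error(sample)}")
```

The rules are checked in order, so the more specific categories are listed before the broad timeout patterns.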
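    b.Query Timeouts
        a.Description
            Query timeouts are usually caused by performance problems, such as a very large data volume, an unoptimized index, insufficient resources, or excessive concurrency. Check the search parameters, tune the index type and its parameters, add Query Node resources, adjust the timeout, analyze slow-query logs, apply query rate limiting, and optimize the data model.
        b.Code Example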
            ---
            # 查询超时故障排查
            
            from pymilvus import connections, Collection, DataType
            import time
            import numpy as np
            
            def diagnose_query_timeout(collection_name):
                """诊断查询超时问题"""
                connections.connect(host="localhost", port="19530")
                collection = Collection(collection_name)
                collection.load()
                
                print(f"诊断Collection: {collection_name}\n")
                
                # 1. 检查Collection信息
                print("1. Collection信息:")
                print(f"   数据量: {collection.num_entities}")
                print(f"   字段数: {len(collection.schema.fields)}")
                
                # 2. Inspect indexes
                # Collection.index() accepts only keyword arguments, so iterate
                # collection.indexes; each entry exposes field_name and params
                print("\n2. Index information:")
                for index in collection.indexes:
                    print(f"   {index.field_name}:")
                    print(f"     Type: {index.params.get('index_type')}")
                    print(f"     Params: {index.params.get('params')}")
                
                # 3. 测试查询性能
                print("\n3. 查询性能测试:")
                
                test_cases = [
                    {"nprobe": 8, "limit": 10},
                    {"nprobe": 16, "limit": 10},
                    {"nprobe": 32, "limit": 10},
                    {"nprobe": 16, "limit": 100}
                ]
                
                query_vector = [[np.random.random() for _ in range(128)]]
                
                for params in test_cases:
                    start = time.time()
                    try:
                        results = collection.search(
                            data=query_vector,
                            anns_field="embedding",
                            param={"metric_type": "L2", "params": {"nprobe": params["nprobe"]}},
                            limit=params["limit"],
                            timeout=30
                        )
                        latency = (time.time() - start) * 1000
                        print(f"   nprobe={params['nprobe']}, limit={params['limit']}: {latency:.2f}ms")
                    except Exception as e:
                        print(f"   nprobe={params['nprobe']}, limit={params['limit']}: 超时或失败 ({e})")
                
                # 4. 资源使用情况
                print("\n4. 资源使用建议:")
                print("   - 检查Query Node CPU/内存使用")
                print("   - 检查是否需要增加Query Node数量")
                print("   - 检查索引是否已加载到内存")
                
                # 5. 优化建议
                print("\n5. 优化建议:")
                
                if collection.num_entities > 10000000:
                    print("   - 数据量较大,考虑分片或分区")
                
                print("   - 优化索引参数(降低nprobe)")
                print("   - 增加Query Node资源")
                print("   - 使用更高效的索引类型(如HNSW)")
                print("   - 实现查询缓存")
                
                connections.disconnect("default")
            
            # diagnose_query_timeout("test_collection")
            
            # 查询超时优化方案
            optimization_strategies = {
                "索引优化": {
                    "FLAT -> IVF_FLAT": "适合中等规模数据",
                    "IVF_FLAT -> IVF_PQ": "牺牲精度换取速度",
                    "IVF -> HNSW": "更好的查询性能"
                },
                "参数调优": {
                    "降低nprobe": "减少搜索的聚类中心数量",
                    "降低limit": "减少返回结果数量",
                    "增加timeout": "给予更多查询时间"
                },
                "资源扩展": {
                    "增加Query Node": "提升并发查询能力",
                    "增加内存": "缓存更多索引数据",
                    "使用SSD": "加快数据加载速度"
                },
                "架构优化": {
                    "数据分区": "按业务逻辑分区数据",
                    "查询缓存": "缓存热门查询结果",
                    "异步查询": "使用异步API"
                }
            }
            
            print("\n查询超时优化方案:")
            for category, strategies in optimization_strategies.items():
                print(f"\n{category}:")
                for strategy, desc in strategies.items():
                    print(f"  {strategy}: {desc}")
            ---
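Query rate limiting can be done on the client side before a search ever reaches Milvus. A token bucket is one common way to do it; the sketch below is a generic, self-contained implementation and not a Milvus feature. The class name and the `rate` and `capacity` parameters are assumptions made for illustration.

```python
import threading
import time


class TokenBucket:
    """Allow at most `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()
        self.lock = threading.Lock()

    def try_acquire(self):
        """Take one token if available; return False when the caller should back off."""
        with self.lock:
            now = self.clock()
            # Refill in proportion to the elapsed time, capped at capacity
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False


if __name__ == "__main__":
    limiter = TokenBucket(rate=5, capacity=5)
    allowed = sum(1 for _ in range(20) if limiter.try_acquire())
    print(f"{allowed} of 20 immediate requests were allowed")
```

A caller would wrap each `collection.search(...)` in a `try_acquire()` check and reject or queue the request when it returns `False`.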

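02.Performance Problems
    a.Performance Analysis
        a.Description
            A drop in system performance calls for a comprehensive analysis. Monitor QPS, latency, and resource usage; analyze slow queries and hot data; check index efficiency and data distribution; and assess whether hardware resources are sufficient. Once the bottleneck is identified, draw up an optimization plan and run performance tests to verify its effect.
        b.Code Example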
            ---
            # 性能分析工具
            
            from pymilvus import connections, Collection, DataType, utility
            import time
            import numpy as np
            from collections import defaultdict
            
            class PerformanceAnalyzer:
                def __init__(self, host="localhost", port="19530"):
                    connections.connect(host=host, port=port)
                    self.metrics = defaultdict(list)
                
                def analyze_collection(self, collection_name):
                    """分析Collection性能"""
                    collection = Collection(collection_name)
                    collection.load()
                    
                    print(f"性能分析: {collection_name}\n")
                    
                    # 1. 基本信息
                    print("1. 基本信息:")
                    print(f"   数据量: {collection.num_entities:,}")
                    print(f"   字段数: {len(collection.schema.fields)}")
                    
                    # 2. Index analysis
                    # Collection.index() accepts only keyword arguments, so iterate
                    # collection.indexes; each entry exposes field_name and params
                    print("\n2. Index analysis:")
                    for index in collection.indexes:
                        print(f"   {index.field_name}:")
                        print(f"     Type: {index.params.get('index_type')}")
                        print(f"     Params: {index.params.get('params')}")
                    
                    # 3. 查询性能测试
                    print("\n3. 查询性能测试:")
                    
                    query_vector = [[np.random.random() for _ in range(128)]]
                    
                    # 测试不同参数组合
                    test_params = [
                        {"nprobe": 8, "limit": 10},
                        {"nprobe": 16, "limit": 10},
                        {"nprobe": 32, "limit": 10},
                    ]
                    
                    for params in test_params:
                        latencies = []
                        
                        # 多次测试取平均
                        for _ in range(10):
                            start = time.time()
                            collection.search(
                                data=query_vector,
                                anns_field="embedding",
                                param={"metric_type": "L2", "params": {"nprobe": params["nprobe"]}},
                                limit=params["limit"]
                            )
                            latency = (time.time() - start) * 1000
                            latencies.append(latency)
                        
                        avg_latency = sum(latencies) / len(latencies)
                        p99_latency = sorted(latencies)[int(len(latencies) * 0.99)]
                        
                        print(f"   nprobe={params['nprobe']}:")
                        print(f"     平均延迟: {avg_latency:.2f}ms")
                        print(f"     P99延迟: {p99_latency:.2f}ms")
                        
                        self.metrics[f"nprobe_{params['nprobe']}"] = {
                            "avg": avg_latency,
                            "p99": p99_latency
                        }
                    
                    # 4. 并发性能测试
                    print("\n4. 并发性能测试:")
                    self.test_concurrent_queries(collection, threads=10)
                    
                    # 5. 性能评分
                    print("\n5. 性能评分:")
                    score = self.calculate_performance_score()
                    print(f"   总分: {score}/100")
                    
                    # 6. 优化建议
                    print("\n6. 优化建议:")
                    self.generate_recommendations(collection)
                
                def test_concurrent_queries(self, collection, threads=10):
                    """测试并发查询性能"""
                    import threading
                    
                    query_vector = [[np.random.random() for _ in range(128)]]
                    results = []
                    
                    def query_worker():
                        start = time.time()
                        collection.search(
                            data=query_vector,
                            anns_field="embedding",
                            param={"metric_type": "L2", "params": {"nprobe": 16}},
                            limit=10
                        )
                        latency = (time.time() - start) * 1000
                        results.append(latency)
                    
                    # 启动并发查询
                    thread_list = []
                    start = time.time()
                    
                    for _ in range(threads):
                        t = threading.Thread(target=query_worker)
                        t.start()
                        thread_list.append(t)
                    
                    for t in thread_list:
                        t.join()
                    
                    total_time = (time.time() - start) * 1000
                    avg_latency = sum(results) / len(results)
                    
                    print(f"   并发数: {threads}")
                    print(f"   总耗时: {total_time:.2f}ms")
                    print(f"   平均延迟: {avg_latency:.2f}ms")
                    print(f"   QPS: {threads / (total_time / 1000):.2f}")
                
                def calculate_performance_score(self):
                    """计算性能评分"""
                    score = 100
                    
                    # 根据延迟扣分
                    avg_latency = self.metrics.get("nprobe_16", {}).get("avg", 0)
                    if avg_latency > 100:
                        score -= 20
                    elif avg_latency > 50:
                        score -= 10
                    
                    # 根据P99延迟扣分
                    p99_latency = self.metrics.get("nprobe_16", {}).get("p99", 0)
                    if p99_latency > 200:
                        score -= 20
                    elif p99_latency > 100:
                        score -= 10
                    
                    return max(score, 0)
                
                def generate_recommendations(self, collection):
                    """生成优化建议"""
                    recommendations = []
                    
                    # 检查数据量
                    if collection.num_entities > 10000000:
                        recommendations.append("数据量较大,建议使用分区")
                    
                    # 检查延迟
                    avg_latency = self.metrics.get("nprobe_16", {}).get("avg", 0)
                    if avg_latency > 100:
                        recommendations.append("查询延迟较高,建议优化索引或增加资源")
                    
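                    # Check indexes. Collection.index() does not take a field name
                    # positionally, so look indexes up by field via collection.indexes
                    index_by_field = {idx.field_name: idx for idx in collection.indexes}
                    for field in collection.schema.fields:
                        if field.dtype in [DataType.FLOAT_VECTOR, DataType.BINARY_VECTOR]:
                            index = index_by_field.get(field.name)
                            if index is None:
                                recommendations.append(f"Field {field.name} has no index; create one before searching")
                                continue
                            index_type = index.params.get('index_type')
                            
                            if index_type == 'FLAT' and collection.num_entities > 100000:
                                recommendations.append(f"Field {field.name} uses a FLAT index; consider switching to IVF or HNSW")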
                    
                    if not recommendations:
                        recommendations.append("性能良好,无需优化")
                    
                    for i, rec in enumerate(recommendations, 1):
                        print(f"   {i}. {rec}")
            
            # 使用性能分析器
            # analyzer = PerformanceAnalyzer()
            # analyzer.analyze_collection("test_collection")
            ---
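            The score above reads `avg` and `p99` values from `self.metrics`. As a minimal sketch of how such values could be derived from raw latency samples, the helper below uses the nearest-rank percentile method. The function name `summarize_latencies` and the sample values are illustrative assumptions, not part of the analyzer shown above.

```python
import math

def summarize_latencies(latencies_ms):
    """Return avg, p50, p95 and p99 of latency samples (nearest-rank percentiles)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)

    def percentile(p):
        # Nearest rank: the smallest sample with at least p% of samples at or below it
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return {
        "avg": sum(ordered) / len(ordered),
        "p50": percentile(50),
        "p95": percentile(95),
        "p99": percentile(99),
    }

# Illustrative samples: one slow outlier among otherwise fast queries
samples = [12.0, 15.0, 11.0, 14.0, 250.0, 13.0, 12.5, 16.0, 13.5, 12.2]
stats = summarize_latencies(samples)
print(stats)
```

            A single outlier barely moves the median but dominates both the average and p99, which is why the score checks the tail latency separately.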
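    b.Performance optimization
        a.Description
            Apply optimizations based on the analysis results: choose a better index type and tune its build parameters, adjust search parameters, add hardware resources, partition data and balance load, simplify the data model, add caching, and tune system configuration. Re-run the benchmark after each change to verify its effect.
        b.Code example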
            ---
            # Applying performance optimizations
            
            import time  # used by optimize_query_params to measure latency
            
            from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType
            
            class PerformanceOptimizer:
                def __init__(self, host="localhost", port="19530"):
                    connections.connect(host=host, port=port)
                
                def optimize_index(self, collection_name, field_name):
                    """优化索引"""
                    collection = Collection(collection_name)
                    
                    print(f"优化索引: {collection_name}.{field_name}\n")
                    
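                    # 1. Drop the old index
                    print("1. Dropping the old index...")
                    collection.release()
                    # drop_index() selects the index through the index_name keyword;
                    # its first positional argument is a timeout, not a field name
                    for idx in collection.indexes:
                        if idx.field_name == field_name:
                            collection.drop_index(index_name=idx.index_name)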
                    
                    # 2. 创建优化后的索引
                    print("2. 创建优化索引...")
                    
                    # 根据数据量选择索引类型
                    num_entities = collection.num_entities
                    
                    if num_entities < 100000:
                        # 小数据量使用FLAT
                        index_params = {
                            "index_type": "FLAT",
                            "metric_type": "L2"
                        }
                    elif num_entities < 1000000:
                        # 中等数据量使用IVF_FLAT
                        index_params = {
                            "index_type": "IVF_FLAT",
                            "metric_type": "L2",
                            "params": {"nlist": 1024}
                        }
                    else:
                        # 大数据量使用HNSW
                        index_params = {
                            "index_type": "HNSW",
                            "metric_type": "L2",
                            "params": {
                                "M": 16,
                                "efConstruction": 256
                            }
                        }
                    
                    collection.create_index(
                        field_name=field_name,
                        index_params=index_params
                    )
                    
                    print(f"   索引类型: {index_params['index_type']}")
                    print(f"   索引参数: {index_params.get('params', {})}")
                    
                    # 3. 加载索引
                    print("\n3. 加载索引...")
                    collection.load()
                    
                    print("索引优化完成!")
                
                def optimize_query_params(self, collection_name):
                    """优化查询参数"""
                    collection = Collection(collection_name)
                    collection.load()
                    
                    print(f"优化查询参数: {collection_name}\n")
                    
                    # 测试不同参数组合
                    import numpy as np
                    query_vector = [[np.random.random() for _ in range(128)]]
                    
                    best_params = None
                    best_score = 0
                    
                    for nprobe in [8, 16, 32, 64]:
                        latencies = []
                        
                        for _ in range(5):
                            start = time.time()
                            results = collection.search(
                                data=query_vector,
                                anns_field="embedding",
                                param={"metric_type": "L2", "params": {"nprobe": nprobe}},
                                limit=10
                            )
                            latency = (time.time() - start) * 1000
                            latencies.append(latency)
                        
                        avg_latency = sum(latencies) / len(latencies)
                        
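                        # Score by latency only (lower latency scores higher). Recall is
                        # not measured here, so the smallest nprobe usually wins; check
                        # recall separately before adopting the result.
                        score = 1000 / avg_latency if avg_latency > 0 else float("inf")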
                        
                        print(f"nprobe={nprobe}: 平均延迟={avg_latency:.2f}ms, 得分={score:.2f}")
                        
                        if score > best_score:
                            best_score = score
                            best_params = {"nprobe": nprobe}
                    
                    print(f"\n推荐参数: {best_params}")
                    return best_params
                
                def implement_partitioning(self, collection_name, partition_field):
                    """实现数据分区"""
                    print(f"实现数据分区: {collection_name}\n")
                    
                    collection = Collection(collection_name)
                    
                    # 创建分区
                    partitions = ["partition_2023", "partition_2024", "partition_2025"]
                    
                    for partition_name in partitions:
                        if not collection.has_partition(partition_name):
                            collection.create_partition(partition_name)
                            print(f"创建分区: {partition_name}")
                    
                    print("\n分区创建完成!")
                    print("使用方法:")
                    print("  # 插入到指定分区")
                    print("  collection.insert(data, partition_name='partition_2024')")
                    print("  # 查询指定分区")
                    print("  collection.search(data, partition_names=['partition_2024'])")
            
            # 使用优化器
            # optimizer = PerformanceOptimizer()
            # optimizer.optimize_index("test_collection", "embedding")
            # optimizer.optimize_query_params("test_collection")
            # optimizer.implement_partitioning("test_collection", "year")
            
            print("性能优化工具使用示例已生成")
            ---
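            `optimize_query_params` above ranks `nprobe` values by latency alone. A common refinement is to pick the fastest setting that still meets a recall target. The sketch below assumes latency and recall have already been measured for each `nprobe`, for example by comparing against exact FLAT results; `pick_nprobe` and the numbers in `measured` are hypothetical and shown for illustration only.

```python
def pick_nprobe(measurements, min_recall=0.95):
    """Pick the fastest nprobe whose measured recall meets the target.

    measurements maps nprobe -> (avg_latency_ms, recall).
    """
    eligible = {
        nprobe: latency
        for nprobe, (latency, recall) in measurements.items()
        if recall >= min_recall
    }
    if not eligible:
        # Nothing meets the target: fall back to the most accurate setting
        return max(measurements, key=lambda n: measurements[n][1])
    return min(eligible, key=eligible.get)

# Hypothetical measurements, not real benchmark results
measured = {8: (2.1, 0.87), 16: (3.4, 0.93), 32: (5.8, 0.97), 64: (10.5, 0.99)}
print(pick_nprobe(measured, min_recall=0.95))
```

            With these illustrative numbers, a latency-only ranking would choose `nprobe=8` even though its recall is the lowest of the four settings.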

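12 Best Practices

12.1 Data Modeling

01.Schema design
    a.Field planning
        a.Description
            A well-designed schema is the foundation for using Milvus efficiently. Plan field types and counts to avoid redundancy. Choose the vector dimension to match the embedding model. Use scalar fields for filtering and metadata. The primary key must be unique. Design the schema around the expected query patterns, leave room for extension, and store only the fields you need.
        b.Code example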
            ---
            from pymilvus import FieldSchema, CollectionSchema, DataType, Collection
            
            # 1. 基础Schema设计
            def create_basic_schema():
                """创建基础Schema"""
                fields = [
                    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
                    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
                    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535),
                    FieldSchema(name="timestamp", dtype=DataType.INT64),
                    FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=100)
                ]
                
                schema = CollectionSchema(
                    fields=fields,
                    description="基础文档检索Schema"
                )
                
                return schema
            
            # 2. 多向量Schema设计
            def create_multimodal_schema():
                """创建多模态Schema"""
                fields = [
                    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
                    # 文本嵌入
                    FieldSchema(name="text_embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
                    # 图像嵌入
                    FieldSchema(name="image_embedding", dtype=DataType.FLOAT_VECTOR, dim=512),
                    # 元数据
                    FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=500),
                    FieldSchema(name="url", dtype=DataType.VARCHAR, max_length=1000),
                    FieldSchema(name="tags", dtype=DataType.VARCHAR, max_length=500),
                    FieldSchema(name="created_at", dtype=DataType.INT64)
                ]
                
                schema = CollectionSchema(
                    fields=fields,
                    description="多模态检索Schema"
                )
                
                return schema
            
            # 3. 电商推荐Schema
            def create_ecommerce_schema():
                """创建电商推荐Schema"""
                fields = [
                    FieldSchema(name="product_id", dtype=DataType.INT64, is_primary=True),
                    FieldSchema(name="product_embedding", dtype=DataType.FLOAT_VECTOR, dim=256),
                    FieldSchema(name="product_name", dtype=DataType.VARCHAR, max_length=200),
                    FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=100),
                    FieldSchema(name="price", dtype=DataType.FLOAT),
                    FieldSchema(name="rating", dtype=DataType.FLOAT),
                    FieldSchema(name="stock", dtype=DataType.INT64),
                    FieldSchema(name="brand", dtype=DataType.VARCHAR, max_length=100),
                    FieldSchema(name="is_active", dtype=DataType.BOOL)
                ]
                
                schema = CollectionSchema(
                    fields=fields,
                    description="电商商品推荐Schema"
                )
                
                return schema
            
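            # 4. Schema design best practices
            schema_best_practices = {
                "Field count": "Keep it under 20; too many fields hurt performance",
                "Vector dimension": "Follow the embedding model; common values are 768/512/256/128",
                "VARCHAR length": "Set max_length to what the data needs, not larger",
                "Primary key": "Use auto_id or a business ID, and make sure it is unique",
                "Indexed fields": "Build scalar indexes on fields frequently used for filtering",
                "Data types": "Pick the smallest suitable data type to save storage"
            }
            
            print("Schema design best practices:")
            for key, value in schema_best_practices.items():
                print(f"  {key}: {value}")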
            
            # 5. Schema验证
            def validate_schema(schema):
                """验证Schema设计"""
                issues = []
                
                # 检查主键
                primary_fields = [f for f in schema.fields if f.is_primary]
                if len(primary_fields) == 0:
                    issues.append("缺少主键字段")
                elif len(primary_fields) > 1:
                    issues.append("存在多个主键字段")
                
                # 检查向量字段
                vector_fields = [f for f in schema.fields if f.dtype in [DataType.FLOAT_VECTOR, DataType.BINARY_VECTOR]]
                if len(vector_fields) == 0:
                    issues.append("缺少向量字段")
                
                # 检查字段数量
                if len(schema.fields) > 20:
                    issues.append(f"字段数量过多({len(schema.fields)}),建议少于20个")
                
                # 检查VARCHAR长度
                for field in schema.fields:
                    if field.dtype == DataType.VARCHAR:
                        if field.params.get("max_length", 0) > 65535:
                            issues.append(f"字段{field.name}的max_length过大")
                
                if issues:
                    print("Schema验证失败:")
                    for issue in issues:
                        print(f"  - {issue}")
                    return False
                else:
                    print("Schema验证通过")
                    return True
            
            # 测试Schema
            schema = create_basic_schema()
            validate_schema(schema)
            ---
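            Vector dimension largely determines storage cost, because a FLOAT_VECTOR field stores one 32-bit float per component. The sketch below estimates the raw size of a vector column for the dimensions mentioned above. It is a rough lower bound under that assumption only: index structures, scalar fields, and replicas are not included, and `estimate_vector_bytes` is an illustrative helper, not a Milvus API.

```python
def estimate_vector_bytes(num_entities, dim, bytes_per_component=4):
    """Raw size of a float32 vector column: 4 bytes per component."""
    return num_entities * dim * bytes_per_component

# Compare the dimensions listed in the best-practice table
for dim in (128, 256, 512, 768):
    gib = estimate_vector_bytes(1_000_000, dim) / 1024**3
    print(f"dim={dim}: {gib:.2f} GiB per million vectors")
```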
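    b.Partitioning strategy
        a.Description
            Use partitions to improve query performance. Partition by dimensions such as time, category, or region; each partition is managed and queried independently. Keep the number of partitions within 4096 (the enforced limit depends on the Milvus version and configuration) and avoid many small partitions. Partitions can be created and dropped dynamically. Specifying partitions at query time narrows the scan range, and dropping old partitions supports data lifecycle management.
        b.Code example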
            ---
            from pymilvus import Collection, connections
            from datetime import datetime
            
            connections.connect(host="localhost", port="19530")
            
            # 1. 按时间分区
            def create_time_based_partitions(collection_name):
                """按时间创建分区"""
                collection = Collection(collection_name)
                
                # 按年份分区
                years = ["2023", "2024", "2025"]
                for year in years:
                    partition_name = f"year_{year}"
                    if not collection.has_partition(partition_name):
                        collection.create_partition(partition_name)
                        print(f"创建分区: {partition_name}")
                
                # 按月份分区(更细粒度)
                months = ["202401", "202402", "202403"]
                for month in months:
                    partition_name = f"month_{month}"
                    if not collection.has_partition(partition_name):
                        collection.create_partition(partition_name)
                        print(f"创建分区: {partition_name}")
            
            # 2. 按类别分区
            def create_category_partitions(collection_name, categories):
                """按类别创建分区"""
                collection = Collection(collection_name)
                
                for category in categories:
                    partition_name = f"cat_{category}"
                    if not collection.has_partition(partition_name):
                        collection.create_partition(partition_name)
                        print(f"创建分区: {partition_name}")
            
            # 使用示例
            # create_category_partitions("products", ["electronics", "clothing", "books"])
            
            # 3. 分区数据插入
            def insert_with_partition(collection, data, partition_key_field, partition_mapping):
                """根据字段值插入到对应分区"""
                # 按分区键分组数据
                partition_data = {}
                
                for i, value in enumerate(data[partition_key_field]):
                    partition_name = partition_mapping.get(value, "_default")
                    
                    if partition_name not in partition_data:
                        partition_data[partition_name] = {field: [] for field in data.keys()}
                    
                    for field, values in data.items():
                        partition_data[partition_name][field].append(values[i])
                
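                # Insert into each partition. Collection.insert() expects column lists in
                # schema field order (or row dicts), not a dict of columns, so build the
                # column list from the schema. Assumes data covers every non-auto-id field.
                for partition_name, pdata in partition_data.items():
                    columns = [pdata[f.name] for f in collection.schema.fields if f.name in pdata]
                    collection.insert(columns, partition_name=partition_name)
                    print(f"Inserted {len(pdata[partition_key_field])} rows into partition: {partition_name}")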
            
            # 4. 分区查询
            def search_with_partitions(collection, query_vector, partition_names=None):
                """在指定分区中查询"""
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param={"metric_type": "L2", "params": {"nprobe": 16}},
                    limit=10,
                    partition_names=partition_names  # 指定分区
                )
                
                return results
            
            # 查询示例
            # import numpy as np
            # query_vec = [np.random.random() for _ in range(128)]
            # results = search_with_partitions(collection, query_vec, partition_names=["year_2024"])
            
            # 5. 分区管理
            class PartitionManager:
                def __init__(self, collection):
                    self.collection = collection
                
                def list_partitions(self):
                    """列出所有分区"""
                    partitions = self.collection.partitions
                    print(f"分区数量: {len(partitions)}")
                    
                    for partition in partitions:
                        print(f"  {partition.name}: {partition.num_entities} entities")
                
                def drop_old_partitions(self, keep_count=12):
                    """删除旧分区,保留最近N个"""
                    partitions = sorted(
                        [p for p in self.collection.partitions if p.name != "_default"],
                        key=lambda p: p.name
                    )
                    
                    if len(partitions) > keep_count:
                        to_drop = partitions[:-keep_count]
                        for partition in to_drop:
                            self.collection.drop_partition(partition.name)
                            print(f"删除分区: {partition.name}")
                
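                def merge_partitions(self, source_partitions, target_partition):
                    """Merge several partitions into one (the collection must be loaded)."""
                    # Assumes a non-negative INT64 primary key that is not auto-generated
                    pk_name = self.collection.schema.primary_field.name
                    
                    # Read all rows from the source partitions. Milvus rejects an empty
                    # expr without a limit, so filter on the primary key instead. A single
                    # query returns a bounded number of rows; page through large partitions.
                    # On older releases "*" may not include vector fields.
                    all_data = []
                    for partition_name in source_partitions:
                        data = self.collection.query(
                            expr=f"{pk_name} >= 0",
                            partition_names=[partition_name],
                            output_fields=["*"]
                        )
                        all_data.extend(data)
                    
                    # Create the target partition if it does not exist
                    if not self.collection.has_partition(target_partition):
                        self.collection.create_partition(target_partition)
                    
                    # Convert row dicts to column lists in schema field order
                    if all_data:
                        insert_data = [
                            [item[field.name] for item in all_data]
                            for field in self.collection.schema.fields
                        ]
                        self.collection.insert(insert_data, partition_name=target_partition)
                        self.collection.flush()
                    
                    # Drop the source partitions. Milvus requires a partition to be
                    # released before it can be dropped; release it first if it is loaded.
                    for partition_name in source_partitions:
                        self.collection.drop_partition(partition_name)
                    
                    print(f"Merged {len(source_partitions)} partitions into: {target_partition}")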
            
            # 使用分区管理器
            # collection = Collection("test_collection")
            # manager = PartitionManager(collection)
            # manager.list_partitions()
            # manager.drop_old_partitions(keep_count=12)
            
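            # 6. Partitioning strategy recommendations
            partition_strategies = {
                "Time-based partitioning": {
                    "Use cases": "Logs, events, time-series data",
                    "Pros": "Easy archiving and cleanup",
                    "Cons": "Can create hot partitions",
                    "Advice": "Partition by month or quarter; avoid finer granularity"
                },
                "Category-based partitioning": {
                    "Use cases": "E-commerce, content classification",
                    "Pros": "Queries can target exactly one partition",
                    "Cons": "Needs adjusting when categories change",
                    "Advice": "Use stable top-level categories"
                },
                "Hash-based partitioning": {
                    "Use cases": "Evenly distributed data",
                    "Pros": "Balanced load",
                    "Cons": "Cannot be queried by business logic",
                    "Advice": "Combine with another strategy"
                }
            }
            
            print("\nPartitioning strategy recommendations:")
            for strategy, info in partition_strategies.items():
                print(f"\n{strategy}:")
                for key, value in info.items():
                    print(f"  {key}: {value}")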
            ---
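            Time-based partitioning needs two small pieces of logic: routing a record to its partition and deciding which partitions have aged out. The sketch below assumes monthly partitions named `month_YYYYMM` (the naming used in the example above) and Unix timestamps in seconds, UTC. Both helper names are illustrative. Filtering by prefix matters because sorting mixed names such as `year_2023` and `month_202401` alphabetically does not give chronological order.

```python
from datetime import datetime, timezone

def month_partition_name(ts_seconds):
    """Map a Unix timestamp (seconds, UTC) to a partition name like month_202403."""
    moment = datetime.fromtimestamp(ts_seconds, tz=timezone.utc)
    return f"month_{moment:%Y%m}"

def partitions_to_drop(partition_names, keep_count):
    """Return the monthly partitions older than the newest keep_count ones."""
    # Zero-padded YYYYMM suffixes sort chronologically as plain strings
    monthly = sorted(n for n in partition_names if n.startswith("month_"))
    if keep_count <= 0:
        return monthly
    return monthly[:-keep_count]

ts = datetime(2024, 3, 15, tzinfo=timezone.utc).timestamp()
print(month_partition_name(ts))

names = ["_default", "year_2023", "month_202401", "month_202402", "month_202403"]
print(partitions_to_drop(names, keep_count=2))
```

            The returned names can then be passed one by one to `collection.drop_partition`.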

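02.Data quality
    a.Data cleaning
        a.Description
            High-quality data is a prerequisite for accurate retrieval. Remove duplicates and outliers, standardize the vector format, verify that vector dimensions are consistent, handle missing and empty values, filter out low-quality records, build a validation step into the pipeline, and record data-quality metrics.
        b.Code example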
            ---
            import numpy as np
            # DataType is needed by DataCleaner.validate_data
            from pymilvus import Collection, DataType, connections
            
            class DataCleaner:
                def __init__(self):
                    self.stats = {
                        "total": 0,
                        "duplicates": 0,
                        "invalid_vectors": 0,
                        "missing_fields": 0,
                        "cleaned": 0
                    }
                
                def clean_vectors(self, vectors, dim=768):
                    """清洗向量数据"""
                    cleaned = []
                    
                    for vec in vectors:
                        # 检查维度
                        if len(vec) != dim:
                            self.stats["invalid_vectors"] += 1
                            continue
                        
                        # 检查NaN和Inf
                        if np.isnan(vec).any() or np.isinf(vec).any():
                            self.stats["invalid_vectors"] += 1
                            continue
                        
                        # 标准化
                        vec = np.array(vec, dtype=np.float32)
                        
                        # L2归一化
                        norm = np.linalg.norm(vec)
                        if norm > 0:
                            vec = vec / norm
                        
                        cleaned.append(vec.tolist())
                        self.stats["cleaned"] += 1
                    
                    return cleaned
                
                def remove_duplicates(self, data, id_field="id"):
                    """去除重复数据"""
                    seen_ids = set()
                    cleaned_data = {field: [] for field in data.keys()}
                    
                    for i in range(len(data[id_field])):
                        item_id = data[id_field][i]
                        
                        if item_id in seen_ids:
                            self.stats["duplicates"] += 1
                            continue
                        
                        seen_ids.add(item_id)
                        
                        for field, values in data.items():
                            cleaned_data[field].append(values[i])
                    
                    return cleaned_data
                
                def validate_data(self, data, schema):
                    """验证数据完整性"""
                    self.stats["total"] = len(data[list(data.keys())[0]])
                    
                    # 检查必填字段
                    for field in schema.fields:
                        if field.name not in data:
                            print(f"缺少字段: {field.name}")
                            return False
                        
                        # 检查数据长度一致性
                        if len(data[field.name]) != self.stats["total"]:
                            print(f"字段{field.name}数据长度不一致")
                            return False
                        
                        # 检查空值
                        if field.dtype == DataType.VARCHAR:
                            empty_count = sum(1 for v in data[field.name] if not v)
                            if empty_count > 0:
                                print(f"字段{field.name}有{empty_count}个空值")
                                self.stats["missing_fields"] += empty_count
                    
                    return True
                
                def get_stats(self):
                    """获取清洗统计"""
                    return self.stats
            
            # 使用数据清洗器
            cleaner = DataCleaner()
            
            # 示例数据
            raw_data = {
                "id": [1, 2, 2, 3, 4],  # 包含重复
                "embedding": [
                    [0.1] * 768,
                    [0.2] * 768,
                    [0.2] * 768,
                    [float('nan')] * 768,  # 包含NaN
                    [0.4] * 768
                ],
                "text": ["doc1", "doc2", "doc2", "", "doc4"]
            }
            
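            # Clean vectors row by row so that every column stays aligned:
            # clean_vectors() drops invalid vectors, which would otherwise shift
            # the remaining embeddings relative to their ids and texts
            cleaner.stats["total"] = len(raw_data["id"])
            aligned_data = {field: [] for field in raw_data}
            for i, vec in enumerate(raw_data["embedding"]):
                cleaned = cleaner.clean_vectors([vec])  # empty list if the vector was rejected
                if not cleaned:
                    continue
                for field, values in raw_data.items():
                    aligned_data[field].append(cleaned[0] if field == "embedding" else values[i])
            
            # Remove duplicates
            cleaned_data = cleaner.remove_duplicates(aligned_data)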
            
            # 输出统计
            stats = cleaner.get_stats()
            print("数据清洗统计:")
            print(f"  总数: {stats['total']}")
            print(f"  重复: {stats['duplicates']}")
            print(f"  无效向量: {stats['invalid_vectors']}")
            print(f"  缺失字段: {stats['missing_fields']}")
            print(f"  清洗后: {stats['cleaned']}")
            ---
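            For larger batches the same cleaning steps can be expressed with NumPy masks, which keeps ids and vectors aligned by construction. This is a sketch under the same rules as `DataCleaner` (drop non-finite vectors, keep the first occurrence of each id, L2-normalize); the function name `clean_rows` is an assumption for illustration.

```python
import numpy as np

def clean_rows(ids, vectors):
    """Drop rows with NaN/Inf vectors and duplicate ids, then L2-normalize."""
    ids = np.asarray(ids)
    vectors = np.asarray(vectors, dtype=np.float32)

    # Rows whose vector has only finite components
    valid_idx = np.flatnonzero(np.isfinite(vectors).all(axis=1))

    # Among valid rows, keep the first occurrence of each id
    _, first = np.unique(ids[valid_idx], return_index=True)
    keep_idx = np.sort(valid_idx[first])

    kept = vectors[keep_idx]
    norms = np.linalg.norm(kept, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave zero vectors unchanged instead of dividing by zero
    return ids[keep_idx], kept / norms

raw_ids = [1, 2, 2, 3, 4]
raw_vectors = [[0.1] * 4, [0.2] * 4, [0.2] * 4, [float("nan")] * 4, [0.4] * 4]
clean_ids, clean_vectors = clean_rows(raw_ids, raw_vectors)
print(clean_ids.tolist())
print(np.linalg.norm(clean_vectors, axis=1))
```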
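    b.Data validation
        a.Description
            Set up validation to guarantee data quality before insertion: check data formats and types, vector dimensions and value ranges, primary-key uniqueness, and the validity of scalar fields. Automate the checks, log results and anomalies, and produce a data-quality report.
        b.Code example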
            ---
            from pymilvus import Collection, DataType
            import numpy as np
            
            class DataValidator:
                def __init__(self, schema):
                    self.schema = schema
                    self.errors = []
                
                def validate_batch(self, data):
                    """验证批量数据"""
                    self.errors = []
                    
                    # 1. 验证字段完整性
                    if not self._validate_fields(data):
                        return False
                    
                    # 2. 验证数据类型
                    if not self._validate_types(data):
                        return False
                    
                    # 3. 验证向量数据
                    if not self._validate_vectors(data):
                        return False
                    
                    # 4. 验证主键唯一性
                    if not self._validate_primary_key(data):
                        return False
                    
                    # 5. 验证VARCHAR长度
                    if not self._validate_varchar_length(data):
                        return False
                    
                    return len(self.errors) == 0
                
                def _validate_fields(self, data):
                    """验证字段完整性"""
                    for field in self.schema.fields:
                        if field.name not in data:
                            self.errors.append(f"缺少字段: {field.name}")
                            return False
                    
                    # 检查数据长度一致性
                    lengths = [len(values) for values in data.values()]
                    if len(set(lengths)) > 1:
                        self.errors.append(f"字段数据长度不一致: {lengths}")
                        return False
                    
                    return True
                
                def _validate_types(self, data):
                    """验证数据类型"""
                    for field in self.schema.fields:
                        values = data[field.name]
                        
                        if field.dtype == DataType.INT64:
                            if not all(isinstance(v, (int, np.integer)) for v in values):
                                self.errors.append(f"字段{field.name}类型错误,期望INT64")
                                return False
                        
                        elif field.dtype == DataType.FLOAT:
                            if not all(isinstance(v, (float, np.floating, int)) for v in values):
                                self.errors.append(f"字段{field.name}类型错误,期望FLOAT")
                                return False
                        
                        elif field.dtype == DataType.VARCHAR:
                            if not all(isinstance(v, str) for v in values):
                                self.errors.append(f"字段{field.name}类型错误,期望VARCHAR")
                                return False
                        
                        elif field.dtype == DataType.BOOL:
                            if not all(isinstance(v, bool) for v in values):
                                self.errors.append(f"字段{field.name}类型错误,期望BOOL")
                                return False
                    
                    return True
                
                def _validate_vectors(self, data):
                    """验证向量数据"""
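                    for field in self.schema.fields:
                        if field.dtype in [DataType.FLOAT_VECTOR, DataType.BINARY_VECTOR]:
                            vectors = data[field.name]
                            expected_dim = field.params["dim"]
                            # Binary vectors are passed as bytes, 8 dimensions per byte
                            is_binary = field.dtype == DataType.BINARY_VECTOR
                            expected_len = expected_dim // 8 if is_binary else expected_dim
                            
                            for i, vec in enumerate(vectors):
                                # Check the dimension
                                if len(vec) != expected_len:
                                    self.errors.append(
                                        f"Field {field.name}, vector {i}: wrong length, "
                                        f"expected {expected_len}, got {len(vec)}"
                                    )
                                    return False
                                
                                # NaN and Inf checks only apply to float vectors
                                if is_binary:
                                    continue
                                
                                vec_array = np.array(vec)
                                if np.isnan(vec_array).any():
                                    self.errors.append(f"Field {field.name}, vector {i} contains NaN")
                                    return False
                                
                                if np.isinf(vec_array).any():
                                    self.errors.append(f"Field {field.name}, vector {i} contains Inf")
                                    return False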
                    
                    return True
                
                def _validate_primary_key(self, data):
                    """验证主键唯一性"""
                    for field in self.schema.fields:
                        if field.is_primary:
                            ids = data[field.name]
                            
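                            if len(ids) != len(set(ids)):
                                # Count occurrences in one pass instead of calling
                                # list.count() for every id, which is quadratic
                                from collections import Counter
                                duplicates = {pk for pk, n in Counter(ids).items() if n > 1}
                                self.errors.append(f"Primary key {field.name} has duplicate values: {duplicates}")
                                return False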
                    
                    return True
                
                def _validate_varchar_length(self, data):
                    """验证VARCHAR长度"""
                    for field in self.schema.fields:
                        if field.dtype == DataType.VARCHAR:
                            max_length = field.params.get("max_length", 65535)
                            values = data[field.name]
                            
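                            for i, value in enumerate(values):
                                # max_length is a byte limit, so measure the UTF-8
                                # encoding rather than the character count
                                byte_length = len(value.encode("utf-8"))
                                if byte_length > max_length:
                                    self.errors.append(
                                        f"Field {field.name}, value {i} is too long: "
                                        f"{byte_length} > {max_length}"
                                    )
                                    return False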
                    
                    return True
                
                def get_errors(self):
                    """获取验证错误"""
                    return self.errors
            
            # 使用数据验证器
            from pymilvus import FieldSchema, CollectionSchema
            
            # 创建Schema
            fields = [
                FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
                FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128),
                FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=1000)
            ]
            schema = CollectionSchema(fields=fields)
            
            # 验证数据
            validator = DataValidator(schema)
            
            test_data = {
                "id": [1, 2, 3],
                "embedding": [
                    [0.1] * 128,
                    [0.2] * 128,
                    [0.3] * 128
                ],
                "text": ["doc1", "doc2", "doc3"]
            }
            
            if validator.validate_batch(test_data):
                print("Data validation passed")
            else:
                print("Data validation failed:")
                for error in validator.get_errors():
                    print(f"  - {error}")
            ---

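12.2 Index Selection

01.Index types
    a.FLAT index
        a.Description
            The FLAT index is the simplest index type: it stores vectors uncompressed and compares the query against every vector, with no approximation. It suits small datasets (roughly fewer than 100,000 vectors). Because every vector is compared, recall is 100% and results are exact, but query time grows linearly with the number of vectors. No training step is needed, so the index builds quickly, and memory use equals the size of the raw vectors. Use it where accuracy requirements are very high, or as the ground-truth baseline when measuring the recall of other index types.
        b.Code example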
            ---
            from pymilvus import Collection, connections
            
            connections.connect(host="localhost", port="19530")
            collection = Collection("test_collection")
            
            # Create a FLAT index
            index_params = {
                "index_type": "FLAT",
                "metric_type": "L2"
            }
            
            collection.create_index(
                field_name="embedding",
                index_params=index_params
            )
            
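            print("FLAT index characteristics:")
            print("  Use case: small datasets (< 100,000 vectors)")
            print("  Recall: 100%")
            print("  Query speed: slow (linear scan)")
            print("  Memory use: high (equal to the raw data)")
            print("  Build time: fast (no training)")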
            ---
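            The linear scan behind a FLAT index can be sketched with NumPy alone. This is a minimal illustration, not Milvus's implementation; `flat_search` and the random data are hypothetical, and L2 distance is assumed to match the `metric_type` above.

```python
import numpy as np

def flat_search(base, query, top_k=3):
    """Exact L2 search: compare the query against every stored vector."""
    # Squared L2 distance from the query to each row of `base`
    dists = np.sum((base - query) ** 2, axis=1)
    # Indices of the top_k smallest distances, nearest first
    order = np.argsort(dists, kind="stable")[:top_k]
    return order, dists[order]

rng = np.random.default_rng(0)
base = rng.random((1000, 128))   # 1,000 random 128-dimensional vectors
# Query with a stored vector: the nearest hit should be that vector itself
ids, dists = flat_search(base, base[42], top_k=3)
print(ids, dists)
```

            Every query touches all stored vectors, which is why recall is exact and why latency grows linearly with the dataset size.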
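    b.IVF index
        a.Description
            IVF (Inverted File) indexes speed up search by clustering. The vector space is partitioned into nlist clusters, each represented by a centroid, and a query scans only the nprobe clusters whose centroids are closest to it. IVF suits medium to large datasets (about 100,000 to 10 million vectors). It needs a training step to determine the centroids. Variants include IVF_FLAT, IVF_SQ8, and IVF_PQ. IVF balances accuracy against performance and is among the most widely used index families.
        b.Code example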
            ---
            # IVF_FLAT index
            ivf_flat_params = {
                "index_type": "IVF_FLAT",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
            
            collection.create_index(
                field_name="embedding",
                index_params=ivf_flat_params
            )
            
            # Search parameters
            search_params = {"metric_type": "L2", "params": {"nprobe": 16}}
            
            # IVF_SQ8 index (scalar quantization)
            ivf_sq8_params = {
                "index_type": "IVF_SQ8",
                "metric_type": "L2",
                "params": {"nlist": 1024}
            }
            
            # IVF_PQ index (product quantization; the vector dimension must be divisible by m)
            ivf_pq_params = {
                "index_type": "IVF_PQ",
                "metric_type": "L2",
                "params": {
                    "nlist": 1024,
                    "m": 8,
                    "nbits": 8
                }
            }
            
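            print("IVF index comparison:")
            print("  IVF_FLAT: high accuracy, large memory footprint")
            print("  IVF_SQ8: about 75% less memory (float32 stored as 8-bit), slightly lower accuracy")
            print("  IVF_PQ: smallest memory footprint, accuracy drops further")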
            ---
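            The roles of nlist and nprobe can be seen in a toy IVF built with NumPy. This is a sketch under simplifying assumptions (a few Lloyd iterations on random data, no quantization); `build_ivf` and `ivf_search` are hypothetical names, not Milvus APIs.

```python
import numpy as np

def build_ivf(base, nlist, iters=10, seed=0):
    """Cluster `base` into nlist cells with a few Lloyd (k-means) steps."""
    rng = np.random.default_rng(seed)
    centroids = base[rng.choice(len(base), size=nlist, replace=False)].copy()
    for _ in range(iters):
        # Assign every vector to its nearest centroid
        d = ((base[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for c in range(nlist):
            members = base[assign == c]
            if len(members):             # keep the old centroid if a cell is empty
                centroids[c] = members.mean(axis=0)
    # Final assignment against the final centroids gives the inverted lists
    d = ((base[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = d.argmin(axis=1)
    lists = [np.flatnonzero(assign == c) for c in range(nlist)]
    return centroids, lists

def ivf_search(base, centroids, lists, query, nprobe, top_k):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    cell_dist = ((centroids - query) ** 2).sum(axis=1)
    probe = np.argsort(cell_dist)[:nprobe]
    candidates = np.concatenate([lists[c] for c in probe])
    d = ((base[candidates] - query) ** 2).sum(axis=1)
    return candidates[np.argsort(d, kind="stable")[:top_k]]

rng = np.random.default_rng(0)
base = rng.random((2000, 16))
centroids, lists = build_ivf(base, nlist=16)
query = rng.random(16)
exact = np.argsort(((base - query) ** 2).sum(axis=1))[:10]   # brute-force ground truth

for nprobe in (1, 4, 16):
    approx = ivf_search(base, centroids, lists, query, nprobe, top_k=10)
    recall = len(set(approx.tolist()) & set(exact.tolist())) / len(exact)
    print(f"nprobe={nprobe}: recall={recall:.2f}")
```

            With nprobe equal to nlist every cell is scanned, so the result matches the brute-force answer; smaller nprobe values scan fewer vectors and may miss true neighbours.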

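02.Parameter tuning
    a.nlist parameter
        a.Description
            nlist is the number of clusters (centroids) in an IVF index. It affects both index build time and query performance. With nprobe fixed, a larger nlist produces finer clusters, so each query scans fewer vectors and runs faster, but the index takes longer to build and recall can drop unless nprobe is raised as well. Recommended range: sqrt(N) to 4*sqrt(N), where N is the number of vectors. Common values are 128, 256, 512, 1024, and 2048. Tune it to the data size and query requirements: too large a value increases memory use and build cost, while too small a value makes each probed cluster large and slows queries.
        b.Code example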
            ---
            import math
            
            def recommend_nlist(num_vectors):
                """Recommend nlist values for a given number of vectors."""
                sqrt_n = int(math.sqrt(num_vectors))
                
                recommendations = {
                    "conservative": sqrt_n,
                    "recommended": 2 * sqrt_n,
                    "aggressive": 4 * sqrt_n
                }
                
                # Clamp to a floor of 128 and to 65536, the largest nlist Milvus accepts
                for key in recommendations:
                    recommendations[key] = min(max(recommendations[key], 128), 65536)
                
                return recommendations
            
            test_sizes = [10000, 100000, 1000000, 10000000]
            
            print("Recommended nlist values:")
            for size in test_sizes:
                recs = recommend_nlist(size)
                print(f"\nNumber of vectors: {size:,}")
                for level, value in recs.items():
                    print(f"  {level}: {value}")
            ---
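    b.nprobe parameter
        a.Description
            nprobe is the number of clusters searched at query time. It trades accuracy against speed: a larger nprobe gives higher recall but slower queries. Recommended range: 1%-10% of nlist, and never more than nlist. Common values are 8, 16, 32, and 64. Because it is a search-time parameter, it can be adjusted per query to match business requirements without rebuilding the index. Determine the best value experimentally by measuring recall and latency on representative queries.
        b.Code example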
            ---
            def recommend_nprobe(nlist, accuracy_requirement="medium"):
                """Recommend an nprobe value (1%-10% of nlist, with a floor)."""
                recommendations = {
                    "low": max(int(nlist * 0.01), 8),
                    "medium": max(int(nlist * 0.05), 16),
                    "high": max(int(nlist * 0.10), 32)
                }
                
                nprobe = recommendations.get(accuracy_requirement, 16)
                # nprobe cannot exceed the number of clusters
                return min(nprobe, nlist)
            
            nlist_values = [128, 512, 1024, 2048]
            
            print("Recommended nprobe values:")
            for nlist in nlist_values:
                print(f"\nnlist={nlist}:")
                for level in ["low", "medium", "high"]:
                    nprobe = recommend_nprobe(nlist, level)
                    print(f"  {level}: {nprobe}")
            
            import time
            import numpy as np
            
            def benchmark_nprobe(collection, nprobe_values):
                """Measure search latency (ms) for different nprobe values."""
                query_vector = [[np.random.random() for _ in range(128)]]
                
                results = {}
                for nprobe in nprobe_values:
                    latencies = []
                    
                    for _ in range(10):
                        # perf_counter is a monotonic, high-resolution timer
                        start = time.perf_counter()
                        collection.search(
                            data=query_vector,
                            anns_field="embedding",
                            param={"metric_type": "L2", "params": {"nprobe": nprobe}},
                            limit=10
                        )
                        latency = (time.perf_counter() - start) * 1000
                        latencies.append(latency)
                    
                    results[nprobe] = {
                        "avg": sum(latencies) / len(latencies),
                        # With only 10 samples this index selects the slowest run
                        "p99": sorted(latencies)[int(len(latencies) * 0.99)]
                    }
                
                return results
            ---
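            Latency is only half of the experiment; recall has to be measured too. A minimal sketch, assuming the exact top-k ids come from a FLAT (brute-force) search over the same data. `recall_at_k` and `pick_nprobe` are hypothetical helpers, and the recall figures in the example are made up for illustration.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate search also returned."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(approx_ids)) / len(exact)

def pick_nprobe(recall_by_nprobe, target):
    """Smallest nprobe whose measured recall meets the target, or None."""
    good_enough = [n for n, r in recall_by_nprobe.items() if r >= target]
    return min(good_enough) if good_enough else None

# Illustrative measurements only (not real benchmark results)
measured = {8: 0.80, 16: 0.93, 32: 0.97}
print(recall_at_k([1, 2, 3, 9], [1, 2, 3, 4]))
print(pick_nprobe(measured, target=0.90))
```

            Choosing the smallest nprobe that meets the recall target keeps latency as low as the accuracy requirement allows.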

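12.3 Query Optimization

01.Query strategies
    a.Batch queries
        a.Description
            Batch queries can raise throughput significantly. Sending several query vectors in one request reduces network round trips, and Milvus processes the vectors of a batch in parallel. This suits offline batch-processing workloads. Throughput gains of 10-100x are possible in favorable cases, but the actual speedup depends on the index, the hardware, and the batch size, so measure it. Larger batches raise throughput but also raise the latency of each request; a batch size of 10-1000 is a reasonable starting range. Asynchronous batch queries can improve performance further.
        b.Code example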
            ---
            from pymilvus import Collection, connections
            import numpy as np
            import time
            
            connections.connect(host="localhost", port="19530")
            collection = Collection("test_collection")
            collection.load()
            
            # 1. Single-query benchmark
            def single_query_benchmark(collection, num_queries=100):
                """Benchmark queries sent one vector at a time."""
                start = time.perf_counter()
                
                for _ in range(num_queries):
                    query_vector = [[np.random.random() for _ in range(128)]]
                    collection.search(
                        data=query_vector,
                        anns_field="embedding",
                        param={"metric_type": "L2", "params": {"nprobe": 16}},
                        limit=10
                    )
                
                elapsed = time.perf_counter() - start
                qps = num_queries / elapsed
                
                print("Single queries:")
                print(f"  Total time: {elapsed:.2f}s")
                print(f"  QPS: {qps:.2f}")
                
                return qps
            
            # 2. Batch-query benchmark
            def batch_query_benchmark(collection, num_queries=100, batch_size=10):
                """Benchmark queries sent batch_size vectors at a time."""
                start = time.perf_counter()
                
                for i in range(0, num_queries, batch_size):
                    # The last batch may be smaller than batch_size
                    batch_vectors = [
                        [np.random.random() for _ in range(128)]
                        for _ in range(min(batch_size, num_queries - i))
                    ]
                    
                    collection.search(
                        data=batch_vectors,
                        anns_field="embedding",
                        param={"metric_type": "L2", "params": {"nprobe": 16}},
                        limit=10
                    )
                
                elapsed = time.perf_counter() - start
                qps = num_queries / elapsed
                
                print(f"\nBatch queries (batch_size={batch_size}):")
                print(f"  Total time: {elapsed:.2f}s")
                print(f"  QPS: {qps:.2f}")
                
                return qps
            
            # 3. Comparison
            print("Query performance comparison:\n")
            single_qps = single_query_benchmark(collection, 100)
            
            for batch_size in [10, 50, 100]:
                batch_qps = batch_query_benchmark(collection, 100, batch_size)
                speedup = batch_qps / single_qps
                print(f"  Speedup: {speedup:.2f}x")
            ---
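            When the query vectors already exist (for example in an offline job), they only need to be sliced into requests of the chosen batch size. A minimal sketch; `iter_batches` is a hypothetical helper, and each yielded batch would be passed as the `data` argument of `collection.search`.

```python
def iter_batches(vectors, batch_size):
    """Yield consecutive slices of `vectors`, each with at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(vectors), batch_size):
        yield vectors[start:start + batch_size]

# 25 dummy 4-dimensional vectors split into batches of 10
vectors = [[float(i)] * 4 for i in range(25)]
sizes = [len(batch) for batch in iter_batches(vectors, batch_size=10)]
print(sizes)
```

            The final batch holds whatever remains, so no vector is dropped when the total is not a multiple of the batch size.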
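    b.Filter optimization
        a.Description
            Well-chosen filter conditions make queries more efficient. Filtering before the vector search shrinks the search space, and scalar indexes speed up filter evaluation. Avoid complex filter expressions and prefer equality and range filters. When combining several conditions, pay attention to their order. Where the data divides along a fixed attribute, searching a partition instead of filtering can perform better.
        b.Code example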
            ---
            # 1. Basic filtering
            def search_with_filter(collection, query_vector, filter_expr):
                """Vector search restricted by a scalar filter expression."""
                results = collection.search(
                    data=[query_vector],
                    anns_field="embedding",
                    param={"metric_type": "L2", "params": {"nprobe": 16}},
                    limit=10,
                    expr=filter_expr
                )
                
                return results
            
            # Equality filter
            results = search_with_filter(
                collection,
                [np.random.random() for _ in range(128)],
                'category == "electronics"'
            )
            
            # Range filter
            results = search_with_filter(
                collection,
                [np.random.random() for _ in range(128)],
                'price >= 100 and price <= 500'
            )
            
            # 2. Use scalar indexes
            # STL_SORT targets numeric fields; Trie is the scalar index for VARCHAR fields
            collection.create_index(
                field_name="category",
                index_params={"index_type": "Trie"}
            )
            
            collection.create_index(
                field_name="price",
                index_params={"index_type": "STL_SORT"}
            )
            
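            # 3. Use partitions instead of filters
            # (entities must be inserted into a partition for a search on it to return them)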
            categories = ["electronics", "clothing", "books"]
            for cat in categories:
                if not collection.has_partition(f"cat_{cat}"):
                    collection.create_partition(f"cat_{cat}")
            
            results = collection.search(
                data=[[np.random.random() for _ in range(128)]],
                anns_field="embedding",
                param={"metric_type": "L2", "params": {"nprobe": 16}},
                limit=10,
                partition_names=["cat_electronics"]
            )
            
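            print("Filter optimization tips:")
            print("  1. Use scalar indexes to speed up filtering")
            print("  2. Optimize the order of filter conditions")
            print("  3. Use partitions instead of filters")
            print("  4. Avoid complex expressions")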
            ---
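            Equality and range filters are easy to assemble programmatically, which keeps expressions simple and uniform. A minimal sketch; `in_expr`, `range_expr`, and `and_all` are hypothetical helpers, and quoting string values through JSON encoding is an assumption that holds for plain values like the ones shown.

```python
import json

def in_expr(field, values):
    """Build `field in [...]`; string values are double-quoted via JSON encoding."""
    return f"{field} in {json.dumps(list(values), ensure_ascii=False)}"

def range_expr(field, low, high):
    """Build an inclusive range filter on a numeric field."""
    return f"{field} >= {low} and {field} <= {high}"

def and_all(*exprs):
    """Combine sub-expressions with `and`, parenthesizing each one."""
    return " and ".join(f"({e})" for e in exprs)

expr = and_all(
    in_expr("category", ["electronics", "books"]),
    range_expr("price", 100, 500),
)
print(expr)
```

            The resulting string would be passed as the `expr` argument of `collection.search`. Values that come from users should be validated before being placed in an expression.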

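02.Caching strategies
    a.Result cache
        a.Description
            Caching the results of popular queries improves response time. It suits workloads with a high rate of repeated queries. Use Redis or an in-memory cache, and set a sensible expiration time (TTL) so that stale results age out. Implement cache warm-up and update strategies, monitor the hit rate, and balance cache size against hit rate. A multi-level cache (for example, in-process memory in front of Redis) can improve performance further.
        b.Code example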
            ---
            import redis
            import json
            import hashlib
            
            class QueryCache:
                def __init__(self, redis_host="localhost", redis_port=6379, ttl=3600):
                    self.redis_client = redis.Redis(host=redis_host, port=redis_port, decode_responses=True)
                    self.ttl = ttl
                    self.stats = {"hits": 0, "misses": 0}
                
                def _generate_key(self, query_vector, params):
                    """Build the cache key; query_vector must be a plain list so json.dumps can encode it."""
                    data = {
                        "vector": query_vector,
                        "params": params
                    }
                    data_str = json.dumps(data, sort_keys=True)
                    key = hashlib.md5(data_str.encode()).hexdigest()
                    return f"milvus:query:{key}"
                
                def get(self, query_vector, params):
                    """Return the cached result, or None on a miss."""
                    key = self._generate_key(query_vector, params)
                    cached = self.redis_client.get(key)
                    
                    if cached:
                        self.stats["hits"] += 1
                        return json.loads(cached)
                    else:
                        self.stats["misses"] += 1
                        return None
                
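                def set(self, query_vector, params, results):
                    """Serialize search hits, store them in the cache, and return them."""
                    key = self._generate_key(query_vector, params)
                    
                    # Keep only JSON-serializable fields; hit.entity is a pymilvus
                    # object and cannot be passed to json.dumps directly.
                    results_data = [
                        {"id": r.id, "distance": r.distance}
                        for r in results[0]
                    ]
                    
                    self.redis_client.setex(
                        key,
                        self.ttl,
                        json.dumps(results_data)
                    )
                    
                    return results_data
                
                def search_with_cache(self, collection, query_vector, params):
                    """Search with caching; always returns a list of {"id", "distance"} dicts."""
                    cached_results = self.get(query_vector, params)
                    
                    # Compare against None so that a cached empty result still counts as a hit
                    if cached_results is not None:
                        return cached_results
                    
                    results = collection.search(
                        data=[query_vector],
                        anns_field="embedding",
                        param=params,
                        limit=10
                    )
                    
                    # Return the serialized form so hits and misses have the same type
                    return self.set(query_vector, params, results)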
                
                def get_stats(self):
                    """Return cache hit/miss statistics."""
                    total = self.stats["hits"] + self.stats["misses"]
                    hit_rate = self.stats["hits"] / total if total > 0 else 0
                    
                    return {
                        "hits": self.stats["hits"],
                        "misses": self.stats["misses"],
                        "hit_rate": hit_rate
                    }
            
            cache = QueryCache(ttl=3600)
            ---
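            The first tier of a multi-level cache can live in process memory, in front of Redis. A minimal sketch of that tier; `TTLCache` and `make_key` are hypothetical names, and rounding vector components to 6 decimals before hashing (so that float noise does not defeat the cache) is an assumption to tune per application.

```python
import hashlib
import json
import time

class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock            # injectable so expiry can be tested
        self._store = {}

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._store[key]      # drop the expired entry lazily
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)

def make_key(query_vector, params, limit, decimals=6):
    """Hash the rounded vector, the search params, and the limit into a cache key."""
    payload = {
        "vector": [round(float(x), decimals) for x in query_vector],
        "params": params,
        "limit": limit,
    }
    digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    return f"milvus:query:{digest}"

cache = TTLCache(ttl_seconds=60)
key = make_key([0.1, 0.2, 0.3], {"nprobe": 16}, limit=10)
cache.set(key, [{"id": 1, "distance": 0.0}])
print(cache.get(key))
```

            Including the limit in the key matters as soon as callers can request different result counts; otherwise a cached top-10 could be served for a top-20 request.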
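    b.Vector cache
        a.Description
            Caching frequently used vectors reduces loading time. Keep hot vectors in memory and manage the cache with an LRU (least recently used) policy. Preload commonly used data as a warm-up step, monitor cache usage, and balance cache size against performance.
        b.Code example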
            ---
            from collections import OrderedDict
            import numpy as np
            
            class VectorCache:
                def __init__(self, max_size=10000):
                    self.cache = OrderedDict()
                    self.max_size = max_size
                    self.stats = {"hits": 0, "misses": 0}
                
                def get(self, vector_id):
                    """Return a cached vector and mark it as most recently used."""
                    if vector_id in self.cache:
                        self.cache.move_to_end(vector_id)
                        self.stats["hits"] += 1
                        return self.cache[vector_id]
                    else:
                        self.stats["misses"] += 1
                        return None
                
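                def put(self, vector_id, vector):
                    """Insert or update a vector and mark it as most recently used."""
                    if vector_id not in self.cache and len(self.cache) >= self.max_size:
                        # Evict the least recently used entry
                        self.cache.popitem(last=False)
                    
                    # Always store the value so an existing id is updated, not just re-ordered
                    self.cache[vector_id] = vector
                    self.cache.move_to_end(vector_id)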
                
                def batch_put(self, vectors_dict):
                    """Insert several vectors at once."""
                    for vid, vec in vectors_dict.items():
                        self.put(vid, vec)
                
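                def preload(self, collection, vector_ids):
                    """Preload vectors from the collection into the cache."""
                    results = collection.query(
                        # list(...) renders as [1, 2, 3], the form the expression expects
                        expr=f"id in {list(vector_ids)}",
                        output_fields=["id", "embedding"]
                    )
                    
                    for result in results:
                        self.put(result["id"], result["embedding"])
                    
                    print(f"Preloaded {len(results)} vectors into the cache")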
                
                def get_stats(self):
                    """Return cache size and hit/miss statistics."""
                    total = self.stats["hits"] + self.stats["misses"]
                    hit_rate = self.stats["hits"] / total if total > 0 else 0
                    
                    return {
                        "size": len(self.cache),
                        "max_size": self.max_size,
                        "hits": self.stats["hits"],
                        "misses": self.stats["misses"],
                        "hit_rate": hit_rate
                    }
            
            vector_cache = VectorCache(max_size=10000)
            ---

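12.4 Production Deployment

01.Deployment architecture
    a.Standalone deployment
        a.Description
            Standalone deployment suits development, testing, and small-scale applications. All components run on a single server, and Docker Compose provides a quick setup. Resource requirements start at 8 CPU cores and 16 GB of memory, which supports datasets of a few million vectors. Deployment is simple and maintenance cost is low, but there is no high availability and no horizontal scaling. It fits proofs of concept (POCs) and small projects.
        b.Code example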
            ---
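            # Docker Compose standalone deployment
            
            print("Standalone deployment steps:")
            print("1. Create a docker-compose.yml file")
            print("2. Configure the etcd, minio, and milvus services")
            print("3. Run: docker-compose up -d")
            print("4. Verify: docker-compose ps")
            print("5. View logs: docker-compose logs -f")
            
            # Resource requirements
            resource_requirements = {
                "CPU": "8 cores or more",
                "Memory": "16 GB or more",
                "Storage": "SSD, 100 GB or more",
                "Network": "Gigabit NIC",
                "Suitable scale": "< 5 million vectors"
            }
            
            print("\nResource requirements:")
            for key, value in resource_requirements.items():
                print(f"  {key}: {value}")
            
            # Pros and cons of standalone deployment
            pros_cons = {
                "Pros": [
                    "Quick and simple to deploy",
                    "Low maintenance cost",
                    "Well suited to development and testing",
                    "No complex configuration"
                ],
                "Cons": [
                    "No high availability",
                    "No horizontal scaling",
                    "Performance limited by a single machine",
                    "Single point of failure"
                ]
            }
            
            print("\nPros and cons:")
            for category, items in pros_cons.items():
                print(f"{category}:")
                for item in items:
                    print(f"  - {item}")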
            ---
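            Whether a dataset fits a 16 GB standalone host depends heavily on the vector dimension. A back-of-the-envelope sketch, assuming float32 vectors held fully in memory; `raw_vector_gib` and `fits_in_memory` are hypothetical helpers, and the 60% usable-memory fraction is an assumption that leaves room for the index, the operating system, and other services.

```python
def raw_vector_gib(num_vectors, dim, bytes_per_value=4):
    """Size of the raw vectors in GiB (float32 by default)."""
    return num_vectors * dim * bytes_per_value / 1024**3

def fits_in_memory(num_vectors, dim, ram_gib, usable_fraction=0.6):
    """True if the raw vectors fit in the usable share of RAM."""
    return raw_vector_gib(num_vectors, dim) <= ram_gib * usable_fraction

for dim in (128, 768):
    size = raw_vector_gib(5_000_000, dim)
    ok = fits_in_memory(5_000_000, dim, ram_gib=16)
    print(f"5M x {dim}-dim: {size:.2f} GiB raw, fits in 16 GiB host: {ok}")
```

            Under these assumptions, 5 million 128-dimensional vectors fit comfortably, while 5 million 768-dimensional vectors do not.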
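    b.Cluster deployment
        a.Description
            Cluster deployment suits production environments and large-scale applications. Components are deployed in a distributed fashion and scale horizontally, orchestrated by Kubernetes. It supports high availability and failover and can scale to tens of billions of vectors. It requires an experienced operations team and fits enterprise applications.
        b.Code example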
            ---
            # Kubernetes cluster deployment
            
            print("Kubernetes cluster deployment steps:")
            print("1. Add the Milvus Helm repository")
            print("   helm repo add milvus https://milvus-io.github.io/milvus-helm/")
            print("2. Create a namespace")
            print("   kubectl create namespace milvus")
            print("3. Prepare the values.yaml configuration file")
            print("4. Install Milvus")
            print("   helm install milvus milvus/milvus -n milvus -f values.yaml")
            print("5. Verify the deployment")
            print("   kubectl get pods -n milvus")
            
            # Cluster components
            cluster_components = {
                "Proxy": "Receives client requests and routes them to the right nodes",
                "Query Node": "Executes vector searches; scales horizontally",
                "Data Node": "Handles data writes and persistence",
                "Index Node": "Builds and manages indexes",
                "Root Coord": "Cluster coordination and metadata management",
                "Query Coord": "Query task scheduling and load balancing",
                "Data Coord": "Data sharding and replica management",
                "Index Coord": "Index build task scheduling"
            }
            
            print("\nCluster components:")
            for component, desc in cluster_components.items():
                print(f"  {component}: {desc}")
            
            # Suggested cluster sizing
            cluster_config = {
                "Query Node": {
                    "Replicas": "2-4",
                    "CPU": "4 cores per node",
                    "Memory": "8 GB per node"
                },
                "Data Node": {
                    "Replicas": "2-3",
                    "CPU": "2 cores per node",
                    "Memory": "4 GB per node"
                },
                "Index Node": {
                    "Replicas": "1-2",
                    "CPU": "4 cores per node",
                    "Memory": "8 GB per node"
                },
                "Proxy": {
                    "Replicas": "2-3",
                    "CPU": "2 cores per node",
                    "Memory": "4 GB per node"
                }
            }
            
            print("\nSuggested cluster sizing:")
            for component, config in cluster_config.items():
                print(f"{component}:")
                for key, value in config.items():
                    print(f"  {key}: {value}")
            ---

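02.Operations
    a.Monitoring and alerting
        a.Description
            Build a complete monitoring and alerting system. Monitor service health and performance metrics, and visualize them with Prometheus and Grafana. Configure alert rules and notification channels. Monitor resource usage and track query performance and error rates. Automate operations where possible, and review and tune the setup regularly.
        b.Code example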
            ---
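            # Monitoring metrics
            
            monitoring_metrics = {
                "Performance metrics": {
                    "QPS": "Queries per second",
                    "Query latency": "P50/P99 latency",
                    "Throughput": "Data write rate",
                    "Index build speed": "Vectors per second"
                },
                "Resource metrics": {
                    "CPU usage": "CPU used by each component",
                    "Memory usage": "Memory used by each component",
                    "Disk usage": "Storage space used",
                    "Network traffic": "Inbound/outbound traffic"
                },
                "Business metrics": {
                    "Vector count": "Total number of vectors in the collection",
                    "Query success rate": "Successful queries / total queries",
                    "Error rate": "Failed queries / total queries",
                    "Cache hit rate": "Cache hits / total queries"
                }
            }
            
            print("Monitoring metrics:")
            for category, metrics in monitoring_metrics.items():
                print(f"\n{category}:")
                for metric, desc in metrics.items():
                    print(f"  {metric}: {desc}")
            
            # Alert rules
            alert_rules = [
                {
                    "name": "High query latency",
                    "condition": "P99 latency > 100 ms",
                    "severity": "Warning",
                    "duration": "5 minutes"
                },
                {
                    "name": "High error rate",
                    "condition": "Error rate > 5%",
                    "severity": "Critical",
                    "duration": "5 minutes"
                },
                {
                    "name": "High memory usage",
                    "condition": "Memory usage > 90%",
                    "severity": "Warning",
                    "duration": "5 minutes"
                },
                {
                    "name": "Service unavailable",
                    "condition": "Service health check failed",
                    "severity": "Critical",
                    "duration": "1 minute"
                }
            ]
            
            print("\nAlert rules:")
            for rule in alert_rules:
                print(f"\n{rule['name']}:")
                print(f"  Condition: {rule['condition']}")
                print(f"  Severity: {rule['severity']}")
                print(f"  Duration: {rule['duration']}")
            
            # Grafana dashboard
            dashboard_panels = [
                "QPS trend",
                "Query latency distribution",
                "CPU usage",
                "Memory usage",
                "Disk I/O",
                "Network traffic",
                "Error rate",
                "Vector count"
            ]
            
            print("\nGrafana dashboard panels:")
            for i, panel in enumerate(dashboard_panels, 1):
                print(f"  {i}. {panel}")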
            ---
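            The logic behind a rule such as "P99 latency > 100 ms for 5 minutes" can be sketched in plain Python. In practice Prometheus evaluates such rules; `percentile` (nearest-rank method) and `should_fire` are hypothetical helpers, and one P99 sample per minute is an assumption.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def should_fire(samples, threshold, min_consecutive):
    """True if the most recent min_consecutive samples all exceed the threshold."""
    if len(samples) < min_consecutive:
        return False
    return all(s > threshold for s in samples[-min_consecutive:])

latencies_ms = list(range(1, 101))                 # 1..100 ms, illustrative data
print(percentile(latencies_ms, 50), percentile(latencies_ms, 99))

# One P99 sample per minute; fire only after 5 consecutive breaches of 100 ms
p99_per_minute = [80, 120, 130, 110, 150, 160]
print(should_fire(p99_per_minute, threshold=100, min_consecutive=5))
```

            Requiring consecutive breaches is what the duration field expresses: a single slow minute does not page anyone.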
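    b.Capacity planning
        a.Description
            Plan resource capacity carefully to keep the system stable. Assess the data size and its growth trend, then calculate storage, memory, and CPU requirements. Reserve 30%-50% headroom. Account for peak load and traffic bursts. Define a scaling strategy and timeline, monitor resource usage trends, and reassess and adjust regularly.
        b.Code example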
            ---
            # Capacity planning calculator (all factors are rough estimates)
            
            class CapacityPlanner:
                def __init__(self):
                    self.index_overhead = 1.2
                    self.redundancy = 1.5
                
                def calculate_storage(self, num_vectors, vector_dim, dtype="float32"):
                    """Estimate storage requirements."""
                    bytes_per_element = {
                        "float32": 4,
                        "float16": 2,
                        "int8": 1
                    }
                    
                    vector_size = num_vectors * vector_dim * bytes_per_element[dtype]
                    total_size = vector_size * self.index_overhead
                    required_size = total_size * self.redundancy
                    
                    return {
                        "vector_size_gb": vector_size / (1024**3),
                        "with_index_gb": total_size / (1024**3),
                        "required_gb": required_size / (1024**3)
                    }
                
                def calculate_memory(self, num_vectors, vector_dim, index_type="IVF_FLAT"):
                    """Estimate memory requirements (factors are relative to raw float32 size)."""
                    vector_memory = num_vectors * vector_dim * 4
                    
                    index_overhead = {
                        "FLAT": 1.0,
                        "IVF_FLAT": 1.1,
                        "IVF_SQ8": 0.35,
                        "IVF_PQ": 0.15,
                        "HNSW": 1.5
                    }
                    
                    total_memory = vector_memory * index_overhead.get(index_type, 1.0)
                    required_memory = total_memory * 1.5
                    
                    return {
                        "vector_memory_gb": vector_memory / (1024**3),
                        "total_memory_gb": total_memory / (1024**3),
                        "required_gb": required_memory / (1024**3)
                    }
                
                def calculate_qps_capacity(self, num_query_nodes, cpu_per_node, latency_target_ms=50):
                    """Estimate QPS capacity, assuming each core serves one query at a time."""
                    qps_per_core = 1000 / latency_target_ms
                    total_qps = num_query_nodes * cpu_per_node * qps_per_core
                    safe_qps = total_qps * 0.7
                    
                    return {
                        "theoretical_qps": total_qps,
                        "safe_qps": safe_qps
                    }
                
                def generate_plan(self, num_vectors, vector_dim, qps_requirement, index_type="IVF_FLAT"):
                    """Generate a capacity plan."""
                    storage = self.calculate_storage(num_vectors, vector_dim)
                    memory = self.calculate_memory(num_vectors, vector_dim, index_type)
                    
                    # Size the Query Node count from the same per-node QPS model that
                    # calculate_qps_capacity uses, so the plan can meet the requirement.
                    cpu_per_node = 4
                    per_node = self.calculate_qps_capacity(1, cpu_per_node)
                    num_query_nodes = max(2, int(qps_requirement / per_node["safe_qps"]) + 1)
                    
                    qps_capacity = self.calculate_qps_capacity(num_query_nodes, cpu_per_node)
                    
                    plan = {
                        "Data scale": {
                            "Vector count": f"{num_vectors:,}",
                            "Vector dimension": vector_dim,
                            "Index type": index_type
                        },
                        "Storage": {
                            "Raw data": f"{storage['vector_size_gb']:.2f} GB",
                            "With index": f"{storage['with_index_gb']:.2f} GB",
                            "Recommended capacity": f"{storage['required_gb']:.2f} GB"
                        },
                        "Memory": {
                            "Vector data": f"{memory['vector_memory_gb']:.2f} GB",
                            "With index": f"{memory['total_memory_gb']:.2f} GB",
                            "Recommended capacity": f"{memory['required_gb']:.2f} GB"
                        },
                        "Compute": {
                            "Query Node count": num_query_nodes,
                            "CPU per node": f"{cpu_per_node} cores",
                            "Memory per node": f"{memory['required_gb'] / num_query_nodes:.0f} GB"
                        },
                        "QPS capacity": {
                            "Theoretical QPS": f"{qps_capacity['theoretical_qps']:.0f}",
                            "Safe QPS": f"{qps_capacity['safe_qps']:.0f}",
                            "Required QPS": qps_requirement
                        }
                    }
                    
                    return plan
            
            # Use the capacity planner
            planner = CapacityPlanner()
            
            # Scenario 1: 10 million vectors, 768 dimensions, 1000 QPS
            plan1 = planner.generate_plan(
                num_vectors=10000000,
                vector_dim=768,
                qps_requirement=1000,
                index_type="IVF_FLAT"
            )
            
            print("Capacity plan:")
            import json
            print(json.dumps(plan1, indent=2, ensure_ascii=False))
            
            # Scenario 2: 100 million vectors, 512 dimensions, 5000 QPS
            plan2 = planner.generate_plan(
                num_vectors=100000000,
                vector_dim=512,
                qps_requirement=5000,
                index_type="HNSW"
            )
            
            print("\nLarge-scale scenario:")
            print(json.dumps(plan2, indent=2, ensure_ascii=False))
            
            # Capacity planning tips
            planning_tips = [
                "Reserve 30%-50% headroom",
                "Account for data growth",
                "Assess peak load requirements",
                "Define a scaling strategy",
                "Review and adjust regularly",
                "Monitor resource usage trends",
                "Set up capacity alerts"
            ]
            
            print("\nCapacity planning tips:")
            for i, tip in enumerate(planning_tips, 1):
                print(f"  {i}. {tip}")
            ---
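            A scaling timeline follows from the growth trend. A minimal sketch, assuming steady compound monthly growth; `months_until_full` is a hypothetical helper, and the 10% growth rate is illustrative.

```python
def months_until_full(current_vectors, capacity_vectors, monthly_growth_rate):
    """Whole months until compound growth reaches capacity; None if there is no growth."""
    if current_vectors >= capacity_vectors:
        return 0
    if monthly_growth_rate <= 0:
        return None
    months = 0
    size = float(current_vectors)
    while size < capacity_vectors:
        size *= 1 + monthly_growth_rate   # apply one month of growth
        months += 1
    return months

# 10 million vectors today, capacity for 20 million, 10% growth per month
print(months_until_full(10_000_000, 20_000_000, 0.10))
```

            Scheduling the expansion a few months before the projected date leaves room for procurement and for growth that turns out faster than assumed.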