TiDB Python AI SDK
Introduction
Python SDK for TiDB AI: A unified data platform empowering developers to build next-generation AI applications.
- 🔍 Unified Search Modes: Vector · Full‑Text · Hybrid
- 🎭 Auto‑Embedding & Multi‑Modal Storage: Support for text, images, and more
- 🖼️ Image Search Support: Text‑to‑image and image‑to‑image retrieval capabilities
- 🎯 Advanced Filtering & Reranking: Flexible filters with optional reranker models to fine-tune result relevance
- 💱 Transaction Support: Full transaction management including commit/rollback to ensure consistency
Installation
[!NOTE] This Python package is under rapid development and its API may change. Pin a fixed version when installing, e.g., pytidb==0.0.12.
pip install pytidb
# To use built-in embedding functions and rerankers:
pip install "pytidb[models]"
# To convert query results to pandas DataFrame:
pip install pandas
Connect to TiDB Cloud
Create a free TiDB cluster at tidbcloud.com.
import os
from pytidb import TiDBClient
tidb_client = TiDBClient.connect(
    host=os.getenv("TIDB_HOST"),
    # TiDB listens on port 4000 by default; int(None) would fail if TIDB_PORT is unset.
    port=int(os.getenv("TIDB_PORT", "4000")),
    username=os.getenv("TIDB_USERNAME"),
    password=os.getenv("TIDB_PASSWORD"),
    database=os.getenv("TIDB_DATABASE"),
    ensure_db=True,
)
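If one of these variables is missing, os.getenv returns None and the connection fails with a less obvious error. A small helper can validate the settings up front. The sketch below is a hypothetical convenience function, not part of PyTiDB; it assumes the same TIDB_* variable names as the example above and falls back to port 4000 when TIDB_PORT is unset or empty.

```python
import os


def load_tidb_settings(env=None):
    """Collect keyword arguments for TiDBClient.connect() from environment variables.

    Hypothetical helper: variable names follow the example above; the port
    defaults to 4000 (TiDB's default SQL port) when TIDB_PORT is unset or empty.
    """
    env = os.environ if env is None else env
    required = ["TIDB_HOST", "TIDB_USERNAME", "TIDB_PASSWORD", "TIDB_DATABASE"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "host": env["TIDB_HOST"],
        "port": int(env.get("TIDB_PORT") or "4000"),
        "username": env["TIDB_USERNAME"],
        "password": env["TIDB_PASSWORD"],
        "database": env["TIDB_DATABASE"],
        "ensure_db": True,
    }
```

Usage: tidb_client = TiDBClient.connect(**load_tidb_settings()).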
Highlights
🤖 Automatic Embedding
PyTiDB automatically embeds the content of a source field (e.g., text) and stores the resulting vector embedding in a vector field (e.g., text_vec).
Create a table with an embedding function:
from pytidb.schema import TableModel, Field, FullTextField
from pytidb.embeddings import EmbeddingFunction
# Set API key for embedding provider.
tidb_client.configure_embedding_provider("openai", api_key=os.getenv("OPENAI_API_KEY"))
class Chunk(TableModel):
    __tablename__ = "chunks"
    id: int = Field(primary_key=True)
    text: str = FullTextField()
    text_vec: list[float] = EmbeddingFunction(
        "openai/text-embedding-3-small"
    ).VectorField(source_field="text")  # 👈 Defines the vector field.
    user_id: int = Field()
table = tidb_client.create_table(schema=Chunk, if_exists="skip")
Bulk insert data:
table.bulk_insert([
Chunk(id=2, text="bar", user_id=2), # 👈 The text field is embedded and saved to text_vec automatically.
Chunk(id=3, text="baz", user_id=3),
Chunk(id=4, text="qux", user_id=4),
])
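Conceptually, auto-embedding means the vector field is computed from its source field at insert time, so you never pass text_vec yourself. The in-memory sketch below models that idea only. toy_embed, ToyChunk, and toy_bulk_insert are hypothetical names, and toy_embed is a deterministic stand-in for a real model such as the OpenAI one configured above; it carries no semantic meaning.

```python
from dataclasses import dataclass, field


def toy_embed(text: str, dims: int = 4) -> list[float]:
    """Hypothetical stand-in for an embedding model: buckets character codes into `dims` slots."""
    vec = [0.0] * dims
    for i, ch in enumerate(text):
        vec[i % dims] += ord(ch) / 1000
    return vec


@dataclass
class ToyChunk:
    id: int
    text: str
    user_id: int
    text_vec: list[float] = field(default_factory=list)  # filled in on insert


def toy_bulk_insert(store: list, rows: list) -> None:
    """Derive text_vec from text for each row before 'saving' it to `store`."""
    for row in rows:
        if not row.text_vec:
            row.text_vec = toy_embed(row.text)
        store.append(row)


store: list = []
toy_bulk_insert(store, [ToyChunk(id=2, text="bar", user_id=2), ToyChunk(id=3, text="baz", user_id=3)])
```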
🔍 Search
Vector Search
Vector search finds the most relevant records based on semantic similarity, so you don't need to include all keywords explicitly in your query.
results = (
    table.search("<query>")  # 👈 The query is embedded automatically.
    .filter({"user_id": 2})
    .limit(2)
    .to_list()
)
# Output: A list of dicts.
See the Vector Search example for more details.
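To make "semantic similarity" concrete: once the query and each row are vectors, vector search ranks rows by a distance between them and keeps the closest ones. The pure-Python sketch below uses cosine distance, one common metric, on tiny hand-made vectors; in PyTiDB the embeddings come from the configured model and the ranking runs inside TiDB. The helper names are hypothetical, and applying the filter before ranking is a simplifying choice of this sketch.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; smaller means the vectors point in more similar directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0  # treat zero vectors as unrelated
    return 1.0 - dot / norm


def vector_search(query_vec, rows, limit=2, where=None):
    """Return the `limit` rows whose text_vec is closest to query_vec.

    `where` is an optional predicate applied before ranking, loosely mirroring .filter().
    """
    candidates = [r for r in rows if where is None or where(r)]
    candidates.sort(key=lambda r: cosine_distance(query_vec, r["text_vec"]))
    return candidates[:limit]


rows = [
    {"id": 1, "text_vec": [1.0, 0.0], "user_id": 2},
    {"id": 2, "text_vec": [0.0, 1.0], "user_id": 2},
    {"id": 3, "text_vec": [0.9, 0.1], "user_id": 3},
]
top = vector_search([1.0, 0.0], rows)
```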
Full-text Search
Full-text search tokenizes the query and finds the most relevant records by matching exact keywords.
results = (
    table.search("<query>", search_type="fulltext")
    .limit(2)
    .to_pydantic()
)
# Output: A list of pydantic model instances.
See the Full-text Search example for more details.
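The sketch below illustrates the "tokenize, then match keywords" idea with a deliberately simple score: how many times the query's terms occur in each row. It is not TiDB's actual tokenizer or ranking function; tokenize and fulltext_search are hypothetical helpers over in-memory dicts.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not an ASCII letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fulltext_search(query: str, rows: list[dict], limit: int = 2) -> list[dict]:
    """Score rows by occurrences of the query's keywords in row['text']."""
    terms = set(tokenize(query))
    scored = []
    for row in rows:
        counts = Counter(tokenize(row["text"]))
        score = sum(counts[t] for t in terms)
        if score > 0:  # rows with no matching keyword are excluded entirely
            scored.append((score, row))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:limit]]


docs = [
    {"id": 1, "text": "TiDB supports vector search"},
    {"id": 2, "text": "Full-text search in TiDB: search by keywords"},
    {"id": 3, "text": "Cats and dogs"},
]
hits = fulltext_search("search", docs)
```

Unlike the vector sketch, a row that shares no keyword with the query never appears, which is why the two modes complement each other.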
Hybrid Search
Hybrid search combines exact matching from full-text search with semantic understanding from vector search, delivering more relevant and reliable results.
df = (
table.search("<query>", search_type="hybrid")
.limit(2)
.to_pandas()
)
# Output: A pandas DataFrame.
See the Hybrid Search example for more details.
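One widely used way to combine a vector ranking with a full-text ranking is reciprocal rank fusion (RRF): each result scores the sum of 1 / (k + rank) over the lists it appears in, so items ranked well by both searches rise to the top. The sketch below shows RRF on lists of ids; it illustrates the general technique and is not necessarily how PyTiDB fuses hybrid results. k = 60 is a commonly used default, assumed here.

```python
def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge ranked lists of ids; each id scores sum(1 / (k + rank)), rank starting at 1."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_ids = [3, 1, 2]    # hypothetical vector search order
fulltext_ids = [1, 4]     # hypothetical full-text search order
fused = reciprocal_rank_fusion([vector_ids, fulltext_ids])
```

Id 1 wins because it appears in both lists, even though neither search ranked it first on its own.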
Image Search
Image search lets you find visually similar images using natural language descriptions or another image as a reference.
from PIL import Image
from pytidb.schema import TableModel, Field
from pytidb.embeddings import EmbeddingFunction
# Define a multi-modal embedding function.
jina_embed_fn = EmbeddingFunction("jina_ai/jina-embeddings-v4")
class Pet(TableModel):
    __tablename__ = "pets"
    id: int = Field(primary_key=True)
    image_uri: str = Field()
    image_vec: list[float] = jina_embed_fn.VectorField(
        source_field="image_uri",
        source_type="image"
    )
table = tidb_client.create_table(schema=Pet, if_exists="skip")
# Insert sample images ...
table.insert(Pet(image_uri="path/to/shiba_inu_14.jpg"))
# Search for images using natural language
results = table.search("shiba inu dog").limit(1).to_list()
# Search for images using an image ...
query_image = Image.open("shiba_inu_15.jpg")
results = table.search(query_image).limit(1).to_pydantic()
See the Image Search example for more details.
Advanced Filtering
PyTiDB supports a variety of operators for flexible filtering:
| Operator | Description | Example |
|---|---|---|
| $eq | Equal to | {"field": {"$eq": "hello"}} |
| $gt | Greater than | {"field": {"$gt": 1}} |
| $gte | Greater than or equal to | {"field": {"$gte": 1}} |
| $lt | Less than | {"field": {"$lt": 1}} |
| $lte | Less than or equal to | {"field": {"$lte": 1}} |
| $in | In array | {"field": {"$in": [1, 2, 3]}} |
| $nin | Not in array | {"field": {"$nin": [1, 2, 3]}} |
| $and | Logical AND | {"$and": [{"field1": 1}, {"field2": 2}]} |
| $or | Logical OR | {"$or": [{"field1": 1}, {"field2": 2}]} |
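To pin down what each operator means, the sketch below evaluates the same filter dictionaries against plain Python dicts. It is a hypothetical in-memory evaluator written for illustration; PyTiDB itself pushes filters down to the database rather than checking rows in Python. Following the examples in the table and in the vector search snippet, a bare value such as {"user_id": 2} is treated as shorthand for $eq.

```python
from typing import Any

# Hypothetical mapping from operator name to a Python comparison (illustration only).
COMPARATORS = {
    "$eq": lambda value, arg: value == arg,
    "$gt": lambda value, arg: value > arg,
    "$gte": lambda value, arg: value >= arg,
    "$lt": lambda value, arg: value < arg,
    "$lte": lambda value, arg: value <= arg,
    "$in": lambda value, arg: value in arg,
    "$nin": lambda value, arg: value not in arg,
}


def matches(row: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if `row` satisfies every condition in `flt`."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(row, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(row, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            # {"field": {"$op": arg, ...}}: every listed operator must hold
            if not all(COMPARATORS[op](row.get(key), arg) for op, arg in cond.items()):
                return False
        elif row.get(key) != cond:
            # Bare value: shorthand for {"$eq": value}
            return False
    return True


row = {"user_id": 2, "lang": "en", "score": 5}
```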
⛓ Join Structured and Unstructured Data
from pytidb import Session
from pytidb.sql import select
# Create a table to store user data:
class User(TableModel):
    __tablename__ = "users"
    id: int = Field(primary_key=True)
    name: str = Field(max_length=20)
user_table = tidb_client.create_table(schema=User, if_exists="skip")
# Use the db_engine from TiDBClient when creating a Session
with Session(tidb_client.db_engine) as session:
    query = (
        select(Chunk).join(User, Chunk.user_id == User.id).where(User.name == "Alice")
    )
    chunks = session.exec(query).all()
    print([(c.id, c.text, c.user_id) for c in chunks])
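In SQL terms, the query above is an inner join between chunks and users filtered on the user's name. The sketch below runs roughly that statement against an in-memory SQLite database from the standard library, used purely as a stand-in for TiDB. The chunk rows mirror the earlier bulk insert; the user rows (Alice, Bob) are hypothetical sample data.

```python
import sqlite3

# Stand-in database: SQLite in memory, with tables shaped like Chunk and User above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(20));
    CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, user_id INTEGER);
    INSERT INTO users VALUES (2, 'Alice'), (3, 'Bob');
    INSERT INTO chunks VALUES (2, 'bar', 2), (3, 'baz', 3), (4, 'qux', 4);
    """
)

# Roughly what select(Chunk).join(User, ...).where(User.name == "Alice") expresses.
rows = conn.execute(
    """
    SELECT chunks.id, chunks.text, chunks.user_id
    FROM chunks JOIN users ON chunks.user_id = users.id
    WHERE users.name = ?
    ORDER BY chunks.id
    """,
    ("Alice",),
).fetchall()
print(rows)
```

Chunk 4 has no matching user row, so the inner join drops it regardless of the name filter.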
💱 Transaction Support
PyTiDB supports transaction management, helping you avoid race conditions and ensure data consistency.
with tidb_client.session() as session:
    initial_total_balance = tidb_client.query("SELECT SUM(balance) FROM players").scalar()
    # Transfer 10 coins from player 1 to player 2
    tidb_client.execute("UPDATE players SET balance = balance - 10 WHERE id = 1")
    tidb_client.execute("UPDATE players SET balance = balance + 10 WHERE id = 2")
    session.commit()
    # or session.rollback()
    final_total_balance = tidb_client.query("SELECT SUM(balance) FROM players").scalar()
    assert final_total_balance == initial_total_balance
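The value of a transaction shows up when something fails halfway: without rollback, a debit could persist without its matching credit. The sketch below demonstrates that commit/rollback pattern with the standard-library sqlite3 module as a stand-in for TiDB. The players table, the transfer and total_balance helpers, and the overdraft check are hypothetical; the point is only that the failed transfer leaves the total unchanged.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> None:
    """Move `amount` coins atomically: both UPDATEs are committed together or not at all."""
    try:
        conn.execute("UPDATE players SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM players WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient balance")
        conn.execute("UPDATE players SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.commit()
    except Exception:
        conn.rollback()  # undo the partial debit
        raise


def total_balance(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT SUM(balance) FROM players").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO players VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 10)
print(total_balance(conn))  # 150

try:
    transfer(conn, 1, 2, 1000)  # would overdraw player 1, so it is rolled back
except ValueError:
    pass
print(total_balance(conn))  # still 150
```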
Extensions
[!TIP] Click the button below to install TiDB MCP Server in Cursor. Then, confirm by clicking Install when prompted.