# DataPizza MCP Server

A Model Context Protocol (MCP) server that provides intelligent access to datapizza-ai documentation through vector similarity search and retrieval-augmented generation (RAG).
This MCP server enables AI assistants and applications to query the comprehensive datapizza-ai documentation using natural language queries. It indexes documentation from the datapizza-ai repository and provides contextual, relevant responses through a RAG (Retrieval-Augmented Generation) pipeline.
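The query path relies on vector similarity: documentation chunks and the incoming query are both embedded, and the chunks closest to the query vector are returned as context. A minimal sketch of that ranking step, with toy 3-dimensional vectors standing in for the real 1536-dimensional OpenAI embeddings (in the actual server, Qdrant performs this search server-side):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=5):
    """Return the texts of the k chunks whose vectors are closest to the query.
    `chunks` is a list of (text, vector) pairs."""
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, c[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy corpus with made-up embedding vectors.
docs = [
    ("agents guide", [0.9, 0.1, 0.0]),
    ("install notes", [0.0, 0.2, 0.9]),
    ("embedding docs", [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], docs, k=2))  # → ['agents guide', 'embedding docs']
```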
The server consists of four main components:

- `config.py` — configuration management
- `server.py` — MCP server implementation
- `indexer.py` — documentation indexing
- `retriever.py` — RAG retrieval engine

## Installation

```bash
git clone https://github.com/datapizza-labs/mcp_server_datapizza.git
cd datapizza-mcp-server
pip install -e ".[dev]"
```
## Configuration

Create a `.env` file in the `datapizza-mcp-server` directory with the following variables:

```env
# Required Configuration
OPENAI_API_KEY=your_openai_api_key_here
QDRANT_URL=your_qdrant_cloud_url
QDRANT_API_KEY=your_qdrant_api_key

# Optional Configuration
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
COLLECTION_NAME=datapizza_docs
MAX_RESULTS=5
CHUNK_SIZE=1024
CHUNK_OVERLAP=200
LOG_LEVEL=INFO
```
### Required

| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key for generating embeddings |
| `QDRANT_URL` | Qdrant Cloud instance URL |
| `QDRANT_API_KEY` | Qdrant Cloud API key |
### Optional

| Variable | Default | Description |
|---|---|---|
| `EMBEDDING_MODEL` | `text-embedding-3-small` | OpenAI embedding model |
| `EMBEDDING_DIMENSIONS` | `1536` | Embedding vector dimensions |
| `COLLECTION_NAME` | `datapizza_docs` | Qdrant collection name |
| `MAX_RESULTS` | `5` | Maximum search results returned |
| `CHUNK_SIZE` | `1024` | Document chunk size for indexing |
| `CHUNK_OVERLAP` | `200` | Overlap between document chunks |
| `LOG_LEVEL` | `INFO` | Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`) |
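Internally, `config.py` presumably reads these variables, failing fast on missing required keys and falling back to the defaults above for the optional ones. A hedged sketch of that pattern (the names `Settings` and `load_settings` are illustrative, not the actual implementation):

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    # Hypothetical settings container mirroring the variables documented above.
    openai_api_key: str
    qdrant_url: str
    qdrant_api_key: str
    embedding_model: str = "text-embedding-3-small"
    embedding_dimensions: int = 1536
    collection_name: str = "datapizza_docs"
    max_results: int = 5
    chunk_size: int = 1024
    chunk_overlap: int = 200
    log_level: str = "INFO"

def load_settings() -> Settings:
    """Build Settings from the environment; required keys raise if absent."""
    def required(name: str) -> str:
        value = os.environ.get(name)
        if not value:
            raise RuntimeError(f"Missing required environment variable: {name}")
        return value

    return Settings(
        openai_api_key=required("OPENAI_API_KEY"),
        qdrant_url=required("QDRANT_URL"),
        qdrant_api_key=required("QDRANT_API_KEY"),
        embedding_model=os.environ.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        embedding_dimensions=int(os.environ.get("EMBEDDING_DIMENSIONS", "1536")),
        collection_name=os.environ.get("COLLECTION_NAME", "datapizza_docs"),
        max_results=int(os.environ.get("MAX_RESULTS", "5")),
        chunk_size=int(os.environ.get("CHUNK_SIZE", "1024")),
        chunk_overlap=int(os.environ.get("CHUNK_OVERLAP", "200")),
        log_level=os.environ.get("LOG_LEVEL", "INFO"),
    )
```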
## Indexing

Before using the server, index the datapizza-ai documentation:

```bash
python -m datapizza_mcp.indexer
```

To force re-indexing (clears existing data):

```bash
python -m datapizza_mcp.indexer --force
```
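Indexing splits each document into overlapping chunks governed by `CHUNK_SIZE` and `CHUNK_OVERLAP`. A minimal sliding-window sketch of that split (the actual indexer may chunk differently, e.g. on token or section boundaries):

```python
def chunk_text(text: str, chunk_size: int = 1024, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters, where each window
    starts chunk_overlap characters before the previous one ended."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# A 2500-character document with the default settings yields three chunks.
pieces = chunk_text("a" * 2500, chunk_size=1024, chunk_overlap=200)
print(len(pieces), [len(p) for p in pieces])  # → 3 [1024, 1024, 852]
```

The overlap keeps sentences that straddle a chunk boundary fully present in at least one chunk, at the cost of indexing some text twice.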
## Running the Server

```bash
python -m datapizza_mcp.server
```

Or use the provided Windows batch script:

```bash
../run_datapizza.bat
```
The server exposes a query_datapizza tool that can be called by MCP clients:
```python
# Example query
result = await client.call_tool("query_datapizza", {
    "query": "how to create an agent with OpenAI",
    "max_results": 5
})
```
### Tools

- `query_datapizza`: Search datapizza-ai documentation
  - `query` (string): Natural language search query
  - `max_results` (int, optional): Maximum number of results (default: 5)

### Resources

- `datapizza://status`: System status and configuration information

## Development

```bash
# Format code
black src/

# Lint code
ruff check src/
ruff check src/ --fix  # Auto-fix issues

# Type checking
mypy src/

# Run tests
pytest
```
## Project Structure

```
datapizza-mcp-server/
├── src/datapizza_mcp/
│   ├── __init__.py      # Package exports
│   ├── config.py        # Configuration management
│   ├── server.py        # MCP server implementation
│   ├── indexer.py       # Documentation indexing
│   └── retriever.py     # RAG retrieval engine
├── pyproject.toml       # Package configuration
├── .env                 # Environment variables
└── README.md            # This file
```
## Troubleshooting

**Authentication Errors**

- Verify that `OPENAI_API_KEY` is set correctly
- Verify your Qdrant credentials (`QDRANT_URL` and `QDRANT_API_KEY`)

**Empty Search Results**

- Make sure the documentation has been indexed: `python -m datapizza_mcp.indexer`
- Check the `datapizza://status` resource

**Connection Issues**
Enable debug logging by setting LOG_LEVEL=DEBUG in your .env file.
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Support

For issues and questions, please open an issue on the GitHub repository.