# Databricks MCP Server

MCP server for Databricks integration, enabling LLM tools to interact with Databricks services.
A production-ready Model Context Protocol (MCP) server that exposes Databricks REST capabilities to MCP-compatible agents and tooling. Version 0.4.4 introduces structured responses, resource caching, retry-aware networking, and end-to-end resilience improvements.
Key capabilities:

- Tool calls return `CallToolResult` objects with a human-readable summary in `content` and machine-readable payloads in `structuredContent` that conform to the tool's `outputSchema`.
- Large artifacts are exposed as `resource_link` content blocks with URIs such as `resource://databricks/exports/{id}` (also reflected in `metadata` for convenience).
- Built on `mcp.server.FastMCP` with centralized JSON logging and concurrency guards for predictable stdio behaviour.

Key modules:

- `databricks_mcp/server/databricks_mcp_server.py` - FastMCP server with tool registration, progress handling, metrics, and resource caching.
- `databricks_mcp/core/utils.py` - HTTP utilities with correlation IDs, retries, and error mapping to `DatabricksAPIError`.
- `databricks_mcp/core/logging_utils.py` - JSON logging configuration for stderr/file outputs.
- `databricks_mcp/core/models.py` - Pydantic models (e.g., `ClusterConfig`) used by tool schemas.
- `tests/` - suites that mock Databricks APIs to validate orchestration, structured responses, and schema metadata without shell scripts.

For an in-depth tour of data flow and design decisions, see ARCHITECTURE.md.
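To make the result shape above concrete, here is a minimal sketch of what a consumer sees: a `content` list holding the text summary and any `resource_link` blocks, plus a `structuredContent` payload. The specific cluster IDs and export ID are invented for illustration.

```python
# Illustrative shape of a structured tool result (values are made up).
summary_block = {"type": "text", "text": "Found 2 clusters"}
link_block = {
    "type": "resource_link",
    # The export id "abc123" is hypothetical.
    "uri": "resource://databricks/exports/abc123",
    "name": "export-abc123",
}
structured_content = {"clusters": [{"cluster_id": "c1"}, {"cluster_id": "c2"}]}

result = {
    "content": [summary_block, link_block],
    "structuredContent": structured_content,
}

# Consumers read the human summary from content and the payload
# from structuredContent.
texts = [b["text"] for b in result["content"] if b["type"] == "text"]
print(texts[0])
```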
This project uses `uv` for dependency management and publishing.

Register the server with Cursor using the deeplink below - it resolves to `uvx databricks-mcp-server@latest` and picks up future updates automatically.
```
cursor://anysphere.cursor-deeplink/mcp/install?name=databricks-mcp&config=eyJjb21tYW5kIjoidXZ4IiwiYXJncyI6WyJkYXRhYnJpY2tzLW1jcC1zZXJ2ZXIiXSwiZW52Ijp7IkRBVEFCUklDS1NfSE9TVCI6IiR7REFUQUJSSUNLU19IT1NUfSIsIkRBVEFCUklDS1NfVE9LRU4iOiIke0RBVEFCUklDS1NfVE9LRU59IiwiREFUQUJSSUNLU19XQVJFSE9VU0VfSUQiOiIke0RBVEFCUklDS1NfV0FSRUhPVVNFX0lEfSJ9fQ==
```
```bash
# Clone and enter the repository
git clone https://github.com/markov-kernel/databricks-mcp.git
cd databricks-mcp

# Create an isolated environment (optional but recommended)
uv venv
source .venv/bin/activate  # Linux/Mac
# .\.venv\Scripts\activate  # Windows PowerShell

# Install package and development dependencies
uv pip install -e .
uv pip install -e ".[dev]"
```
Set the following environment variables (or populate `.env` from `.env.example`).
```bash
export DATABRICKS_HOST="https://your-workspace.databricks.com"
export DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXX"
export DATABRICKS_WAREHOUSE_ID="sql_warehouse_12345"  # optional default

export TOOL_TIMEOUT_SECONDS=300
export MAX_CONCURRENT_REQUESTS=8
export HTTP_TIMEOUT_SECONDS=60
export API_MAX_RETRIES=3
export API_RETRY_BACKOFF_SECONDS=0.5
```
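A sketch of how a client process might assemble this configuration from the environment, applying the defaults shown above. The helper name `load_databricks_env` is ours, not part of the package.

```python
import os

def load_databricks_env(env=os.environ):
    """Collect the documented variables, applying the defaults listed above."""
    required = ["DATABRICKS_HOST", "DATABRICKS_TOKEN"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required variables: {missing}")
    return {
        "host": env["DATABRICKS_HOST"].rstrip("/"),
        "token": env["DATABRICKS_TOKEN"],
        "warehouse_id": env.get("DATABRICKS_WAREHOUSE_ID"),  # optional default
        "tool_timeout": int(env.get("TOOL_TIMEOUT_SECONDS", "300")),
        "max_retries": int(env.get("API_MAX_RETRIES", "3")),
    }

# Example with only the required variables set; defaults fill the rest.
config = load_databricks_env({
    "DATABRICKS_HOST": "https://your-workspace.databricks.com/",
    "DATABRICKS_TOKEN": "dapiXXXXXXXXXXXXXXXX",
})
print(config["tool_timeout"])
```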
```bash
uvx databricks-mcp-server@latest
```
Tip: append `--refresh` (e.g., `uvx databricks-mcp-server@latest --refresh`) to force `uv` to resolve the latest PyPI release after publishing.

Logs are emitted as JSON lines to stderr and persisted to `databricks_mcp.log` in the working directory.
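Because each log line is a standalone JSON object, the log file is easy to filter with a few lines of Python. The `level` and `message` field names below are assumptions for illustration; check the actual log output for the real keys.

```python
import json

# Two hypothetical JSON log lines; real field names may differ.
raw = "\n".join([
    '{"level": "INFO", "message": "server started"}',
    '{"level": "ERROR", "message": "request failed"}',
])

# Parse each line and keep only the error records.
errors = [
    rec for rec in (json.loads(line) for line in raw.splitlines())
    if rec["level"] == "ERROR"
]
print(errors[0]["message"])
```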
To adjust logging:
```bash
uvx databricks-mcp-server@latest -- --log-level DEBUG
```
Register the server and inject credentials via the CLI:
```bash
codex mcp add databricks \
  --env DATABRICKS_HOST="https://your-workspace.databricks.com" \
  --env DATABRICKS_TOKEN="dapi_XXXXXXXXXXXXXXXX" \
  --env DATABRICKS_WAREHOUSE_ID="sql_warehouse_12345" \
  -- uvx databricks-mcp-server@latest

# Add --refresh immediately after a publish to invalidate the uv cache
```
Or edit `~/.codex/config.toml`:
```toml
[mcp_servers.databricks]
command = "uvx"
args = ["databricks-mcp-server@latest"]
env = { DATABRICKS_HOST = "https://your-workspace.databricks.com", DATABRICKS_TOKEN = "dapi_XXXXXXXXXXXXXXXX", DATABRICKS_WAREHOUSE_ID = "sql_warehouse_12345" }
startup_timeout_sec = 15
tool_timeout_sec = 300
```
Planning an HTTP deployment? Codex also supports `url = "https://…"` plus `bearer_token_env_var = "DATABRICKS_TOKEN"` or `codex mcp login` (with `experimental_use_rmcp_client = true`).
```json
{
  "mcpServers": {
    "databricks-mcp-local": {
      "command": "uvx",
      "args": ["databricks-mcp-server@latest"],
      "env": {
        "DATABRICKS_HOST": "https://your-workspace.databricks.com",
        "DATABRICKS_TOKEN": "dapiXXXXXXXXXXXXXXXX",
        "DATABRICKS_WAREHOUSE_ID": "sql_warehouse_12345",
        "RUNNING_VIA_CURSOR_MCP": "true"
      }
    }
  }
}
```
Restart Cursor after saving and invoke tools as `databricks-mcp-local:<tool>`.
```bash
claude mcp add databricks-mcp-local -s user \
  -e DATABRICKS_HOST="https://your-workspace.databricks.com" \
  -e DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXX" \
  -e DATABRICKS_WAREHOUSE_ID="sql_warehouse_12345" \
  -- uvx databricks-mcp-server@latest
```
`structuredContent` carries machine-readable payloads. Large artifacts are returned as `resource_link` content blocks using URIs like `resource://databricks/exports/{id}` and can be fetched via the MCP resources API.
```python
result = await session.call_tool("list_clusters", {})
summary = next(
    (block.text for block in result.content if getattr(block, "type", "") == "text"),
    "",
)
clusters = (result.structuredContent or {}).get("clusters", [])
resource_links = [
    block for block in result.content
    if isinstance(block, dict) and block.get("type") == "resource_link"
]
```
Progress notifications follow MCP’s progress token mechanism; Codex surfaces these messages in the UI while a tool runs.
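On the client side, a display handler for those progress values might look like the sketch below. The function name and signature are ours for illustration, not an SDK interface; it only formats the `progress`/`total` pair a notification carries.

```python
def describe_progress(progress, total=None, message=None):
    """Render a progress notification for display (client-side sketch)."""
    if total:
        # With a known total we can show a percentage.
        pct = 100.0 * progress / total
        label = f"{pct:.0f}%"
    else:
        # Without a total, fall back to raw progress units.
        label = f"{progress} units"
    return f"[databricks-mcp] {message or 'working'}: {label}"

print(describe_progress(5, total=10, message="exporting notebook"))
```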
```python
result = await session.call_tool(
    "execute_sql", {"statement": "SELECT * FROM samples LIMIT 10"}
)
print(result.content[0].text)
rows = (result.structuredContent or {}).get("result", [])
```
```python
result = await session.call_tool("get_workspace_file_content", {
    "path": "/Users/[email protected]/report.ipynb",
    "format": "SOURCE",
})
resource_link = next(
    (block for block in result.content
     if isinstance(block, dict) and block.get("type") == "resource_link"),
    None,
)
if resource_link:
    contents = await session.read_resource(resource_link["uri"])
```
| Category | Tool | Description |
|---|---|---|
| Clusters | list_clusters, create_cluster, terminate_cluster, get_cluster, start_cluster, resize_cluster, restart_cluster | Manage interactive clusters |
| Jobs | list_jobs, create_job, delete_job, run_job, run_notebook, sync_repo_and_run_notebook, get_run_status, list_job_runs, cancel_run | Manage scheduled and ad-hoc jobs |
| Workspace | list_notebooks, export_notebook, import_notebook, delete_workspace_object, get_workspace_file_content, get_workspace_file_info | Inspect and manage workspace assets |
| DBFS | list_files, dbfs_put, dbfs_delete | Explore DBFS and manage files |
| SQL | execute_sql | Submit SQL statements with optional warehouse_id, catalog, schema_name |
| Libraries | install_library, uninstall_library, list_cluster_libraries | Manage cluster libraries |
| Repos | create_repo, update_repo, list_repos, pull_repo | Manage Databricks repos |
| Unity Catalog | list_catalogs, create_catalog, list_schemas, create_schema, list_tables, create_table, get_table_lineage | Unity Catalog operations |
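Since `execute_sql` takes optional `warehouse_id`, `catalog`, and `schema_name` parameters, a caller can assemble its argument payload like the sketch below. The helper name is ours; it simply drops unset optionals so only provided keys reach the tool.

```python
def build_execute_sql_args(statement, warehouse_id=None, catalog=None, schema_name=None):
    """Assemble an execute_sql argument dict, omitting unset optionals."""
    args = {"statement": statement}
    optional = {
        "warehouse_id": warehouse_id,
        "catalog": catalog,
        "schema_name": schema_name,
    }
    # Only include optional parameters that were actually supplied.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args

args = build_execute_sql_args("SELECT 1", warehouse_id="sql_warehouse_12345")
print(sorted(args))
```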
```bash
uv run black databricks_mcp tests
uv run pylint databricks_mcp tests
uv run pytest
uv build
uv publish --token "$PYPI_TOKEN"
```
```bash
uv run pytest
```
Pytest suites mock Databricks APIs, providing deterministic structured outputs and transcript tests.
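The mocking pattern can be sketched as follows. The handler and the injected API call are hypothetical stand-ins, not the package's real internals; the point is that an `AsyncMock` replaces the Databricks API so the test is deterministic and offline.

```python
import asyncio
from unittest.mock import AsyncMock

async def list_clusters(api_call):
    # Stand-in for a tool handler that delegates to an injected API call.
    payload = await api_call("GET", "/api/2.0/clusters/list")
    return {"clusters": payload.get("clusters", [])}

def test_list_clusters_returns_structured_payload():
    # AsyncMock stands in for the real Databricks HTTP client.
    fake_api = AsyncMock(return_value={"clusters": [{"cluster_id": "c1"}]})
    result = asyncio.run(list_clusters(fake_api))
    assert result["clusters"][0]["cluster_id"] == "c1"
    fake_api.assert_awaited_once()

test_list_clusters_returns_structured_payload()
print("ok")
```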
Ensure `PYPI_TOKEN` is available (via `.env` or the environment) before publishing:
```bash
uv build
uv publish --token "$PYPI_TOKEN"
```
Released under the MIT License. See LICENSE.