# Microsoft Fabric MCP Server
A comprehensive Python-based MCP (Model Context Protocol) server for interacting with Microsoft Fabric APIs, featuring advanced PySpark notebook development, testing, and optimization capabilities with LLM integration.
## 🚀 Features
### Core Fabric Operations
- ✅ Workspace, lakehouse, warehouse, and table management
- ✅ Delta table schemas and metadata retrieval
- ✅ SQL query execution and data loading
- ✅ Report and semantic model operations
### Advanced PySpark Development
- 📓 Intelligent notebook creation with 6 specialized templates
- 🔧 Smart code generation for common PySpark operations
- ✅ Comprehensive validation with syntax and best practices checking
- 🎯 Fabric-specific optimizations and compatibility checks
- 📊 Performance analysis with scoring and optimization recommendations
- 🚀 Real-time monitoring and execution insights
### LLM Integration
- 🤖 Natural language interface for PySpark development
- 🧠 Context-aware assistance with conversation memory
- 🎨 Intelligent code formatting and explanations
- 📈 Smart optimization suggestions based on project patterns
## 🏗️ Architecture

```mermaid
graph TB
    subgraph "Developer Environment"
        IDE[IDE/VSCode]
        DEV[Developer]
        PROJ[Project Files]
    end

    subgraph "AI Layer"
        LLM[Large Language Model<br/>Claude/GPT/etc.]
        CONTEXT[Conversation Context]
        REASONING[AI Reasoning Engine]
    end

    subgraph "MCP Layer"
        MCP[MCP Server]
        TOOLS[PySpark Tools]
        HELPERS[PySpark Helpers]
        TEMPLATES[Template Manager]
        VALIDATORS[Code Validators]
        GENERATORS[Code Generators]
    end

    subgraph "Microsoft Fabric"
        API[Fabric API]
        WS[Workspace]
        LH[Lakehouse]
        NB[Notebooks]
        TABLES[Delta Tables]
        SPARK[Spark Clusters]
    end

    subgraph "Operations Flow"
        CREATE[Create Notebooks]
        VALIDATE[Validate Code]
        GENERATE[Generate Code]
        ANALYZE[Analyze Performance]
        DEPLOY[Deploy to Fabric]
    end

    %% Developer interactions
    DEV --> IDE
    IDE --> PROJ

    %% LLM interactions
    IDE <--> LLM
    LLM <--> CONTEXT
    LLM --> REASONING

    %% MCP interactions
    LLM <--> MCP
    MCP --> TOOLS
    TOOLS --> HELPERS
    TOOLS --> TEMPLATES
    TOOLS --> VALIDATORS
    TOOLS --> GENERATORS

    %% Fabric interactions
    MCP <--> API
    API --> WS
    WS --> LH
    WS --> NB
    LH --> TABLES
    NB --> SPARK

    %% Operation flows
    TOOLS --> CREATE
    TOOLS --> VALIDATE
    TOOLS --> GENERATE
    TOOLS --> ANALYZE
    CREATE --> DEPLOY

    %% Data flow arrows
    REASONING -.->|"Intelligent Decisions"| TOOLS
    CONTEXT -.->|"Project Awareness"| VALIDATORS

    %% Styling
    classDef devEnv fill:#e1f5fe
    classDef aiLayer fill:#fff9c4
    classDef mcpLayer fill:#f3e5f5
    classDef fabricLayer fill:#e8f5e8
    classDef operations fill:#fff3e0

    class IDE,DEV,PROJ devEnv
    class LLM,CONTEXT,REASONING aiLayer
    class MCP,TOOLS,HELPERS,TEMPLATES,VALIDATORS,GENERATORS mcpLayer
    class API,WS,LH,NB,TABLES,SPARK fabricLayer
    class CREATE,VALIDATE,GENERATE,ANALYZE,DEPLOY operations
```
### Interaction Flow

1. Developer requests assistance in the IDE
2. IDE communicates with the LLM (Claude/GPT)
3. LLM analyzes the request using context and reasoning
4. LLM calls MCP server tools intelligently
5. MCP tools interact with the Fabric API
6. Results flow back through the LLM with intelligent formatting
7. Developer receives contextual, smart responses
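Step 4 above is a standard MCP `tools/call` request over JSON-RPC 2.0. As an illustration of the message shape (the MCP client library normally constructs this for you), a call to `list_workspaces` looks roughly like:

```python
import json

# Illustrative JSON-RPC 2.0 message for an MCP tool call (step 4 above).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "list_workspaces",  # any tool from the reference below
        "arguments": {},            # tool parameters go here
    },
}
print(json.dumps(request, indent=2))
```

The server replies with a matching `id` and a `result` payload, which the LLM then formats for the developer.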
## 📋 Requirements

- Python 3.12+
- Azure credentials for authentication
- `uv` (from Astral): installation instructions
- Azure CLI: installation instructions
- Optional: Node.js for the MCP Inspector: installation instructions
## 🔧 Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/your-repo/fabric-mcp.git
   cd fabric-mcp
   ```

2. Set up the virtual environment:

   ```bash
   uv sync
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## 🚀 Usage

### Using STDIO

#### Connect to Microsoft Fabric

```bash
az login --scope https://api.fabric.microsoft.com/.default
```

#### Running with the MCP Inspector

```bash
uv run --with mcp mcp dev fabric_mcp.py
```

This starts the server with the inspector at http://localhost:6274.
#### VSCode Integration

Add to your settings.json:

```json
{
  "mcp": {
    "servers": {
      "ms-fabric-mcp": {
        "type": "stdio",
        "command": "<FullPathToProjectFolder>\\.venv\\Scripts\\python.exe",
        "args": ["<FullPathToProjectFolder>\\fabric_mcp.py"]
      }
    }
  }
}
```
### Using HTTP

#### Start the MCP Server

```bash
uv run python .\fabric_mcp.py --port 8081
```

#### VSCode Integration

Add to your settings.json:

```json
{
  "mcp": {
    "servers": {
      "ms-fabric-mcp": {
        "type": "http",
        "url": "http://<localhost or remote IP>:8081/mcp/",
        "headers": {
          "Accept": "application/json,text/event-stream"
        }
      }
    }
  }
}
```
## 🛠️ Complete Tool Reference

### 1. Workspace Management

#### `list_workspaces`

List all available Fabric workspaces.

```python
# Usage in LLM: "List all my Fabric workspaces"
```

#### `set_workspace`

Set the current workspace context for the session.

```python
set_workspace(workspace="Analytics-Workspace")
```
### 2. Lakehouse Operations

#### `list_lakehouses`

List all lakehouses in a workspace.

```python
list_lakehouses(workspace="Analytics-Workspace")
```

#### `create_lakehouse`

Create a new lakehouse.

```python
create_lakehouse(
    name="Sales-Data-Lake",
    workspace="Analytics-Workspace",
    description="Sales data lakehouse"
)
```

#### `set_lakehouse`

Set the current lakehouse context.

```python
set_lakehouse(lakehouse="Sales-Data-Lake")
```
### 3. Warehouse Operations

#### `list_warehouses`

List all warehouses in a workspace.

```python
list_warehouses(workspace="Analytics-Workspace")
```

#### `create_warehouse`

Create a new warehouse.

```python
create_warehouse(
    name="Sales-DW",
    workspace="Analytics-Workspace",
    description="Sales data warehouse"
)
```

#### `set_warehouse`

Set the current warehouse context.

```python
set_warehouse(warehouse="Sales-DW")
```
### 4. Table Operations

#### `list_tables`

List all tables in a lakehouse.

```python
list_tables(workspace="Analytics-Workspace", lakehouse="Sales-Data-Lake")
```

#### `get_lakehouse_table_schema`

Get the schema for a specific table.

```python
get_lakehouse_table_schema(
    workspace="Analytics-Workspace",
    lakehouse="Sales-Data-Lake",
    table_name="transactions"
)
```

#### `get_all_lakehouse_schemas`

Get schemas for all tables in a lakehouse.

```python
get_all_lakehouse_schemas(
    workspace="Analytics-Workspace",
    lakehouse="Sales-Data-Lake"
)
```

#### `set_table`

Set the current table context.

```python
set_table(table_name="transactions")
```
### 5. SQL Operations

#### `get_sql_endpoint`

Get the SQL endpoint for a lakehouse or warehouse.

```python
get_sql_endpoint(
    workspace="Analytics-Workspace",
    lakehouse="Sales-Data-Lake",
    type="lakehouse"
)
```

#### `run_query`

Execute SQL queries.

```python
run_query(
    workspace="Analytics-Workspace",
    lakehouse="Sales-Data-Lake",
    query="SELECT COUNT(*) FROM transactions",
    type="lakehouse"
)
```
### 6. Data Loading

#### `load_data_from_url`

Load data from a URL into a table.

```python
load_data_from_url(
    url="https://example.com/data.csv",
    destination_table="new_data",
    workspace="Analytics-Workspace",
    lakehouse="Sales-Data-Lake"
)
```
### 7. Reports & Models

#### `list_reports`

List all reports in a workspace.

```python
list_reports(workspace="Analytics-Workspace")
```

#### `get_report`

Get details for a specific report.

```python
get_report(workspace="Analytics-Workspace", report_id="report-id")
```

#### `list_semantic_models`

List semantic models in a workspace.

```python
list_semantic_models(workspace="Analytics-Workspace")
```

#### `get_semantic_model`

Get a specific semantic model.

```python
get_semantic_model(workspace="Analytics-Workspace", model_id="model-id")
```
### 8. Basic Notebook Operations

#### `list_notebooks`

List all notebooks in a workspace.

```python
list_notebooks(workspace="Analytics-Workspace")
```

#### `get_notebook_content`

Retrieve notebook content.

```python
get_notebook_content(
    workspace="Analytics-Workspace",
    notebook_id="notebook-id"
)
```

#### `update_notebook_cell`

Update specific notebook cells.

```python
update_notebook_cell(
    workspace="Analytics-Workspace",
    notebook_id="notebook-id",
    cell_index=0,
    cell_content="print('Hello, Fabric!')",
    cell_type="code"
)
```
### 9. Advanced PySpark Notebook Creation

#### `create_pyspark_notebook`

Create notebooks from basic templates.

```python
create_pyspark_notebook(
    workspace="Analytics-Workspace",
    notebook_name="Data-Analysis",
    template_type="analytics"  # Options: basic, etl, analytics, ml
)
```

#### `create_fabric_notebook`

Create Fabric-optimized notebooks.

```python
create_fabric_notebook(
    workspace="Analytics-Workspace",
    notebook_name="Fabric-Pipeline",
    template_type="fabric_integration"  # Options: fabric_integration, streaming
)
```
### 10. PySpark Code Generation

#### `generate_pyspark_code`

Generate code for common operations.

```python
generate_pyspark_code(
    operation="read_table",
    source_table="sales.transactions",
    columns="id,amount,date"
)
# Available operations:
# - read_table, write_table, transform, join, aggregate
# - schema_inference, data_quality, performance_optimization
```

#### `generate_fabric_code`

Generate Fabric-specific code.

```python
generate_fabric_code(
    operation="read_lakehouse",
    lakehouse_name="Sales-Data-Lake",
    table_name="transactions"
)
# Available operations:
# - read_lakehouse, write_lakehouse, merge_delta, performance_monitor
```
### 11. Code Validation & Analysis

#### `validate_pyspark_code`

Validate PySpark code syntax and best practices.

```python
validate_pyspark_code(code="""
df = spark.table('transactions')
df.show()
""")
```

#### `validate_fabric_code`

Validate Fabric compatibility.

```python
validate_fabric_code(code="""
df = spark.table('lakehouse.transactions')
df.write.format('delta').saveAsTable('summary')
""")
```

#### `analyze_notebook_performance`

Comprehensive performance analysis.

```python
analyze_notebook_performance(
    workspace="Analytics-Workspace",
    notebook_id="notebook-id"
)
```
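For intuition, the first pass of this kind of validation can be sketched in a few lines with Python's `ast` module. This is an illustrative toy, not the server's actual validator:

```python
import ast

def quick_validate(code: str) -> list[str]:
    """Return a list of issues found in a PySpark snippet (illustrative only)."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"Syntax error on line {exc.lineno}: {exc.msg}"]
    issues = []
    # Flag a common anti-pattern: .collect() pulls the full dataset to the driver.
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute) and node.attr == "collect":
            issues.append("Avoid .collect() on large DataFrames; it loads all rows to the driver.")
    return issues

print(quick_validate("df = spark.table('t')\nrows = df.collect()"))
```

The real tools layer Fabric-specific checks (lakehouse paths, Delta Lake usage) on top of checks like these.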
### 12. Context Management

#### `clear_context`

Clear the current session context.

```python
clear_context()
```
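The context tools are intended to be chained: set a context once, then let later calls fall back to it. A sketch of a typical session, assuming tools default to the session context when explicit arguments are omitted:

```python
set_workspace(workspace="Analytics-Workspace")  # later calls default to this workspace
set_lakehouse(lakehouse="Sales-Data-Lake")      # and to this lakehouse
list_tables()                                   # no arguments needed now
clear_context()                                 # reset before switching projects
```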
## 📊 PySpark Templates

### Basic Templates

- `basic`: Fundamental PySpark operations and DataFrame usage
- `etl`: Complete ETL pipeline with data cleaning and Delta Lake
- `analytics`: Advanced analytics with aggregations and window functions
- `ml`: Machine learning pipeline with MLlib and feature engineering

### Advanced Templates

- `fabric_integration`: Lakehouse connectivity and Fabric-specific utilities
- `streaming`: Real-time processing with Structured Streaming
## 🎯 Best Practices

### Fabric Optimization

```python
# ✅ Use managed tables
df = spark.table("lakehouse.my_table")

# ✅ Use Delta Lake format
df.write.format("delta").mode("overwrite").saveAsTable("my_table")

# ✅ Leverage notebookutils
import notebookutils as nbu
workspace_id = nbu.runtime.context.workspaceId
```

### Performance Optimization

```python
# ✅ Cache frequently used DataFrames
df.cache()

# ✅ Use broadcast for small tables
from pyspark.sql.functions import broadcast
result = large_df.join(broadcast(small_df), "key")

# ✅ Partition large datasets
df.write.partitionBy("year", "month").saveAsTable("partitioned_table")
```

### Code Quality

```python
from pyspark.sql.functions import col
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# ✅ Define explicit schemas
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True)
])

# ✅ Handle null values
df.filter(col("column").isNotNull())
```
## 🔄 Example LLM-Enhanced Workflows

### Natural Language Requests

> **Human:** "Create a PySpark notebook that reads sales data, cleans it, and optimizes performance"

**LLM Response:**

1. Creates a Fabric-optimized notebook with the ETL template
2. Generates lakehouse reading code
3. Adds data cleaning transformations
4. Includes performance optimization patterns
5. Validates code for best practices

### Performance Analysis

> **Human:** "My PySpark notebook is slow. Help me optimize it."

**LLM Response:**

1. Analyzes notebook performance (scoring 0-100)
2. Identifies anti-patterns and bottlenecks
3. Suggests specific optimizations
4. Generates optimized code alternatives
5. Provides before/after comparisons
## 🔍 Troubleshooting

### Common Issues

- **Authentication**: Ensure `az login` was run with the correct scope
- **Context**: Use `clear_context()` to reset session state
- **Workspace**: Verify workspace names and permissions
- **Templates**: Check available template types in the documentation

### Getting Help

- Use the validation tools for code issues
- Check the performance analysis for optimization opportunities
- Leverage the LLM natural language interface for guidance
## 📈 Performance Metrics

The analysis tools provide:

- Operation counts per notebook cell
- Detection and flagging of performance issues
- Identification of optimization opportunities
- A scoring system (0-100) for code quality
- Fabric compatibility assessment
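As a simplified illustration of how a 0-100 score can be derived from detected anti-patterns (the server's actual analysis is more thorough, and the penalty values here are invented for the example):

```python
# Hypothetical scoring heuristic: deduct points per anti-pattern occurrence.
PENALTIES = {
    ".collect()": 15,   # pulls the full dataset to the driver
    ".toPandas()": 10,  # same risk for large data
    "for row in": 10,   # row-by-row loops defeat Spark's parallelism
}

def score_cell(code: str) -> int:
    """Return a 0-100 quality score for one notebook cell (illustrative)."""
    score = 100
    for pattern, penalty in PENALTIES.items():
        score -= penalty * code.count(pattern)
    return max(score, 0)

print(score_cell("df = spark.table('t')\nrows = df.collect()"))  # 85
```

Aggregating such per-cell scores, plus Fabric-specific checks, yields the notebook-level assessment described above.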
## 🤝 Contributing

This project welcomes contributions! Please see our contributing guidelines for details.

## 📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

## 🙏 Acknowledgments

Inspired by: https://github.com/Augustab/microsoft_fabric_mcp/tree/main
Ready to supercharge your Microsoft Fabric development with intelligent PySpark assistance! 🚀