Speedofmastery committed on
Commit 88f3fce · verified · 1 Parent(s): 02974e8

Upload folder using huggingface_hub

Dockerfile ADDED
@@ -0,0 +1,21 @@
+ FROM python:3.10-slim
+
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     wget \
+     gnupg \
+     && rm -rf /var/lib/apt/lists/*
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Install Playwright browsers
+ RUN playwright install --with-deps chromium
+
+ COPY . .
+
+ EXPOSE 7860
+
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,10 +1,64 @@
- ---
- title: Orynxml Agents
- emoji: 🦀
- colorFrom: gray
- colorTo: yellow
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ ---
+ title: ORYNXML Complete Backend with Agents
+ emoji: 🤖
+ colorFrom: blue
+ colorTo: purple
+ sdk: docker
+ pinned: false
+ ---
+
+ # ORYNXML Complete Backend with AI Agents
+
+ FastAPI backend with integrated AI agents for the ORYNXML AI Platform.
+
+ ## AI Agents
+
+ ### 1. Manus Agent (Main)
+ - Chat and conversation
+ - Task execution
+ - Tool orchestration
+ - General AI capabilities
+
+ ### 2. Software Engineer Agent (SWE)
+ - Code generation (Python, JavaScript, etc.)
+ - Code debugging and refactoring
+ - Architecture design
+ - Test generation
+
+ ### 3. Browser Agent
+ - Web scraping
+ - Browser automation
+ - Form filling
+ - Navigation and interaction
+
+ ### 4. Data Analysis Agent
+ - Data visualization
+ - Chart generation
+ - Statistical analysis
+ - Data transformation
+
+ ## API Endpoints
+
+ ### Authentication
+ - `POST /auth/signup` - Register user
+ - `POST /auth/login` - Login user
+
+ ### Agent Operations
+ - `POST /agent/run` - Run any agent with prompt
+ - `POST /agent/code` - Generate code (SWE agent)
+ - `POST /agent/browser` - Browser automation
+ - `POST /agent/data` - Data analysis
+ - `GET /agents/list` - List all agents
+
+ ### Status
+ - `GET /health` - Health check
+ - `GET /cloudflare/status` - Cloudflare status
+
+ ## Frontend
+ https://orynxml-ai.pages.dev
+
+ ## Architecture
+ - FastAPI REST API
+ - 4 specialized AI agents
+ - Cloudflare integration
+ - SQLite authentication
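The agent endpoints listed above take plain JSON bodies. As a hypothetical illustration (the helper name is invented; the field names mirror the `AgentRequest` model defined in `app.py`):

```python
import json

# Hypothetical helper showing the JSON body POST /agent/run expects:
# a prompt plus an optional agent name ("manus", "swe", "browser", "data").
def build_agent_request(prompt: str, agent: str = "manus") -> dict:
    valid_agents = {"manus", "swe", "browser", "data"}
    if agent not in valid_agents:
        raise ValueError(f"Unknown agent: {agent}")
    return {"prompt": prompt, "agent": agent}

payload = build_agent_request("Write a haiku about servers")
print(json.dumps(payload))
# {"prompt": "Write a haiku about servers", "agent": "manus"}
```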
app.py ADDED
@@ -0,0 +1,419 @@
+ """
+ ORYNXML Complete Backend with AI Agents
+ FastAPI REST API + Manus Agent + SWE Agent + Browser Agent + HuggingFace Agent
+ """
+
+ from fastapi import FastAPI, HTTPException, BackgroundTasks
+ from fastapi.middleware.cors import CORSMiddleware
+ from pydantic import BaseModel
+ from typing import Optional, List, Dict, Any
+ import os
+ import sys
+ import sqlite3
+ import hashlib
+ import asyncio
+ import uvicorn  # needed by the __main__ block below
+ from datetime import datetime
+
+ # Add parent directory to path for imports
+ sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
+
+ # Import AI Agents
+ from app.agent.manus import Manus
+ from app.agent.swe import SWEAgent
+ from app.agent.browser import BrowserAgent
+ from app.agent.data_analysis import DataAnalysis
+ from app.llm import get_llm
+ from app.tool.tool_collection import ToolCollection
+
+ # HuggingFace token
+ HF_TOKEN = os.getenv("HF_TOKEN", "")
+
+ # Cloudflare Configuration
+ CLOUDFLARE_CONFIG = {
+     "api_token": os.getenv("CLOUDFLARE_API_TOKEN", ""),
+     "account_id": os.getenv("CLOUDFLARE_ACCOUNT_ID", "62af59a7ac82b29543577ee6800735ee"),
+     "d1_database_id": os.getenv("CLOUDFLARE_D1_DATABASE_ID", "6d887f74-98ac-4db7-bfed-8061903d1f6c"),
+     "r2_bucket_name": os.getenv("CLOUDFLARE_R2_BUCKET_NAME", "openmanus-storage"),
+     "kv_namespace_id": os.getenv("CLOUDFLARE_KV_NAMESPACE_ID", "87f4aa01410d4fb19821f61006f94441"),
+     "kv_namespace_cache": os.getenv("CLOUDFLARE_KV_CACHE_ID", "7b58c88292c847d1a82c8e0dd5129f37"),
+ }
+
+ # Global agents (initialized on startup)
+ manus_agent = None
+ swe_agent = None
+ browser_agent = None
+ data_agent = None
+
+ # Initialize FastAPI
+ app = FastAPI(
+     title="ORYNXML AI Platform with Agents",
+     description="Complete AI backend with Manus, SWE, Browser, and Data Analysis agents",
+     version="2.0.0",
+ )
+
+ # CORS
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ # Database
+ def init_database():
+     conn = sqlite3.connect("openmanus.db")
+     cursor = conn.cursor()
+     cursor.execute("""
+         CREATE TABLE IF NOT EXISTS users (
+             id INTEGER PRIMARY KEY AUTOINCREMENT,
+             mobile TEXT UNIQUE NOT NULL,
+             name TEXT NOT NULL,
+             password_hash TEXT NOT NULL,
+             created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+         )
+     """)
+     conn.commit()
+     conn.close()
+
+ init_database()
+
+ # Pydantic Models
+ class SignupRequest(BaseModel):
+     mobile: str
+     name: str
+     password: str
+
+ class LoginRequest(BaseModel):
+     mobile: str
+     password: str
+
+ class AgentRequest(BaseModel):
+     prompt: str
+     agent: Optional[str] = "manus"  # manus, swe, browser, data
+
+ class CodeRequest(BaseModel):
+     task: str
+     language: Optional[str] = "python"
+
+ class BrowserRequest(BaseModel):
+     task: str
+     url: Optional[str] = None
+
+ class DataRequest(BaseModel):
+     data: Any
+     task: str
+
+ # Helper Functions
+ def hash_password(password: str) -> str:
+     return hashlib.sha256(password.encode()).hexdigest()
+
+ def verify_password(password: str, password_hash: str) -> bool:
+     return hash_password(password) == password_hash
+
+ # Startup event - Initialize agents
+ @app.on_event("startup")
+ async def startup_event():
+     global manus_agent, swe_agent, browser_agent, data_agent
+
+     print("🚀 Initializing AI Agents...")
+
+     try:
+         # Initialize Manus (main agent)
+         manus_agent = await Manus.create()
+         print("✅ Manus Agent initialized")
+
+         # Initialize SWE Agent
+         swe_agent = await SWEAgent.create()
+         print("✅ SWE Agent initialized")
+
+         # Initialize Browser Agent
+         browser_agent = await BrowserAgent.create()
+         print("✅ Browser Agent initialized")
+
+         # Initialize Data Analysis Agent
+         data_agent = await DataAnalysis.create()
+         print("✅ Data Analysis Agent initialized")
+
+         print("🎉 All agents ready!")
+
+     except Exception as e:
+         print(f"⚠️ Warning: Could not initialize all agents: {e}")
+         print("API will still work with limited functionality")
+
+ # API Endpoints
+
+ @app.get("/")
+ async def root():
+     return {
+         "message": "ORYNXML AI Platform with Agents",
+         "version": "2.0.0",
+         "agents": {
+             "manus": "Main agent with all capabilities" if manus_agent else "Not initialized",
+             "swe": "Software Engineer agent" if swe_agent else "Not initialized",
+             "browser": "Browser automation agent" if browser_agent else "Not initialized",
+             "data": "Data analysis agent" if data_agent else "Not initialized",
+         },
+         "endpoints": {
+             "health": "/health",
+             "auth": "/auth/signup, /auth/login",
+             "agents": "/agent/run, /agent/code, /agent/browser, /agent/data",
+         }
+     }
+
+ @app.get("/health")
+ async def health_check():
+     return {
+         "status": "healthy",
+         "timestamp": datetime.now().isoformat(),
+         "agents_initialized": {
+             "manus": manus_agent is not None,
+             "swe": swe_agent is not None,
+             "browser": browser_agent is not None,
+             "data": data_agent is not None,
+         },
+         "cloudflare_configured": bool(CLOUDFLARE_CONFIG["api_token"]),
+     }
+
+ @app.post("/auth/signup")
+ async def signup(request: SignupRequest):
+     try:
+         if len(request.password) < 6:
+             raise HTTPException(status_code=400, detail="Password must be at least 6 characters")
+
+         conn = sqlite3.connect("openmanus.db")
+         cursor = conn.cursor()
+
+         cursor.execute("SELECT mobile FROM users WHERE mobile = ?", (request.mobile,))
+         if cursor.fetchone():
+             conn.close()
+             raise HTTPException(status_code=400, detail="Mobile number already registered")
+
+         password_hash = hash_password(request.password)
+         cursor.execute(
+             "INSERT INTO users (mobile, name, password_hash) VALUES (?, ?, ?)",
+             (request.mobile, request.name, password_hash)
+         )
+         conn.commit()
+         conn.close()
+
+         return {
+             "success": True,
+             "message": "Account created successfully",
+             "mobile": request.mobile,
+             "name": request.name
+         }
+
+     except HTTPException:
+         raise
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=f"Registration failed: {str(e)}")
+
+ @app.post("/auth/login")
+ async def login(request: LoginRequest):
+     try:
+         conn = sqlite3.connect("openmanus.db")
+         cursor = conn.cursor()
+
+         cursor.execute(
+             "SELECT name, password_hash FROM users WHERE mobile = ?",
+             (request.mobile,)
+         )
+         result = cursor.fetchone()
+         conn.close()
+
+         if not result:
+             raise HTTPException(status_code=401, detail="Invalid mobile number or password")
+
+         name, password_hash = result
+
+         if not verify_password(request.password, password_hash):
+             raise HTTPException(status_code=401, detail="Invalid mobile number or password")
+
+         return {
+             "success": True,
+             "message": "Login successful",
+             "user": {
+                 "mobile": request.mobile,
+                 "name": name
+             },
+             "token": f"session_{hash_password(request.mobile + str(datetime.now()))[:32]}"
+         }
+
+     except HTTPException:
+         raise
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=f"Login failed: {str(e)}")
+
+ @app.post("/agent/run")
+ async def run_agent(request: AgentRequest):
+     """Run any agent with a prompt"""
+     try:
+         agent_name = request.agent.lower()
+
+         # Select agent
+         if agent_name == "manus":
+             if not manus_agent:
+                 raise HTTPException(status_code=503, detail="Manus agent not initialized")
+             agent = manus_agent
+         elif agent_name == "swe":
+             if not swe_agent:
+                 raise HTTPException(status_code=503, detail="SWE agent not initialized")
+             agent = swe_agent
+         elif agent_name == "browser":
+             if not browser_agent:
+                 raise HTTPException(status_code=503, detail="Browser agent not initialized")
+             agent = browser_agent
+         elif agent_name == "data":
+             if not data_agent:
+                 raise HTTPException(status_code=503, detail="Data agent not initialized")
+             agent = data_agent
+         else:
+             raise HTTPException(status_code=400, detail=f"Unknown agent: {agent_name}")
+
+         # Run agent
+         result = await agent.run(request.prompt)
+
+         return {
+             "success": True,
+             "agent": agent_name,
+             "prompt": request.prompt,
+             "result": str(result),
+             "timestamp": datetime.now().isoformat()
+         }
+
+     except HTTPException:
+         raise
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=f"Agent execution failed: {str(e)}")
+
+ @app.post("/agent/code")
+ async def generate_code(request: CodeRequest):
+     """Software Engineer Agent - Generate code"""
+     try:
+         if not swe_agent:
+             raise HTTPException(status_code=503, detail="SWE agent not initialized")
+
+         prompt = f"Generate {request.language} code for: {request.task}"
+         result = await swe_agent.run(prompt)
+
+         return {
+             "success": True,
+             "task": request.task,
+             "language": request.language,
+             "code": str(result),
+             "timestamp": datetime.now().isoformat()
+         }
+
+     except HTTPException:
+         raise
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=f"Code generation failed: {str(e)}")
+
+ @app.post("/agent/browser")
+ async def browser_automation(request: BrowserRequest):
+     """Browser Agent - Automate web tasks"""
+     try:
+         if not browser_agent:
+             raise HTTPException(status_code=503, detail="Browser agent not initialized")
+
+         prompt = f"{request.task}"
+         if request.url:
+             prompt += f" on {request.url}"
+
+         result = await browser_agent.run(prompt)
+
+         return {
+             "success": True,
+             "task": request.task,
+             "url": request.url,
+             "result": str(result),
+             "timestamp": datetime.now().isoformat()
+         }
+
+     except HTTPException:
+         raise
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=f"Browser automation failed: {str(e)}")
+
+ @app.post("/agent/data")
+ async def analyze_data(request: DataRequest):
+     """Data Analysis Agent - Analyze and visualize data"""
+     try:
+         if not data_agent:
+             raise HTTPException(status_code=503, detail="Data agent not initialized")
+
+         prompt = f"Analyze this data: {request.data}. Task: {request.task}"
+         result = await data_agent.run(prompt)
+
+         return {
+             "success": True,
+             "task": request.task,
+             "result": str(result),
+             "timestamp": datetime.now().isoformat()
+         }
+
+     except HTTPException:
+         raise
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=f"Data analysis failed: {str(e)}")
+
+ @app.get("/agents/list")
+ async def list_agents():
+     """List all available agents and their status"""
+     return {
+         "agents": [
+             {
+                 "name": "manus",
+                 "description": "Main agent with all capabilities (chat, coding, browsing, data analysis)",
+                 "status": "initialized" if manus_agent else "not initialized",
+                 "endpoint": "/agent/run"
+             },
+             {
+                 "name": "swe",
+                 "description": "Software Engineer agent (code generation, debugging, refactoring)",
+                 "status": "initialized" if swe_agent else "not initialized",
+                 "endpoint": "/agent/code"
+             },
+             {
+                 "name": "browser",
+                 "description": "Browser automation agent (web scraping, form filling, navigation)",
+                 "status": "initialized" if browser_agent else "not initialized",
+                 "endpoint": "/agent/browser"
+             },
+             {
+                 "name": "data",
+                 "description": "Data analysis agent (charts, visualization, statistics)",
+                 "status": "initialized" if data_agent else "not initialized",
+                 "endpoint": "/agent/data"
+             }
+         ]
+     }
+
+ @app.get("/cloudflare/status")
+ async def cloudflare_status():
+     services = []
+     if CLOUDFLARE_CONFIG["api_token"]:
+         services.append("✅ API Token Configured")
+     if CLOUDFLARE_CONFIG["d1_database_id"]:
+         services.append("✅ D1 Database Connected")
+     if CLOUDFLARE_CONFIG["r2_bucket_name"]:
+         services.append("✅ R2 Storage Connected")
+     if CLOUDFLARE_CONFIG["kv_namespace_id"]:
+         services.append("✅ KV Sessions Connected")
+     if CLOUDFLARE_CONFIG["kv_namespace_cache"]:
+         services.append("✅ KV Cache Connected")
+
+     return {
+         "configured": len(services) > 0,
+         "services": services,
+         "account_id": CLOUDFLARE_CONFIG["account_id"]
+     }
+
+ if __name__ == "__main__":
+     uvicorn.run(
+         app,
+         host="0.0.0.0",
+         port=7860,
+         log_level="info"
+     )
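The auth helpers in app.py boil down to unsalted SHA-256. A standalone sketch of that behavior (note: a production system should use a salted KDF such as bcrypt or argon2 instead):

```python
import hashlib

def hash_password(password: str) -> str:
    # Same scheme as the app's helper: hex digest of SHA-256 over the raw bytes
    return hashlib.sha256(password.encode()).hexdigest()

def verify_password(password: str, password_hash: str) -> bool:
    # Re-hash and compare; with no salt, identical passwords hash identically
    return hash_password(password) == password_hash

h = hash_password("secret123")
print(len(h))                           # 64: SHA-256 is 32 bytes = 64 hex chars
print(verify_password("secret123", h))  # True
print(verify_password("wrong", h))      # False
```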
app/__init__.py ADDED
@@ -0,0 +1,10 @@
+ # Python version check: 3.11-3.13
+ import sys
+
+
+ if sys.version_info < (3, 11) or sys.version_info >= (3, 14):
+     print(
+         "Warning: Unsupported Python version {ver}, please use 3.11-3.13".format(
+             ver=".".join(map(str, sys.version_info))
+         )
+     )
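Upper-bounding `sys.version_info` is easy to get wrong because tuple comparison is element-wise and a longer tuple with an equal prefix compares greater than the shorter one. A quick check of the relevant cases:

```python
# sys.version_info behaves like a 5-tuple, e.g. (3, 13, 1, "final", 0).
# Because a longer tuple with an equal prefix compares greater, 3.13.1 is
# *greater* than (3, 13), so a check like `sys.version_info > (3, 13)` would
# flag every 3.13.x release; `>= (3, 14)` is the safe upper bound for "3.13 ok".
release = (3, 13, 1, "final", 0)
print(release > (3, 13))                 # True  - equal prefix, longer tuple wins
print(release >= (3, 14))                # False - 13 < 14 decides the comparison
print((3, 10, 0, "final", 0) < (3, 11))  # True  - 3.10 is below the floor
```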
app/agent/__init__.py ADDED
@@ -0,0 +1,16 @@
+ from app.agent.base import BaseAgent
+ from app.agent.browser import BrowserAgent
+ from app.agent.mcp import MCPAgent
+ from app.agent.react import ReActAgent
+ from app.agent.swe import SWEAgent
+ from app.agent.toolcall import ToolCallAgent
+
+
+ __all__ = [
+     "BaseAgent",
+     "BrowserAgent",
+     "ReActAgent",
+     "SWEAgent",
+     "ToolCallAgent",
+     "MCPAgent",
+ ]
app/agent/base.py ADDED
@@ -0,0 +1,196 @@
+ from abc import ABC, abstractmethod
+ from contextlib import asynccontextmanager
+ from typing import List, Optional
+
+ from pydantic import BaseModel, Field, model_validator
+
+ from app.llm import LLM
+ from app.logger import logger
+ from app.sandbox.client import SANDBOX_CLIENT
+ from app.schema import ROLE_TYPE, AgentState, Memory, Message
+
+
+ class BaseAgent(BaseModel, ABC):
+     """Abstract base class for managing agent state and execution.
+
+     Provides foundational functionality for state transitions, memory management,
+     and a step-based execution loop. Subclasses must implement the `step` method.
+     """
+
+     # Core attributes
+     name: str = Field(..., description="Unique name of the agent")
+     description: Optional[str] = Field(None, description="Optional agent description")
+
+     # Prompts
+     system_prompt: Optional[str] = Field(
+         None, description="System-level instruction prompt"
+     )
+     next_step_prompt: Optional[str] = Field(
+         None, description="Prompt for determining next action"
+     )
+
+     # Dependencies
+     llm: LLM = Field(default_factory=LLM, description="Language model instance")
+     memory: Memory = Field(default_factory=Memory, description="Agent's memory store")
+     state: AgentState = Field(
+         default=AgentState.IDLE, description="Current agent state"
+     )
+
+     # Execution control
+     max_steps: int = Field(default=10, description="Maximum steps before termination")
+     current_step: int = Field(default=0, description="Current step in execution")
+
+     duplicate_threshold: int = 2
+
+     class Config:
+         arbitrary_types_allowed = True
+         extra = "allow"  # Allow extra fields for flexibility in subclasses
+
+     @model_validator(mode="after")
+     def initialize_agent(self) -> "BaseAgent":
+         """Initialize agent with default settings if not provided."""
+         if self.llm is None or not isinstance(self.llm, LLM):
+             self.llm = LLM(config_name=self.name.lower())
+         if not isinstance(self.memory, Memory):
+             self.memory = Memory()
+         return self
+
+     @asynccontextmanager
+     async def state_context(self, new_state: AgentState):
+         """Context manager for safe agent state transitions.
+
+         Args:
+             new_state: The state to transition to during the context.
+
+         Yields:
+             None: Allows execution within the new state.
+
+         Raises:
+             ValueError: If the new_state is invalid.
+         """
+         if not isinstance(new_state, AgentState):
+             raise ValueError(f"Invalid state: {new_state}")
+
+         previous_state = self.state
+         self.state = new_state
+         try:
+             yield
+         except Exception as e:
+             self.state = AgentState.ERROR  # Transition to ERROR on failure
+             raise e
+         finally:
+             self.state = previous_state  # Revert to previous state
+
+     def update_memory(
+         self,
+         role: ROLE_TYPE,  # type: ignore
+         content: str,
+         base64_image: Optional[str] = None,
+         **kwargs,
+     ) -> None:
+         """Add a message to the agent's memory.
+
+         Args:
+             role: The role of the message sender (user, system, assistant, tool).
+             content: The message content.
+             base64_image: Optional base64 encoded image.
+             **kwargs: Additional arguments (e.g., tool_call_id for tool messages).
+
+         Raises:
+             ValueError: If the role is unsupported.
+         """
+         message_map = {
+             "user": Message.user_message,
+             "system": Message.system_message,
+             "assistant": Message.assistant_message,
+             "tool": lambda content, **kw: Message.tool_message(content, **kw),
+         }
+
+         if role not in message_map:
+             raise ValueError(f"Unsupported message role: {role}")
+
+         # Create message with appropriate parameters based on role
+         kwargs = {"base64_image": base64_image, **(kwargs if role == "tool" else {})}
+         self.memory.add_message(message_map[role](content, **kwargs))
+
+     async def run(self, request: Optional[str] = None) -> str:
+         """Execute the agent's main loop asynchronously.
+
+         Args:
+             request: Optional initial user request to process.
+
+         Returns:
+             A string summarizing the execution results.
+
+         Raises:
+             RuntimeError: If the agent is not in IDLE state at start.
+         """
+         if self.state != AgentState.IDLE:
+             raise RuntimeError(f"Cannot run agent from state: {self.state}")
+
+         if request:
+             self.update_memory("user", request)
+
+         results: List[str] = []
+         async with self.state_context(AgentState.RUNNING):
+             while (
+                 self.current_step < self.max_steps and self.state != AgentState.FINISHED
+             ):
+                 self.current_step += 1
+                 logger.info(f"Executing step {self.current_step}/{self.max_steps}")
+                 step_result = await self.step()
+
+                 # Check for stuck state
+                 if self.is_stuck():
+                     self.handle_stuck_state()
+
+                 results.append(f"Step {self.current_step}: {step_result}")
+
+             if self.current_step >= self.max_steps:
+                 self.current_step = 0
+                 self.state = AgentState.IDLE
+                 results.append(f"Terminated: Reached max steps ({self.max_steps})")
+         await SANDBOX_CLIENT.cleanup()
+         return "\n".join(results) if results else "No steps executed"
+
+     @abstractmethod
+     async def step(self) -> str:
+         """Execute a single step in the agent's workflow.
+
+         Must be implemented by subclasses to define specific behavior.
+         """
+
+     def handle_stuck_state(self):
+         """Handle stuck state by adding a prompt to change strategy"""
+         stuck_prompt = "Observed duplicate responses. Consider new strategies and avoid repeating ineffective paths already attempted."
+         self.next_step_prompt = f"{stuck_prompt}\n{self.next_step_prompt}"
+         logger.warning(f"Agent detected stuck state. Added prompt: {stuck_prompt}")
+
+     def is_stuck(self) -> bool:
+         """Check if the agent is stuck in a loop by detecting duplicate content"""
+         if len(self.memory.messages) < 2:
+             return False
+
+         last_message = self.memory.messages[-1]
+         if not last_message.content:
+             return False
+
+         # Count identical content occurrences
+         duplicate_count = sum(
+             1
+             for msg in reversed(self.memory.messages[:-1])
+             if msg.role == "assistant" and msg.content == last_message.content
+         )
+
+         return duplicate_count >= self.duplicate_threshold
+
+     @property
+     def messages(self) -> List[Message]:
+         """Retrieve a list of messages from the agent's memory."""
+         return self.memory.messages
+
+     @messages.setter
+     def messages(self, value: List[Message]):
+         """Set the list of messages in the agent's memory."""
+         self.memory.messages = value
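The loop-detection logic in `is_stuck` can be exercised in isolation. A minimal sketch with an assumed message shape (the real `Message` class lives in `app.schema`; `Msg` here is a hypothetical stand-in):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Msg:  # hypothetical stand-in for app.schema.Message
    role: str
    content: str

def is_stuck(messages: List[Msg], duplicate_threshold: int = 2) -> bool:
    """Return True when earlier assistant messages repeat the last message's
    content at least `duplicate_threshold` times (the same rule as BaseAgent)."""
    if len(messages) < 2:
        return False
    last = messages[-1]
    if not last.content:
        return False
    duplicates = sum(
        1
        for m in reversed(messages[:-1])
        if m.role == "assistant" and m.content == last.content
    )
    return duplicates >= duplicate_threshold

msgs = [Msg("assistant", "same"), Msg("assistant", "same"), Msg("assistant", "same")]
print(is_stuck(msgs))       # True: two earlier duplicates meet the threshold of 2
print(is_stuck(msgs[:2]))   # False: only one earlier duplicate
```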
app/agent/browser.py ADDED
@@ -0,0 +1,129 @@
+ import json
+ from typing import TYPE_CHECKING, Optional
+
+ from pydantic import Field, model_validator
+
+ from app.agent.toolcall import ToolCallAgent
+ from app.logger import logger
+ from app.prompt.browser import NEXT_STEP_PROMPT, SYSTEM_PROMPT
+ from app.schema import Message, ToolChoice
+ from app.tool import BrowserUseTool, Terminate, ToolCollection
+ from app.tool.sandbox.sb_browser_tool import SandboxBrowserTool
+
+
+ # Avoid circular import if BrowserAgent needs BrowserContextHelper
+ if TYPE_CHECKING:
+     from app.agent.base import BaseAgent  # Or wherever memory is defined
+
+
+ class BrowserContextHelper:
+     def __init__(self, agent: "BaseAgent"):
+         self.agent = agent
+         self._current_base64_image: Optional[str] = None
+
+     async def get_browser_state(self) -> Optional[dict]:
+         browser_tool = self.agent.available_tools.get_tool(BrowserUseTool().name)
+         if not browser_tool:
+             browser_tool = self.agent.available_tools.get_tool(
+                 SandboxBrowserTool().name
+             )
+         if not browser_tool or not hasattr(browser_tool, "get_current_state"):
+             logger.warning("BrowserUseTool not found or doesn't have get_current_state")
+             return None
+         try:
+             result = await browser_tool.get_current_state()
+             if result.error:
+                 logger.debug(f"Browser state error: {result.error}")
+                 return None
+             if hasattr(result, "base64_image") and result.base64_image:
+                 self._current_base64_image = result.base64_image
+             else:
+                 self._current_base64_image = None
+             return json.loads(result.output)
+         except Exception as e:
+             logger.debug(f"Failed to get browser state: {str(e)}")
+             return None
+
+     async def format_next_step_prompt(self) -> str:
+         """Gets browser state and formats the browser prompt."""
+         browser_state = await self.get_browser_state()
+         url_info, tabs_info, content_above_info, content_below_info = "", "", "", ""
+         results_info = ""  # Or get from agent if needed elsewhere
+
+         if browser_state and not browser_state.get("error"):
+             url_info = f"\n   URL: {browser_state.get('url', 'N/A')}\n   Title: {browser_state.get('title', 'N/A')}"
+             tabs = browser_state.get("tabs", [])
+             if tabs:
+                 tabs_info = f"\n   {len(tabs)} tab(s) available"
+             pixels_above = browser_state.get("pixels_above", 0)
+             pixels_below = browser_state.get("pixels_below", 0)
+             if pixels_above > 0:
+                 content_above_info = f" ({pixels_above} pixels)"
+             if pixels_below > 0:
+                 content_below_info = f" ({pixels_below} pixels)"
+
+             if self._current_base64_image:
+                 image_message = Message.user_message(
+                     content="Current browser screenshot:",
+                     base64_image=self._current_base64_image,
+                 )
+                 self.agent.memory.add_message(image_message)
+                 self._current_base64_image = None  # Consume the image after adding
+
+         return NEXT_STEP_PROMPT.format(
+             url_placeholder=url_info,
+             tabs_placeholder=tabs_info,
+             content_above_placeholder=content_above_info,
+             content_below_placeholder=content_below_info,
+             results_placeholder=results_info,
+         )
+
+     async def cleanup_browser(self):
+         browser_tool = self.agent.available_tools.get_tool(BrowserUseTool().name)
+         if browser_tool and hasattr(browser_tool, "cleanup"):
+             await browser_tool.cleanup()
+
+
+ class BrowserAgent(ToolCallAgent):
+     """
+     A browser agent that uses the browser_use library to control a browser.
+
+     This agent can navigate web pages, interact with elements, fill forms,
+     extract content, and perform other browser-based actions to accomplish tasks.
+     """
+
+     name: str = "browser"
+     description: str = "A browser agent that can control a browser to accomplish tasks"
+
+     system_prompt: str = SYSTEM_PROMPT
+     next_step_prompt: str = NEXT_STEP_PROMPT
+
+     max_observe: int = 10000
+     max_steps: int = 20
+
+     # Configure the available tools
+     available_tools: ToolCollection = Field(
+         default_factory=lambda: ToolCollection(BrowserUseTool(), Terminate())
+     )
+
+     # Use Auto for tool choice to allow both tool usage and free-form responses
+     tool_choices: ToolChoice = ToolChoice.AUTO
+     special_tool_names: list[str] = Field(default_factory=lambda: [Terminate().name])
+
+     browser_context_helper: Optional[BrowserContextHelper] = None
+
+     @model_validator(mode="after")
+     def initialize_helper(self) -> "BrowserAgent":
+         self.browser_context_helper = BrowserContextHelper(self)
+         return self
+
+     async def think(self) -> bool:
+         """Process current state and decide next actions using tools, with browser state info added"""
+         self.next_step_prompt = (
+             await self.browser_context_helper.format_next_step_prompt()
+         )
+         return await super().think()
+
+     async def cleanup(self):
+         """Clean up browser agent resources via the browser context helper."""
+         await self.browser_context_helper.cleanup_browser()
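The placeholder-filling done by `format_next_step_prompt` can be sketched standalone. The template below is an assumed stand-in for the real `NEXT_STEP_PROMPT` in `app.prompt.browser`, which has more placeholders filled the same way:

```python
from typing import Optional

# Assumed stand-in template; the real NEXT_STEP_PROMPT also has placeholders
# for content above/below the viewport and prior results.
NEXT_STEP_PROMPT = "Browser state:{url_placeholder}{tabs_placeholder}"

def format_next_step_prompt(state: Optional[dict]) -> str:
    """Fold a browser-state dict into the prompt; missing or errored state
    leaves the placeholders empty, mirroring BrowserContextHelper."""
    url_info, tabs_info = "", ""
    if state and not state.get("error"):
        url_info = f"\n URL: {state.get('url', 'N/A')}\n Title: {state.get('title', 'N/A')}"
        tabs = state.get("tabs", [])
        if tabs:
            tabs_info = f"\n {len(tabs)} tab(s) available"
    return NEXT_STEP_PROMPT.format(url_placeholder=url_info, tabs_placeholder=tabs_info)

print(format_next_step_prompt({"url": "https://example.com", "tabs": [1, 2]}))
print(format_next_step_prompt(None))  # "Browser state:" with empty placeholders
```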
app/agent/data_analysis.py ADDED
@@ -0,0 +1,37 @@
+ from pydantic import Field
+
+ from app.agent.toolcall import ToolCallAgent
+ from app.config import config
+ from app.prompt.visualization import NEXT_STEP_PROMPT, SYSTEM_PROMPT
+ from app.tool import Terminate, ToolCollection
+ from app.tool.chart_visualization.chart_prepare import VisualizationPrepare
+ from app.tool.chart_visualization.data_visualization import DataVisualization
+ from app.tool.chart_visualization.python_execute import NormalPythonExecute
+
+
+ class DataAnalysis(ToolCallAgent):
+     """
+     A data analysis agent that uses planning to solve various data analysis tasks.
+
+     This agent extends ToolCallAgent with a comprehensive set of tools and capabilities,
+     including Data Analysis, Chart Visualization, Data Report.
+     """
+
+     name: str = "Data_Analysis"
+     description: str = "An analytical agent that utilizes python and data visualization tools to solve diverse data analysis tasks"
+
+     system_prompt: str = SYSTEM_PROMPT.format(directory=config.workspace_root)
+     next_step_prompt: str = NEXT_STEP_PROMPT
+
+     max_observe: int = 15000
+     max_steps: int = 20
+
+     # Add general-purpose tools to the tool collection
+     available_tools: ToolCollection = Field(
+         default_factory=lambda: ToolCollection(
+             NormalPythonExecute(),
+             VisualizationPrepare(),
+             DataVisualization(),
+             Terminate(),
+         )
+     )
app/agent/huggingface_agent.py ADDED
@@ -0,0 +1,889 @@
+ """
+ Hugging Face Agent Integration for OpenManus
+ Extends the main AI agent with access to thousands of HuggingFace models
+ """
+
+ import os
+ from typing import Any, Dict, List, Optional
+
+ from app.agent.base import BaseAgent
+ from app.huggingface_models import ModelCategory
+ from app.logger import logger
+ from app.tool.huggingface_models_tool import HuggingFaceModelsTool
+
+
+ class HuggingFaceAgent(BaseAgent):
+     """AI Agent with integrated HuggingFace model access"""
+
+     def __init__(self, **config):
+         super().__init__(**config)
+
+         # Initialize HuggingFace integration
+         hf_token = os.getenv("HUGGINGFACE_TOKEN") or config.get("huggingface_token")
+         if not hf_token:
+             logger.warning(
+                 "No Hugging Face token provided. HF models will not be available."
+             )
+             self.hf_tool = None
+         else:
+             self.hf_tool = HuggingFaceModelsTool(hf_token)
+
+         # Default models for different tasks
+         self.default_models = {
+             "text_generation": "MiniMax-M2",  # Latest high-performance model
+             "image_generation": "FLUX.1 Dev",  # Best quality image generation
+             "speech_recognition": "Whisper Large v3",  # Best multilingual ASR
+             "text_to_speech": "Kokoro 82M",  # High quality, lightweight TTS
+             "image_classification": "ViT Base Patch16",  # General image classification
+             "embeddings": "Sentence Transformers All MiniLM",  # Fast embeddings
+             "translation": "M2M100 1.2B",  # Multilingual translation
+             "summarization": "PEGASUS XSum",  # Abstractive summarization
+         }
+
+     async def generate_text_with_hf(
+         self,
+         prompt: str,
+         model_name: Optional[str] = None,
+         max_tokens: int = 200,
+         temperature: float = 0.7,
+         stream: bool = False,
+     ) -> Dict[str, Any]:
+         """Generate text using HuggingFace models"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         model_name = model_name or self.default_models["text_generation"]
+
+         return await self.hf_tool.text_generation(
+             model_name=model_name,
+             prompt=prompt,
+             max_tokens=max_tokens,
+             temperature=temperature,
+             stream=stream,
+         )
+
+     async def generate_image_with_hf(
+         self,
+         prompt: str,
+         model_name: Optional[str] = None,
+         negative_prompt: Optional[str] = None,
+         width: int = 1024,
+         height: int = 1024,
+     ) -> Dict[str, Any]:
+         """Generate images using HuggingFace models"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         model_name = model_name or self.default_models["image_generation"]
+
+         return await self.hf_tool.generate_image(
+             model_name=model_name,
+             prompt=prompt,
+             negative_prompt=negative_prompt,
+             width=width,
+             height=height,
+         )
+
+     async def transcribe_audio_with_hf(
+         self,
+         audio_data: bytes,
+         model_name: Optional[str] = None,
+         language: Optional[str] = None,
+     ) -> Dict[str, Any]:
+         """Transcribe audio using HuggingFace models"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         model_name = model_name or self.default_models["speech_recognition"]
+
+         return await self.hf_tool.transcribe_audio(
+             model_name=model_name, audio_data=audio_data, language=language
+         )
+
+     async def synthesize_speech_with_hf(
+         self,
+         text: str,
+         model_name: Optional[str] = None,
+         voice_id: Optional[str] = None,
+     ) -> Dict[str, Any]:
+         """Generate speech from text using HuggingFace models"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         model_name = model_name or self.default_models["text_to_speech"]
+
+         return await self.hf_tool.text_to_speech(
+             model_name=model_name, text=text, voice_id=voice_id
+         )
+
+     async def classify_image_with_hf(
+         self, image_data: bytes, model_name: Optional[str] = None, task: str = "general"
+     ) -> Dict[str, Any]:
+         """Classify images using HuggingFace models"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         # Choose model based on task
+         if task == "nsfw":
+             model_name = "NSFW Image Detection"
+         elif task == "emotions":
+             model_name = "Facial Emotions Detection"
+         elif task == "deepfake":
+             model_name = "Deepfake Detection"
+         else:
+             model_name = model_name or self.default_models["image_classification"]
+
+         return await self.hf_tool.classify_image(
+             model_name=model_name, image_data=image_data
+         )
+
+     async def get_text_embeddings_with_hf(
+         self, texts: List[str], model_name: Optional[str] = None
+     ) -> Dict[str, Any]:
+         """Get text embeddings using HuggingFace models"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         model_name = model_name or self.default_models["embeddings"]
+
+         return await self.hf_tool.get_embeddings(model_name=model_name, texts=texts)
+
+     async def translate_with_hf(
+         self,
+         text: str,
+         target_language: str,
+         source_language: Optional[str] = None,
+         model_name: Optional[str] = None,
+     ) -> Dict[str, Any]:
+         """Translate text using HuggingFace models"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         model_name = model_name or self.default_models["translation"]
+
+         return await self.hf_tool.translate_text(
+             model_name=model_name,
+             text=text,
+             source_language=source_language,
+             target_language=target_language,
+         )
+
+     async def summarize_with_hf(
+         self, text: str, model_name: Optional[str] = None, max_length: int = 150
+     ) -> Dict[str, Any]:
+         """Summarize text using HuggingFace models"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         model_name = model_name or self.default_models["summarization"]
+
+         return await self.hf_tool.summarize_text(
+             model_name=model_name, text=text, max_length=max_length
+         )
+
+     def get_available_hf_models(self, category: Optional[str] = None) -> Dict[str, Any]:
+         """Get list of available HuggingFace models"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         return self.hf_tool.list_available_models(category)
+
+     async def smart_model_selection(
+         self, task_description: str, content_type: str = "text"
+     ) -> str:
+         """
+         Intelligently select the best HuggingFace model for a task
+
+         Args:
+             task_description: Description of what the user wants to do
+             content_type: Type of content (text, image, audio, video)
+         """
+         task_lower = task_description.lower()
+
+         # Video generation and processing
+         if any(
+             keyword in task_lower
+             for keyword in [
+                 "video",
+                 "movie",
+                 "animation",
+                 "motion",
+                 "gif",
+                 "sequence",
+                 "frames",
+             ]
+         ):
+             if "generate" in task_lower or "create" in task_lower:
+                 return "Stable Video Diffusion"
+             elif "analyze" in task_lower or "describe" in task_lower:
+                 return "Video ChatGPT"
+             else:
+                 return "AnimateDiff"
+
+         # Code and App Development
+         elif any(
+             keyword in task_lower
+             for keyword in [
+                 "code",
+                 "programming",
+                 "app",
+                 "application",
+                 "software",
+                 "develop",
+                 "build",
+                 "function",
+                 "class",
+                 "api",
+                 "database",
+                 "website",
+                 "frontend",
+                 "backend",
+             ]
+         ):
+             if "app" in task_lower or "application" in task_lower:
+                 return "CodeLlama 34B Instruct"  # Best for full applications
+             elif "python" in task_lower:
+                 return "WizardCoder 34B"  # Python specialist
+             elif "api" in task_lower:
+                 return "StarCoder2 15B"  # Good for APIs
+             elif "explain" in task_lower or "comment" in task_lower:
+                 return "Phind CodeLlama"  # Best for code explanation
+             else:
+                 return "DeepSeek Coder V2"  # General coding
+
+         # 3D and AR/VR Content
+         elif any(
+             keyword in task_lower
+             for keyword in [
+                 "3d",
+                 "three dimensional",
+                 "mesh",
+                 "model",
+                 "obj",
+                 "stl",
+                 "ar",
+                 "vr",
+                 "augmented reality",
+                 "virtual reality",
+                 "texture",
+                 "material",
+             ]
+         ):
+             if "text" in task_lower and ("3d" in task_lower or "model" in task_lower):
+                 return "Shap-E"
+             elif "image" in task_lower and "3d" in task_lower:
+                 return "DreamFusion"
+             else:
+                 return "Point-E"
+
+         # Document Processing and OCR
+         elif any(
+             keyword in task_lower
+             for keyword in [
+                 "ocr",
+                 "document",
+                 "pdf",
+                 "scan",
+                 "extract text",
+                 "handwriting",
+                 "form",
+                 "table",
+                 "layout",
+                 "invoice",
+                 "receipt",
+                 "contract",
+             ]
+         ):
+             if "handwriting" in task_lower or "handwritten" in task_lower:
+                 return "TrOCR Handwritten"
+             elif "table" in task_lower:
+                 return "TableTransformer"
+             elif "form" in task_lower:
+                 return "FormNet"
+             else:
+                 return "TrOCR Large"
+
+         # Multimodal AI
+         elif any(
+             keyword in task_lower
+             for keyword in [
+                 "visual question",
+                 "image question",
+                 "describe image",
+                 "multimodal",
+                 "vision language",
+                 "image text",
+                 "cross modal",
+             ]
+         ):
+             if "chat" in task_lower or "conversation" in task_lower:
+                 return "GPT-4V"
+             elif "question" in task_lower:
+                 return "LLaVA"
+             else:
+                 return "BLIP-2"
+
+         # Creative Content
+         elif any(
+             keyword in task_lower
+             for keyword in [
+                 "story",
+                 "creative",
+                 "poem",
+                 "poetry",
+                 "novel",
+                 "screenplay",
+                 "script",
+                 "blog",
+                 "article",
+                 "marketing",
+                 "copy",
+                 "advertising",
+             ]
+         ):
+             if "story" in task_lower or "novel" in task_lower:
+                 return "Novel AI"
+             elif "poem" in task_lower or "poetry" in task_lower:
+                 return "Poet Assistant"
+             elif "marketing" in task_lower or "copy" in task_lower:
+                 return "Marketing Copy AI"
+             else:
+                 return "GPT-3.5 Creative"
+
+         # Game Development
+         elif any(
+             keyword in task_lower
+             for keyword in [
+                 "game",
+                 "character",
+                 "npc",
+                 "level",
+                 "dialogue",
+                 "asset",
+                 "quest",
+                 "gameplay",
+                 "mechanic",
+                 "unity",
+                 "unreal",
+             ]
+         ):
+             if "character" in task_lower:
+                 return "Character AI"
+             elif "level" in task_lower or "environment" in task_lower:
+                 return "Level Designer"
+             elif "dialogue" in task_lower or "conversation" in task_lower:
+                 return "Dialogue Writer"
+             else:
+                 return "Asset Creator"
+
+         # Science and Research
+         elif any(
+             keyword in task_lower
+             for keyword in [
+                 "research",
+                 "scientific",
+                 "paper",
+                 "analysis",
+                 "data",
+                 "protein",
+                 "molecule",
+                 "chemistry",
+                 "biology",
+                 "physics",
+                 "experiment",
+             ]
+         ):
+             if "protein" in task_lower or "folding" in task_lower:
+                 return "AlphaFold"
+             elif "molecule" in task_lower or "chemistry" in task_lower:
+                 return "ChemBERTa"
+             elif "data" in task_lower and "analysis" in task_lower:
+                 return "Data Analyst"
+             else:
+                 return "SciBERT"
+
+         # Business and Productivity
+         elif any(
+             keyword in task_lower
+             for keyword in [
+                 "email",
+                 "business",
+                 "report",
+                 "presentation",
+                 "meeting",
+                 "project",
+                 "plan",
+                 "proposal",
+                 "memo",
+                 "letter",
+                 "professional",
+             ]
+         ):
+             if "email" in task_lower:
+                 return "Email Assistant"
+             elif "presentation" in task_lower:
+                 return "Presentation AI"
+             elif "report" in task_lower:
+                 return "Report Writer"
+             elif "meeting" in task_lower:
+                 return "Meeting Summarizer"
+             else:
+                 return "Project Planner"
+
+         # Specialized AI
+         elif any(
+             keyword in task_lower
+             for keyword in [
+                 "music",
+                 "audio",
+                 "sound",
+                 "voice clone",
+                 "enhance",
+                 "restore",
+                 "upscale",
+                 "remove background",
+                 "inpaint",
+                 "style transfer",
+             ]
+         ):
+             if "music" in task_lower:
+                 return "MusicGen"
+             elif "voice" in task_lower and "clone" in task_lower:
+                 return "Voice Cloner"
+             elif "upscale" in task_lower or "enhance" in task_lower:
+                 return "Real-ESRGAN"
+             elif "background" in task_lower and "remove" in task_lower:
+                 return "Background Remover"
+             elif "restore" in task_lower or "face" in task_lower:
+                 return "GFPGAN"
+             else:
+                 return "LaMa"
+
+         # Traditional categories
+         elif any(
+             keyword in task_lower
+             for keyword in [
+                 "generate",
+                 "write",
+                 "create",
+                 "compose",
+                 "chat",
+                 "conversation",
+             ]
+         ):
+             if "chat" in task_lower or "conversation" in task_lower:
+                 return "Llama 3.1 8B Instruct"
+             else:
+                 return "MiniMax-M2"
+
+         # Image generation
+         elif any(
+             keyword in task_lower
+             for keyword in ["image", "picture", "draw", "art", "photo", "visual"]
+         ):
+             if "fast" in task_lower or "quick" in task_lower:
+                 return "FLUX.1 Schnell"
+             else:
+                 return "FLUX.1 Dev"
+
+         # Audio processing
+         elif any(
+             keyword in task_lower
+             for keyword in ["transcribe", "speech to text", "recognize", "audio"]
+         ):
+             if content_type == "audio" or "transcribe" in task_lower:
+                 return "Whisper Large v3"
+
+         # Text-to-speech
+         elif any(
+             keyword in task_lower
+             for keyword in ["speak", "voice", "text to speech", "tts"]
+         ):
+             if "fast" in task_lower:
+                 return "Kokoro 82M"  # Lightweight and fast
+             else:
+                 return "VibeVoice 1.5B"  # High quality
+
+         # Image analysis
+         elif (
+             any(
+                 keyword in task_lower
+                 for keyword in ["classify", "analyze image", "detect", "recognize"]
+             )
+             and content_type == "image"
+         ):
+             if "nsfw" in task_lower or "safe" in task_lower:
+                 return "NSFW Image Detection"
+             elif "emotion" in task_lower or "face" in task_lower:
+                 return "Facial Emotions Detection"
+             elif "deepfake" in task_lower or "fake" in task_lower:
+                 return "Deepfake Detection"
+             else:
+                 return "ViT Base Patch16"  # General classification
+
+         # Translation
+         elif any(
+             keyword in task_lower for keyword in ["translate", "language", "convert"]
+         ):
+             return "M2M100 1.2B"  # Multilingual translation
+
+         # Summarization
+         elif any(
+             keyword in task_lower
+             for keyword in ["summarize", "summary", "abstract", "brief"]
+         ):
+             return "PEGASUS XSum"  # Best summarization
+
+         # Embeddings/similarity
+         elif any(
+             keyword in task_lower
+             for keyword in ["similar", "embed", "vector", "search", "match"]
+         ):
+             return "Sentence Transformers All MiniLM"  # Fast embeddings
+
+         # Default fallback
+         else:
+             return "MiniMax-M2"  # Best general-purpose model
+
+     async def execute_hf_task(
+         self, task: str, content: Any, model_name: Optional[str] = None, **kwargs
+     ) -> Dict[str, Any]:
+         """
+         Execute any HuggingFace task with intelligent model selection
+
+         Args:
+             task: Task description (e.g., "generate image", "transcribe audio")
+             content: Input content (text, image bytes, audio bytes)
+             model_name: Specific model to use (optional)
+             **kwargs: Additional parameters
+         """
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         try:
+             task_lower = task.lower()
+
+             # Determine content type
+             content_type = "text"
+             if isinstance(content, bytes):
+                 if (
+                     b"PNG" in content[:20]
+                     or b"JFIF" in content[:20]
+                     or b"GIF" in content[:20]
+                 ):
+                     content_type = "image"
+                 else:
+                     content_type = "audio"
+
+             # Auto-select model if not specified
+             if not model_name:
+                 model_name = await self.smart_model_selection(task, content_type)
+
+             # Route to appropriate method based on task
+             if "generate" in task_lower and (
+                 "image" in task_lower or "picture" in task_lower
+             ):
+                 return await self.generate_image_with_hf(content, model_name, **kwargs)
+
+             elif "transcribe" in task_lower or "speech to text" in task_lower:
+                 return await self.transcribe_audio_with_hf(
+                     content, model_name, **kwargs
+                 )
+
+             elif "text to speech" in task_lower or "tts" in task_lower:
+                 return await self.synthesize_speech_with_hf(
+                     content, model_name, **kwargs
+                 )
+
+             elif "classify" in task_lower and content_type == "image":
+                 return await self.classify_image_with_hf(content, model_name, **kwargs)
+
+             elif "embed" in task_lower or "vector" in task_lower:
+                 texts = [content] if isinstance(content, str) else content
+                 return await self.get_text_embeddings_with_hf(texts, model_name)
+
+             elif "translate" in task_lower:
+                 return await self.translate_with_hf(
+                     content, model_name=model_name, **kwargs
+                 )
+
+             elif "summarize" in task_lower:
+                 return await self.summarize_with_hf(content, model_name, **kwargs)
+
+             else:
+                 # Default to text generation
+                 return await self.generate_text_with_hf(content, model_name, **kwargs)
+
+         except Exception as e:
+             logger.error(f"HuggingFace task execution failed: {e}")
+             return {"error": f"Task execution failed: {str(e)}"}
+
+     async def chat_with_hf_models(
+         self, message: str, conversation_history: List[Dict] = None
+     ) -> Dict[str, Any]:
+         """
+         Enhanced chat with access to HuggingFace models
+
+         This method extends the base agent's capabilities with HF models
+         """
+         # Check if the user is asking for HuggingFace-specific functionality
+         message_lower = message.lower()
+
+         # Handle model listing requests
+         if "list" in message_lower and (
+             "model" in message_lower or "hf" in message_lower
+         ):
+             return self.get_available_hf_models()
+
+         # Handle specific model requests
+         hf_keywords = [
+             "generate image",
+             "create image",
+             "draw",
+             "picture",
+             "transcribe",
+             "speech to text",
+             "audio",
+             "text to speech",
+             "speak",
+             "voice",
+             "translate",
+             "language",
+             "classify image",
+             "embed",
+             "vector",
+             "similarity",
+             "summarize",
+         ]
+
+         if any(keyword in message_lower for keyword in hf_keywords):
+             # This is likely a HuggingFace model request
+             return await self.execute_hf_task(message, message)
+
+         # For regular chat, we can enhance responses with HF models
+         # First get a response from the base agent
+         base_response = await super().chat(message, conversation_history)
+
+         # Optionally enhance with HF capabilities if relevant
+         if "image" in message_lower and "generate" in message_lower:
+             # User might want image generation
+             base_response["hf_suggestion"] = {
+                 "action": "generate_image",
+                 "models": ["FLUX.1 Dev", "FLUX.1 Schnell", "Stable Diffusion XL"],
+                 "message": "I can also generate images for you using HuggingFace models. Just ask!",
+             }
+
+         return base_response
+
+     # New methods for expanded model categories
+
+     async def generate_video_with_hf(
+         self, prompt: str, model_name: Optional[str] = None, **kwargs
+     ) -> Dict[str, Any]:
+         """Generate video from text prompt"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         model_name = model_name or "Stable Video Diffusion"
+         return await self.hf_tool.text_to_video(
+             model_name=model_name, prompt=prompt, **kwargs
+         )
+
+     async def generate_code_with_hf(
+         self,
+         prompt: str,
+         language: str = "python",
+         model_name: Optional[str] = None,
+         **kwargs,
+     ) -> Dict[str, Any]:
+         """Generate code from natural language description"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         model_name = model_name or "CodeLlama 34B Instruct"
+         return await self.hf_tool.code_generation(
+             model_name=model_name, prompt=prompt, language=language, **kwargs
+         )
+
+     async def generate_app_with_hf(
+         self,
+         description: str,
+         app_type: str = "web_app",
+         model_name: Optional[str] = None,
+         **kwargs,
+     ) -> Dict[str, Any]:
+         """Generate complete application from description"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         model_name = model_name or "CodeLlama 34B Instruct"
+         enhanced_prompt = f"Create a {app_type} application: {description}"
+         return await self.hf_tool.code_generation(
+             model_name=model_name, prompt=enhanced_prompt, **kwargs
+         )
+
+     async def generate_3d_model_with_hf(
+         self, prompt: str, model_name: Optional[str] = None, **kwargs
+     ) -> Dict[str, Any]:
+         """Generate 3D model from text description"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         model_name = model_name or "Shap-E"
+         return await self.hf_tool.text_to_3d(
+             model_name=model_name, prompt=prompt, **kwargs
+         )
+
+     async def process_document_with_hf(
+         self,
+         document_data: bytes,
+         task_type: str = "ocr",
+         model_name: Optional[str] = None,
+         **kwargs,
+     ) -> Dict[str, Any]:
+         """Process documents with OCR and analysis"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         if task_type == "ocr":
+             model_name = model_name or "TrOCR Large"
+             return await self.hf_tool.ocr(
+                 model_name=model_name, image_data=document_data, **kwargs
+             )
+         else:
+             model_name = model_name or "LayoutLMv3"
+             return await self.hf_tool.document_analysis(
+                 model_name=model_name, document_data=document_data, **kwargs
+             )
+
+     async def multimodal_chat_with_hf(
+         self, image_data: bytes, text: str, model_name: Optional[str] = None, **kwargs
+     ) -> Dict[str, Any]:
+         """Chat with images using multimodal models"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         model_name = model_name or "BLIP-2"
+         return await self.hf_tool.vision_language(
+             model_name=model_name, image_data=image_data, text=text, **kwargs
+         )
+
+     async def generate_music_with_hf(
+         self,
+         prompt: str,
+         duration: int = 30,
+         model_name: Optional[str] = None,
+         **kwargs,
+     ) -> Dict[str, Any]:
+         """Generate music from text description"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         model_name = model_name or "MusicGen"
+         return await self.hf_tool.music_generation(
+             model_name=model_name, prompt=prompt, duration=duration, **kwargs
+         )
+
+     async def enhance_image_with_hf(
+         self,
+         image_data: bytes,
+         task_type: str = "super_resolution",
+         model_name: Optional[str] = None,
+         **kwargs,
+     ) -> Dict[str, Any]:
+         """Enhance images with various AI models"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         if task_type == "super_resolution":
+             model_name = model_name or "Real-ESRGAN"
+             return await self.hf_tool.super_resolution(
+                 model_name=model_name, image_data=image_data, **kwargs
+             )
+         elif task_type == "background_removal":
+             model_name = model_name or "Background Remover"
+             return await self.hf_tool.background_removal(
+                 model_name=model_name, image_data=image_data, **kwargs
+             )
+         elif task_type == "face_restoration":
+             model_name = model_name or "GFPGAN"
+             return await self.hf_tool.super_resolution(
+                 model_name=model_name, image_data=image_data, **kwargs
+             )
+
+     async def generate_creative_content_with_hf(
+         self,
+         prompt: str,
+         content_type: str = "story",
+         model_name: Optional[str] = None,
+         **kwargs,
+     ) -> Dict[str, Any]:
+         """Generate creative content like stories, poems, etc."""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         model_name = model_name or "GPT-3.5 Creative"
+         enhanced_prompt = f"Write a {content_type}: {prompt}"
+         return await self.hf_tool.creative_writing(
+             model_name=model_name, prompt=enhanced_prompt, **kwargs
+         )
+
+     async def generate_game_content_with_hf(
+         self,
+         description: str,
+         content_type: str = "character",
+         model_name: Optional[str] = None,
+         **kwargs,
+     ) -> Dict[str, Any]:
+         """Generate game development content"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         model_name = model_name or "Character AI"
+         enhanced_prompt = f"Create game {content_type}: {description}"
+         return await self.hf_tool.creative_writing(
+             model_name=model_name, prompt=enhanced_prompt, **kwargs
+         )
+
+     async def generate_business_document_with_hf(
+         self,
+         context: str,
+         document_type: str = "email",
+         model_name: Optional[str] = None,
+         **kwargs,
+     ) -> Dict[str, Any]:
+         """Generate business documents and content"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         model_name = model_name or "Email Assistant"
+         return await self.hf_tool.business_document(
+             model_name=model_name,
+             document_type=document_type,
+             context=context,
+             **kwargs,
+         )
+
+     async def research_assistance_with_hf(
+         self,
+         topic: str,
+         research_type: str = "analysis",
+         model_name: Optional[str] = None,
+         **kwargs,
+     ) -> Dict[str, Any]:
+         """Research assistance and scientific content generation"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         model_name = model_name or "SciBERT"
+         enhanced_prompt = f"Research {research_type} on: {topic}"
+         return await self.hf_tool.text_generation(
+             model_name=model_name, prompt=enhanced_prompt, **kwargs
+         )
+
+     def get_available_hf_models(self, category: Optional[str] = None) -> Dict[str, Any]:
+         """Get available models by category"""
+         if not self.hf_tool:
+             return {"error": "HuggingFace integration not available"}
+
+         return self.hf_tool.list_available_models(category=category)
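`smart_model_selection` is a first-match keyword router: the if/elif order defines precedence, and matching is plain substring containment on the lowercased task description. The core of that logic can be condensed into a self-contained sketch (the rule table below is a heavily abbreviated stand-in, using the display names from the file above):

```python
# Ordered (keywords, model) rules; first match wins, mirroring the
# if/elif precedence in smart_model_selection. Abbreviated for illustration.
ROUTING_RULES = [
    ({"video", "movie", "animation"}, "Stable Video Diffusion"),
    ({"code", "api", "app"}, "DeepSeek Coder V2"),
    ({"image", "picture", "draw"}, "FLUX.1 Dev"),
    ({"translate", "language"}, "M2M100 1.2B"),
]

FALLBACK = "MiniMax-M2"  # general-purpose default, as in the file above


def select_model(task_description: str) -> str:
    task_lower = task_description.lower()
    for keywords, model in ROUTING_RULES:
        # Substring containment, exactly like `keyword in task_lower` above.
        if any(keyword in task_lower for keyword in keywords):
            return model
    return FALLBACK


print(select_model("Draw a picture of a cat"))  # FLUX.1 Dev
print(select_model("Summarize this memo"))      # MiniMax-M2
```

Note the design trade-off this inherits from the original: substring matching is cheap but coarse, so a short keyword such as `"api"` also fires inside unrelated words like "rapid". Tokenizing the description before matching would avoid that at the cost of slightly more code.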
app/agent/manus.py ADDED
@@ -0,0 +1,165 @@
+ from typing import Dict, List, Optional
+
+ from pydantic import Field, model_validator
+
+ from app.agent.browser import BrowserContextHelper
+ from app.agent.toolcall import ToolCallAgent
+ from app.config import config
+ from app.logger import logger
+ from app.prompt.manus import NEXT_STEP_PROMPT, SYSTEM_PROMPT
+ from app.tool import Terminate, ToolCollection
+ from app.tool.ask_human import AskHuman
+ from app.tool.browser_use_tool import BrowserUseTool
+ from app.tool.mcp import MCPClients, MCPClientTool
+ from app.tool.python_execute import PythonExecute
+ from app.tool.str_replace_editor import StrReplaceEditor
+
+
+ class Manus(ToolCallAgent):
+     """A versatile general-purpose agent with support for both local and MCP tools."""
+
+     name: str = "Manus"
+     description: str = "A versatile agent that can solve various tasks using multiple tools including MCP-based tools"
+
+     system_prompt: str = SYSTEM_PROMPT.format(directory=config.workspace_root)
+     next_step_prompt: str = NEXT_STEP_PROMPT
+
+     max_observe: int = 10000
+     max_steps: int = 20
+
+     # MCP clients for remote tool access
+     mcp_clients: MCPClients = Field(default_factory=MCPClients)
+
+     # Add general-purpose tools to the tool collection
+     available_tools: ToolCollection = Field(
+         default_factory=lambda: ToolCollection(
+             PythonExecute(),
+             BrowserUseTool(),
+             StrReplaceEditor(),
+             AskHuman(),
+             Terminate(),
+         )
+     )
+
+     special_tool_names: list[str] = Field(default_factory=lambda: [Terminate().name])
+     browser_context_helper: Optional[BrowserContextHelper] = None
+
+     # Track connected MCP servers
+     connected_servers: Dict[str, str] = Field(
+         default_factory=dict
+     )  # server_id -> url/command
+     _initialized: bool = False
+
+     @model_validator(mode="after")
+     def initialize_helper(self) -> "Manus":
+         """Initialize basic components synchronously."""
+         self.browser_context_helper = BrowserContextHelper(self)
+         return self
+
+     @classmethod
+     async def create(cls, **kwargs) -> "Manus":
+         """Factory method to create and properly initialize a Manus instance."""
+         instance = cls(**kwargs)
+         await instance.initialize_mcp_servers()
+         instance._initialized = True
+         return instance
+
+     async def initialize_mcp_servers(self) -> None:
+         """Initialize connections to configured MCP servers."""
+         for server_id, server_config in config.mcp_config.servers.items():
+             try:
+                 if server_config.type == "sse":
+                     if server_config.url:
+                         await self.connect_mcp_server(server_config.url, server_id)
+                         logger.info(
+                             f"Connected to MCP server {server_id} at {server_config.url}"
+                         )
+                 elif server_config.type == "stdio":
+                     if server_config.command:
+                         await self.connect_mcp_server(
+                             server_config.command,
+                             server_id,
+                             use_stdio=True,
+                             stdio_args=server_config.args,
+                         )
+                         logger.info(
+                             f"Connected to MCP server {server_id} using command {server_config.command}"
+                         )
+             except Exception as e:
+                 logger.error(f"Failed to connect to MCP server {server_id}: {e}")
+
+     async def connect_mcp_server(
+         self,
+         server_url: str,
+         server_id: str = "",
+         use_stdio: bool = False,
+         stdio_args: List[str] = None,
+     ) -> None:
+         """Connect to an MCP server and add its tools."""
+         if use_stdio:
+             await self.mcp_clients.connect_stdio(
+                 server_url, stdio_args or [], server_id
+             )
+             self.connected_servers[server_id or server_url] = server_url
+         else:
+             await self.mcp_clients.connect_sse(server_url, server_id)
+             self.connected_servers[server_id or server_url] = server_url
+
+         # Update available tools with only the new tools from this server
+         new_tools = [
+             tool for tool in self.mcp_clients.tools if tool.server_id == server_id
+         ]
+         self.available_tools.add_tools(*new_tools)
+
+     async def disconnect_mcp_server(self, server_id: str = "") -> None:
+         """Disconnect from an MCP server and remove its tools."""
+         await self.mcp_clients.disconnect(server_id)
+         if server_id:
+             self.connected_servers.pop(server_id, None)
+         else:
+             self.connected_servers.clear()
+
+         # Rebuild available tools without the disconnected server's tools
+         base_tools = [
+             tool
+             for tool in self.available_tools.tools
+             if not isinstance(tool, MCPClientTool)
+         ]
+         self.available_tools = ToolCollection(*base_tools)
+         self.available_tools.add_tools(*self.mcp_clients.tools)
+
+     async def cleanup(self):
+         """Clean up Manus agent resources."""
+         if self.browser_context_helper:
+             await self.browser_context_helper.cleanup_browser()
135
+ # Disconnect from all MCP servers only if we were initialized
136
+ if self._initialized:
137
+ await self.disconnect_mcp_server()
138
+ self._initialized = False
139
+
140
+ async def think(self) -> bool:
141
+ """Process current state and decide next actions with appropriate context."""
142
+ if not self._initialized:
143
+ await self.initialize_mcp_servers()
144
+ self._initialized = True
145
+
146
+ original_prompt = self.next_step_prompt
147
+ recent_messages = self.memory.messages[-3:] if self.memory.messages else []
148
+ browser_in_use = any(
149
+ tc.function.name == BrowserUseTool().name
150
+ for msg in recent_messages
151
+ if msg.tool_calls
152
+ for tc in msg.tool_calls
153
+ )
154
+
155
+ if browser_in_use:
156
+ self.next_step_prompt = (
157
+ await self.browser_context_helper.format_next_step_prompt()
158
+ )
159
+
160
+ result = await super().think()
161
+
162
+ # Restore original prompt
163
+ self.next_step_prompt = original_prompt
164
+
165
+ return result
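The `disconnect_mcp_server` method above rebuilds the tool collection by keeping non-MCP tools and re-adding whatever the MCP clients still expose. The same pattern in isolation, with stand-in `Tool`/`MCPTool` classes (not the real `app.tool` types):

```python
# Stand-ins for the real BaseTool / MCPClientTool classes.
class Tool:
    def __init__(self, name: str):
        self.name = name

class MCPTool(Tool):
    def __init__(self, name: str, server_id: str):
        super().__init__(name)
        self.server_id = server_id

def rebuild_tools(current: list, still_connected: list) -> list:
    # Keep base (non-MCP) tools, then re-add the remaining MCP tools.
    base = [t for t in current if not isinstance(t, MCPTool)]
    return base + still_connected

tools = [Tool("bash"), MCPTool("search", "srv1"), MCPTool("fetch", "srv2")]
# Simulate disconnecting "srv1": only srv2's tools remain on the clients.
remaining = [t for t in tools if isinstance(t, MCPTool) and t.server_id != "srv1"]
rebuilt = rebuild_tools(tools, remaining)
print([t.name for t in rebuilt])  # ['bash', 'fetch']
```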
app/agent/mcp.py ADDED
@@ -0,0 +1,185 @@
+ from typing import Any, Dict, List, Optional, Tuple
+
+ from pydantic import Field
+
+ from app.agent.toolcall import ToolCallAgent
+ from app.logger import logger
+ from app.prompt.mcp import MULTIMEDIA_RESPONSE_PROMPT, NEXT_STEP_PROMPT, SYSTEM_PROMPT
+ from app.schema import AgentState, Message
+ from app.tool.base import ToolResult
+ from app.tool.mcp import MCPClients
+
+
+ class MCPAgent(ToolCallAgent):
+     """Agent for interacting with MCP (Model Context Protocol) servers.
+
+     This agent connects to an MCP server using either SSE or stdio transport
+     and makes the server's tools available through the agent's tool interface.
+     """
+
+     name: str = "mcp_agent"
+     description: str = "An agent that connects to an MCP server and uses its tools."
+
+     system_prompt: str = SYSTEM_PROMPT
+     next_step_prompt: str = NEXT_STEP_PROMPT
+
+     # Initialize MCP tool collection
+     mcp_clients: MCPClients = Field(default_factory=MCPClients)
+     available_tools: MCPClients = None  # Will be set in initialize()
+
+     max_steps: int = 20
+     connection_type: str = "stdio"  # "stdio" or "sse"
+
+     # Track tool schemas to detect changes
+     tool_schemas: Dict[str, Dict[str, Any]] = Field(default_factory=dict)
+     _refresh_tools_interval: int = 5  # Refresh tools every N steps
+
+     # Special tool names that should trigger termination
+     special_tool_names: List[str] = Field(default_factory=lambda: ["terminate"])
+
+     async def initialize(
+         self,
+         connection_type: Optional[str] = None,
+         server_url: Optional[str] = None,
+         command: Optional[str] = None,
+         args: Optional[List[str]] = None,
+     ) -> None:
+         """Initialize the MCP connection.
+
+         Args:
+             connection_type: Type of connection to use ("stdio" or "sse")
+             server_url: URL of the MCP server (for SSE connection)
+             command: Command to run (for stdio connection)
+             args: Arguments for the command (for stdio connection)
+         """
+         if connection_type:
+             self.connection_type = connection_type
+
+         # Connect to the MCP server based on connection type
+         if self.connection_type == "sse":
+             if not server_url:
+                 raise ValueError("Server URL is required for SSE connection")
+             await self.mcp_clients.connect_sse(server_url=server_url)
+         elif self.connection_type == "stdio":
+             if not command:
+                 raise ValueError("Command is required for stdio connection")
+             await self.mcp_clients.connect_stdio(command=command, args=args or [])
+         else:
+             raise ValueError(f"Unsupported connection type: {self.connection_type}")
+
+         # Set available_tools to our MCP instance
+         self.available_tools = self.mcp_clients
+
+         # Store initial tool schemas
+         await self._refresh_tools()
+
+         # Add system message about available tools
+         tool_names = list(self.mcp_clients.tool_map.keys())
+         tools_info = ", ".join(tool_names)
+
+         # Add system prompt and available tools information
+         self.memory.add_message(
+             Message.system_message(
+                 f"{self.system_prompt}\n\nAvailable MCP tools: {tools_info}"
+             )
+         )
+
+     async def _refresh_tools(self) -> Tuple[List[str], List[str]]:
+         """Refresh the list of available tools from the MCP server.
+
+         Returns:
+             A tuple of (added_tools, removed_tools)
+         """
+         if not self.mcp_clients.sessions:
+             return [], []
+
+         # Get current tool schemas directly from the server
+         response = await self.mcp_clients.list_tools()
+         current_tools = {tool.name: tool.inputSchema for tool in response.tools}
+
+         # Determine added, removed, and changed tools
+         current_names = set(current_tools.keys())
+         previous_names = set(self.tool_schemas.keys())
+
+         added_tools = list(current_names - previous_names)
+         removed_tools = list(previous_names - current_names)
+
+         # Check for schema changes in existing tools
+         changed_tools = []
+         for name in current_names.intersection(previous_names):
+             if current_tools[name] != self.tool_schemas.get(name):
+                 changed_tools.append(name)
+
+         # Update stored schemas
+         self.tool_schemas = current_tools
+
+         # Log and notify about changes
+         if added_tools:
+             logger.info(f"Added MCP tools: {added_tools}")
+             self.memory.add_message(
+                 Message.system_message(f"New tools available: {', '.join(added_tools)}")
+             )
+         if removed_tools:
+             logger.info(f"Removed MCP tools: {removed_tools}")
+             self.memory.add_message(
+                 Message.system_message(
+                     f"Tools no longer available: {', '.join(removed_tools)}"
+                 )
+             )
+         if changed_tools:
+             logger.info(f"Changed MCP tools: {changed_tools}")
+
+         return added_tools, removed_tools
+
+     async def think(self) -> bool:
+         """Process current state and decide next action."""
+         # Check MCP session and tools availability
+         if not self.mcp_clients.sessions or not self.mcp_clients.tool_map:
+             logger.info("MCP service is no longer available, ending interaction")
+             self.state = AgentState.FINISHED
+             return False
+
+         # Refresh tools periodically
+         if self.current_step % self._refresh_tools_interval == 0:
+             await self._refresh_tools()
+             # All tools removed indicates shutdown
+             if not self.mcp_clients.tool_map:
+                 logger.info("MCP service has shut down, ending interaction")
+                 self.state = AgentState.FINISHED
+                 return False
+
+         # Use the parent class's think method
+         return await super().think()
+
+     async def _handle_special_tool(self, name: str, result: Any, **kwargs) -> None:
+         """Handle special tool execution and state changes"""
+         # First process with parent handler
+         await super()._handle_special_tool(name, result, **kwargs)
+
+         # Handle multimedia responses
+         if isinstance(result, ToolResult) and result.base64_image:
+             self.memory.add_message(
+                 Message.system_message(
+                     MULTIMEDIA_RESPONSE_PROMPT.format(tool_name=name)
+                 )
+             )
+
+     def _should_finish_execution(self, name: str, **kwargs) -> bool:
+         """Determine if tool execution should finish the agent"""
+         # Terminate if the tool name is 'terminate'
+         return name.lower() == "terminate"
+
+     async def cleanup(self) -> None:
+         """Clean up MCP connection when done."""
+         if self.mcp_clients.sessions:
+             await self.mcp_clients.disconnect()
+             logger.info("MCP connection closed")
+
+     async def run(self, request: Optional[str] = None) -> str:
+         """Run the agent with cleanup when done."""
+         try:
+             result = await super().run(request)
+             return result
+         finally:
+             # Ensure cleanup happens even if there's an error
+             await self.cleanup()
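The added/removed/changed bookkeeping in `_refresh_tools` above is plain set arithmetic over schema dicts. A minimal sketch of the same diff, using literal dicts in place of live MCP tool schemas:

```python
def diff_schemas(previous: dict, current: dict):
    # New names, vanished names, and same-name tools whose schema changed.
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = sorted(
        name for name in set(current) & set(previous)
        if current[name] != previous[name]
    )
    return added, removed, changed

prev = {"search": {"type": "object"}, "fetch": {"type": "object"}}
curr = {"search": {"type": "object", "required": ["q"]}, "shell": {"type": "object"}}
print(diff_schemas(prev, curr))  # (['shell'], ['fetch'], ['search'])
```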
app/agent/react.py ADDED
@@ -0,0 +1,38 @@
+ from abc import ABC, abstractmethod
+ from typing import Optional
+
+ from pydantic import Field
+
+ from app.agent.base import BaseAgent
+ from app.llm import LLM
+ from app.schema import AgentState, Memory
+
+
+ class ReActAgent(BaseAgent, ABC):
+     name: str
+     description: Optional[str] = None
+
+     system_prompt: Optional[str] = None
+     next_step_prompt: Optional[str] = None
+
+     llm: Optional[LLM] = Field(default_factory=LLM)
+     memory: Memory = Field(default_factory=Memory)
+     state: AgentState = AgentState.IDLE
+
+     max_steps: int = 10
+     current_step: int = 0
+
+     @abstractmethod
+     async def think(self) -> bool:
+         """Process current state and decide next action"""
+
+     @abstractmethod
+     async def act(self) -> str:
+         """Execute decided actions"""
+
+     async def step(self) -> str:
+         """Execute a single step: think and act."""
+         should_act = await self.think()
+         if not should_act:
+             return "Thinking complete - no action needed"
+         return await self.act()
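The `step()` contract above (think decides, act executes) can be exercised with a toy agent that needs none of the app imports; `EchoAgent` is a stand-in, not a real agent class:

```python
import asyncio

class EchoAgent:
    def __init__(self):
        self.current_step = 0
        self.max_steps = 3

    async def think(self) -> bool:
        # Decide whether another action is needed.
        return self.current_step < self.max_steps

    async def act(self) -> str:
        self.current_step += 1
        return f"step {self.current_step}"

    async def step(self) -> str:
        # Same shape as ReActAgent.step() above.
        if not await self.think():
            return "Thinking complete - no action needed"
        return await self.act()

async def main():
    agent = EchoAgent()
    return [await agent.step() for _ in range(4)]

print(asyncio.run(main()))
# ['step 1', 'step 2', 'step 3', 'Thinking complete - no action needed']
```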
app/agent/sandbox_agent.py ADDED
@@ -0,0 +1,223 @@
+ from typing import Dict, List, Optional
+
+ from pydantic import Field, model_validator
+
+ from app.agent.browser import BrowserContextHelper
+ from app.agent.toolcall import ToolCallAgent
+ from app.config import config
+ from app.daytona.sandbox import create_sandbox, delete_sandbox
+ from app.daytona.tool_base import SandboxToolsBase
+ from app.logger import logger
+ from app.prompt.manus import NEXT_STEP_PROMPT, SYSTEM_PROMPT
+ from app.tool import Terminate, ToolCollection
+ from app.tool.ask_human import AskHuman
+ from app.tool.mcp import MCPClients, MCPClientTool
+ from app.tool.sandbox.sb_browser_tool import SandboxBrowserTool
+ from app.tool.sandbox.sb_files_tool import SandboxFilesTool
+ from app.tool.sandbox.sb_shell_tool import SandboxShellTool
+ from app.tool.sandbox.sb_vision_tool import SandboxVisionTool
+
+
+ class SandboxManus(ToolCallAgent):
+     """A versatile general-purpose agent with support for both local and MCP tools."""
+
+     name: str = "SandboxManus"
+     description: str = "A versatile agent that can solve various tasks using multiple sandbox-tools including MCP-based tools"
+
+     system_prompt: str = SYSTEM_PROMPT.format(directory=config.workspace_root)
+     next_step_prompt: str = NEXT_STEP_PROMPT
+
+     max_observe: int = 10000
+     max_steps: int = 20
+
+     # MCP clients for remote tool access
+     mcp_clients: MCPClients = Field(default_factory=MCPClients)
+
+     # Add general-purpose tools to the tool collection
+     available_tools: ToolCollection = Field(
+         default_factory=lambda: ToolCollection(
+             # PythonExecute(),
+             # BrowserUseTool(),
+             # StrReplaceEditor(),
+             AskHuman(),
+             Terminate(),
+         )
+     )
+
+     special_tool_names: list[str] = Field(default_factory=lambda: [Terminate().name])
+     browser_context_helper: Optional[BrowserContextHelper] = None
+
+     # Track connected MCP servers
+     connected_servers: Dict[str, str] = Field(
+         default_factory=dict
+     )  # server_id -> url/command
+     _initialized: bool = False
+     sandbox: Optional[object] = None  # Daytona sandbox handle, set in initialize_sandbox_tools()
+     sandbox_link: Optional[dict[str, dict[str, str]]] = Field(default_factory=dict)
+
+     @model_validator(mode="after")
+     def initialize_helper(self) -> "SandboxManus":
+         """Initialize basic components synchronously."""
+         self.browser_context_helper = BrowserContextHelper(self)
+         return self
+
+     @classmethod
+     async def create(cls, **kwargs) -> "SandboxManus":
+         """Factory method to create and properly initialize a SandboxManus instance."""
+         instance = cls(**kwargs)
+         await instance.initialize_mcp_servers()
+         await instance.initialize_sandbox_tools()
+         instance._initialized = True
+         return instance
+
+     async def initialize_sandbox_tools(
+         self,
+         password: str = config.daytona.VNC_password,
+     ) -> None:
+         try:
+             # Create a new sandbox
+             if password:
+                 sandbox = create_sandbox(password=password)
+                 self.sandbox = sandbox
+             else:
+                 raise ValueError("password must be provided")
+             vnc_link = sandbox.get_preview_link(6080)
+             website_link = sandbox.get_preview_link(8080)
+             vnc_url = vnc_link.url if hasattr(vnc_link, "url") else str(vnc_link)
+             website_url = (
+                 website_link.url if hasattr(website_link, "url") else str(website_link)
+             )
+
+             # Get the actual sandbox_id from the created sandbox
+             actual_sandbox_id = sandbox.id if hasattr(sandbox, "id") else "new_sandbox"
+             if not self.sandbox_link:
+                 self.sandbox_link = {}
+             self.sandbox_link[actual_sandbox_id] = {
+                 "vnc": vnc_url,
+                 "website": website_url,
+             }
+             logger.info(f"VNC URL: {vnc_url}")
+             logger.info(f"Website URL: {website_url}")
+             SandboxToolsBase._urls_printed = True
+             sb_tools = [
+                 SandboxBrowserTool(sandbox),
+                 SandboxFilesTool(sandbox),
+                 SandboxShellTool(sandbox),
+                 SandboxVisionTool(sandbox),
+             ]
+             self.available_tools.add_tools(*sb_tools)
+
+         except Exception as e:
+             logger.error(f"Error initializing sandbox tools: {e}")
+             raise
+
+     async def initialize_mcp_servers(self) -> None:
+         """Initialize connections to configured MCP servers."""
+         for server_id, server_config in config.mcp_config.servers.items():
+             try:
+                 if server_config.type == "sse":
+                     if server_config.url:
+                         await self.connect_mcp_server(server_config.url, server_id)
+                         logger.info(
+                             f"Connected to MCP server {server_id} at {server_config.url}"
+                         )
+                 elif server_config.type == "stdio":
+                     if server_config.command:
+                         await self.connect_mcp_server(
+                             server_config.command,
+                             server_id,
+                             use_stdio=True,
+                             stdio_args=server_config.args,
+                         )
+                         logger.info(
+                             f"Connected to MCP server {server_id} using command {server_config.command}"
+                         )
+             except Exception as e:
+                 logger.error(f"Failed to connect to MCP server {server_id}: {e}")
+
+     async def connect_mcp_server(
+         self,
+         server_url: str,
+         server_id: str = "",
+         use_stdio: bool = False,
+         stdio_args: Optional[List[str]] = None,
+     ) -> None:
+         """Connect to an MCP server and add its tools."""
+         if use_stdio:
+             await self.mcp_clients.connect_stdio(
+                 server_url, stdio_args or [], server_id
+             )
+             self.connected_servers[server_id or server_url] = server_url
+         else:
+             await self.mcp_clients.connect_sse(server_url, server_id)
+             self.connected_servers[server_id or server_url] = server_url
+
+         # Update available tools with only the new tools from this server
+         new_tools = [
+             tool for tool in self.mcp_clients.tools if tool.server_id == server_id
+         ]
+         self.available_tools.add_tools(*new_tools)
+
+     async def disconnect_mcp_server(self, server_id: str = "") -> None:
+         """Disconnect from an MCP server and remove its tools."""
+         await self.mcp_clients.disconnect(server_id)
+         if server_id:
+             self.connected_servers.pop(server_id, None)
+         else:
+             self.connected_servers.clear()
+
+         # Rebuild available tools without the disconnected server's tools
+         base_tools = [
+             tool
+             for tool in self.available_tools.tools
+             if not isinstance(tool, MCPClientTool)
+         ]
+         self.available_tools = ToolCollection(*base_tools)
+         self.available_tools.add_tools(*self.mcp_clients.tools)
+
+     async def delete_sandbox(self, sandbox_id: str) -> None:
+         """Delete a sandbox by ID."""
+         try:
+             await delete_sandbox(sandbox_id)
+             logger.info(f"Sandbox {sandbox_id} deleted successfully")
+             if sandbox_id in self.sandbox_link:
+                 del self.sandbox_link[sandbox_id]
+         except Exception as e:
+             logger.error(f"Error deleting sandbox {sandbox_id}: {e}")
+             raise e
+
+     async def cleanup(self):
+         """Clean up SandboxManus agent resources."""
+         if self.browser_context_helper:
+             await self.browser_context_helper.cleanup_browser()
+         # Disconnect from all MCP servers only if we were initialized
+         if self._initialized:
+             await self.disconnect_mcp_server()
+             await self.delete_sandbox(self.sandbox.id if self.sandbox else "unknown")
+             self._initialized = False
+
+     async def think(self) -> bool:
+         """Process current state and decide next actions with appropriate context."""
+         if not self._initialized:
+             await self.initialize_mcp_servers()
+             self._initialized = True
+
+         original_prompt = self.next_step_prompt
+         recent_messages = self.memory.messages[-3:] if self.memory.messages else []
+         browser_in_use = any(
+             tc.function.name == SandboxBrowserTool().name
+             for msg in recent_messages
+             if msg.tool_calls
+             for tc in msg.tool_calls
+         )
+
+         if browser_in_use:
+             self.next_step_prompt = (
+                 await self.browser_context_helper.format_next_step_prompt()
+             )
+
+         result = await super().think()
+
+         # Restore original prompt
+         self.next_step_prompt = original_prompt
+
+         return result
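The preview-link handling in `initialize_sandbox_tools` above tolerates both object-like and string results from the sandbox SDK. The same defensive pattern in isolation; `DummyLink` is a stand-in for whatever `get_preview_link` actually returns:

```python
class DummyLink:
    def __init__(self, url: str):
        self.url = url

def to_url(link) -> str:
    # Prefer a .url attribute, fall back to the string form.
    return link.url if hasattr(link, "url") else str(link)

print(to_url(DummyLink("https://example.test:6080")))  # https://example.test:6080
print(to_url("https://example.test:8080"))             # https://example.test:8080
```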
app/agent/swe.py ADDED
@@ -0,0 +1,24 @@
+ from typing import List
+
+ from pydantic import Field
+
+ from app.agent.toolcall import ToolCallAgent
+ from app.prompt.swe import SYSTEM_PROMPT
+ from app.tool import Bash, StrReplaceEditor, Terminate, ToolCollection
+
+
+ class SWEAgent(ToolCallAgent):
+     """An agent that implements the SWEAgent paradigm for executing code and natural conversations."""
+
+     name: str = "swe"
+     description: str = "an autonomous AI programmer that interacts directly with the computer to solve tasks."
+
+     system_prompt: str = SYSTEM_PROMPT
+     next_step_prompt: str = ""
+
+     available_tools: ToolCollection = ToolCollection(
+         Bash(), StrReplaceEditor(), Terminate()
+     )
+     special_tool_names: List[str] = Field(default_factory=lambda: [Terminate().name])
+
+     max_steps: int = 20
app/agent/toolcall.py ADDED
@@ -0,0 +1,250 @@
+ import asyncio
+ import json
+ from typing import Any, List, Optional, Union
+
+ from pydantic import Field
+
+ from app.agent.react import ReActAgent
+ from app.exceptions import TokenLimitExceeded
+ from app.logger import logger
+ from app.prompt.toolcall import NEXT_STEP_PROMPT, SYSTEM_PROMPT
+ from app.schema import TOOL_CHOICE_TYPE, AgentState, Message, ToolCall, ToolChoice
+ from app.tool import CreateChatCompletion, Terminate, ToolCollection
+
+
+ TOOL_CALL_REQUIRED = "Tool calls required but none provided"
+
+
+ class ToolCallAgent(ReActAgent):
+     """Base agent class for handling tool/function calls with enhanced abstraction"""
+
+     name: str = "toolcall"
+     description: str = "an agent that can execute tool calls."
+
+     system_prompt: str = SYSTEM_PROMPT
+     next_step_prompt: str = NEXT_STEP_PROMPT
+
+     available_tools: ToolCollection = ToolCollection(
+         CreateChatCompletion(), Terminate()
+     )
+     tool_choices: TOOL_CHOICE_TYPE = ToolChoice.AUTO  # type: ignore
+     special_tool_names: List[str] = Field(default_factory=lambda: [Terminate().name])
+
+     tool_calls: List[ToolCall] = Field(default_factory=list)
+     _current_base64_image: Optional[str] = None
+
+     max_steps: int = 30
+     max_observe: Optional[Union[int, bool]] = None
+
+     async def think(self) -> bool:
+         """Process current state and decide next actions using tools"""
+         if self.next_step_prompt:
+             user_msg = Message.user_message(self.next_step_prompt)
+             self.messages += [user_msg]
+
+         try:
+             # Get response with tool options
+             response = await self.llm.ask_tool(
+                 messages=self.messages,
+                 system_msgs=(
+                     [Message.system_message(self.system_prompt)]
+                     if self.system_prompt
+                     else None
+                 ),
+                 tools=self.available_tools.to_params(),
+                 tool_choice=self.tool_choices,
+             )
+         except ValueError:
+             raise
+         except Exception as e:
+             # Check if this is a RetryError containing TokenLimitExceeded
+             if hasattr(e, "__cause__") and isinstance(e.__cause__, TokenLimitExceeded):
+                 token_limit_error = e.__cause__
+                 logger.error(
+                     f"🚨 Token limit error (from RetryError): {token_limit_error}"
+                 )
+                 self.memory.add_message(
+                     Message.assistant_message(
+                         f"Maximum token limit reached, cannot continue execution: {str(token_limit_error)}"
+                     )
+                 )
+                 self.state = AgentState.FINISHED
+                 return False
+             raise
+
+         self.tool_calls = tool_calls = (
+             response.tool_calls if response and response.tool_calls else []
+         )
+         content = response.content if response and response.content else ""
+
+         # Log response info
+         logger.info(f"✨ {self.name}'s thoughts: {content}")
+         logger.info(
+             f"🛠️ {self.name} selected {len(tool_calls) if tool_calls else 0} tools to use"
+         )
+         if tool_calls:
+             logger.info(
+                 f"🧰 Tools being prepared: {[call.function.name for call in tool_calls]}"
+             )
+             logger.info(f"🔧 Tool arguments: {tool_calls[0].function.arguments}")
+
+         try:
+             if response is None:
+                 raise RuntimeError("No response received from the LLM")
+
+             # Handle different tool_choices modes
+             if self.tool_choices == ToolChoice.NONE:
+                 if tool_calls:
+                     logger.warning(
+                         f"🤔 Hmm, {self.name} tried to use tools when they weren't available!"
+                     )
+                 if content:
+                     self.memory.add_message(Message.assistant_message(content))
+                     return True
+                 return False
+
+             # Create and add assistant message
+             assistant_msg = (
+                 Message.from_tool_calls(content=content, tool_calls=self.tool_calls)
+                 if self.tool_calls
+                 else Message.assistant_message(content)
+             )
+             self.memory.add_message(assistant_msg)
+
+             if self.tool_choices == ToolChoice.REQUIRED and not self.tool_calls:
+                 return True  # Will be handled in act()
+
+             # For 'auto' mode, continue with content if no commands but content exists
+             if self.tool_choices == ToolChoice.AUTO and not self.tool_calls:
+                 return bool(content)
+
+             return bool(self.tool_calls)
+         except Exception as e:
+             logger.error(f"🚨 Oops! The {self.name}'s thinking process hit a snag: {e}")
+             self.memory.add_message(
+                 Message.assistant_message(
+                     f"Error encountered while processing: {str(e)}"
+                 )
+             )
+             return False
+
+     async def act(self) -> str:
+         """Execute tool calls and handle their results"""
+         if not self.tool_calls:
+             if self.tool_choices == ToolChoice.REQUIRED:
+                 raise ValueError(TOOL_CALL_REQUIRED)
+
+             # Return last message content if no tool calls
+             return self.messages[-1].content or "No content or commands to execute"
+
+         results = []
+         for command in self.tool_calls:
+             # Reset base64_image for each tool call
+             self._current_base64_image = None
+
+             result = await self.execute_tool(command)
+
+             if self.max_observe:
+                 result = result[: self.max_observe]
+
+             logger.info(
+                 f"🎯 Tool '{command.function.name}' completed its mission! Result: {result}"
+             )
+
+             # Add tool response to memory
+             tool_msg = Message.tool_message(
+                 content=result,
+                 tool_call_id=command.id,
+                 name=command.function.name,
+                 base64_image=self._current_base64_image,
+             )
+             self.memory.add_message(tool_msg)
+             results.append(result)
+
+         return "\n\n".join(results)
+
+     async def execute_tool(self, command: ToolCall) -> str:
+         """Execute a single tool call with robust error handling"""
+         if not command or not command.function or not command.function.name:
+             return "Error: Invalid command format"
+
+         name = command.function.name
+         if name not in self.available_tools.tool_map:
+             return f"Error: Unknown tool '{name}'"
+
+         try:
+             # Parse arguments
+             args = json.loads(command.function.arguments or "{}")
+
+             # Execute the tool
+             logger.info(f"🔧 Activating tool: '{name}'...")
+             result = await self.available_tools.execute(name=name, tool_input=args)
+
+             # Handle special tools
+             await self._handle_special_tool(name=name, result=result)
+
+             # Check if result is a ToolResult with base64_image
+             if hasattr(result, "base64_image") and result.base64_image:
+                 # Store the base64_image for later use in tool_message
+                 self._current_base64_image = result.base64_image
+
+             # Format result for display (standard case)
+             observation = (
+                 f"Observed output of cmd `{name}` executed:\n{str(result)}"
+                 if result
+                 else f"Cmd `{name}` completed with no output"
+             )
+
+             return observation
+         except json.JSONDecodeError:
+             error_msg = f"Error parsing arguments for {name}: Invalid JSON format"
+             logger.error(
+                 f"📝 Oops! The arguments for '{name}' don't make sense - invalid JSON, arguments:{command.function.arguments}"
+             )
+             return f"Error: {error_msg}"
+         except Exception as e:
+             error_msg = f"⚠️ Tool '{name}' encountered a problem: {str(e)}"
+             logger.exception(error_msg)
+             return f"Error: {error_msg}"
+
+     async def _handle_special_tool(self, name: str, result: Any, **kwargs):
+         """Handle special tool execution and state changes"""
+         if not self._is_special_tool(name):
+             return
+
+         if self._should_finish_execution(name=name, result=result, **kwargs):
+             # Set agent state to finished
+             logger.info(f"🏁 Special tool '{name}' has completed the task!")
+             self.state = AgentState.FINISHED
+
+     @staticmethod
+     def _should_finish_execution(**kwargs) -> bool:
+         """Determine if tool execution should finish the agent"""
+         return True
+
+     def _is_special_tool(self, name: str) -> bool:
+         """Check if tool name is in special tools list"""
+         return name.lower() in [n.lower() for n in self.special_tool_names]
+
+     async def cleanup(self):
+         """Clean up resources used by the agent's tools."""
+         logger.info(f"🧹 Cleaning up resources for agent '{self.name}'...")
+         for tool_name, tool_instance in self.available_tools.tool_map.items():
+             if hasattr(tool_instance, "cleanup") and asyncio.iscoroutinefunction(
+                 tool_instance.cleanup
+             ):
+                 try:
+                     logger.debug(f"🧼 Cleaning up tool: {tool_name}")
+                     await tool_instance.cleanup()
+                 except Exception as e:
+                     logger.error(
+                         f"🚨 Error cleaning up tool '{tool_name}': {e}", exc_info=True
+                     )
+         logger.info(f"✨ Cleanup complete for agent '{self.name}'.")
+
+     async def run(self, request: Optional[str] = None) -> str:
+         """Run the agent with cleanup when done."""
+         try:
+             return await super().run(request)
+         finally:
+             await self.cleanup()
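`execute_tool` above tolerates a missing or empty `arguments` string and turns malformed JSON into an error string instead of propagating the exception. The argument-parsing step in isolation (`parse_args` is a stand-in helper, not part of the agent API):

```python
import json

def parse_args(raw):
    # Treat None/"" as an empty argument object; report malformed JSON
    # as an error string rather than raising, as execute_tool does.
    try:
        return json.loads(raw or "{}"), None
    except json.JSONDecodeError:
        return None, "Invalid JSON format"

print(parse_args('{"path": "/tmp"}'))  # ({'path': '/tmp'}, None)
print(parse_args(None))                # ({}, None)
print(parse_args("{broken"))           # (None, 'Invalid JSON format')
```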
app/auth.py ADDED
@@ -0,0 +1,205 @@
+ """
+ User authentication models and validation for OpenManus
+ Mobile number + password based authentication system
+ """
+
+ import hashlib
+ import re
+ import secrets
+ from datetime import datetime, timedelta
+ from typing import Optional
+ from dataclasses import dataclass
+ from pydantic import BaseModel, validator
+
+
+ class UserSignupRequest(BaseModel):
+     """User signup request model"""
+
+     full_name: str
+     mobile_number: str
+     password: str
+     confirm_password: str
+
+     @validator("full_name")
+     def validate_full_name(cls, v):
+         if not v or len(v.strip()) < 2:
+             raise ValueError("Full name must be at least 2 characters long")
+         if len(v.strip()) > 100:
+             raise ValueError("Full name must be less than 100 characters")
+         return v.strip()
+
+     @validator("mobile_number")
+     def validate_mobile_number(cls, v):
+         # Remove all non-digit characters
+         digits_only = re.sub(r"\D", "", v)
+
+         # Check if it's a valid mobile number (10-15 digits)
+         if len(digits_only) < 10 or len(digits_only) > 15:
+             raise ValueError("Mobile number must be between 10-15 digits")
+
+         # Ensure it starts with country code or local format
+         if not re.match(r"^(\+?[1-9]\d{9,14})$", digits_only):
+             raise ValueError("Invalid mobile number format")
+
+         return digits_only
+
+     @validator("password")
+     def validate_password(cls, v):
+         if len(v) < 8:
+             raise ValueError("Password must be at least 8 characters long")
+         if len(v) > 128:
+             raise ValueError("Password must be less than 128 characters")
+
+         # Check for at least one uppercase, lowercase, and digit
+         if not re.search(r"[A-Z]", v):
+             raise ValueError("Password must contain at least one uppercase letter")
+         if not re.search(r"[a-z]", v):
+             raise ValueError("Password must contain at least one lowercase letter")
+         if not re.search(r"\d", v):
+             raise ValueError("Password must contain at least one digit")
+
+         return v
+
+     @validator("confirm_password")
+     def validate_confirm_password(cls, v, values):
+         if "password" in values and v != values["password"]:
+             raise ValueError("Passwords do not match")
+         return v
+
+
+ class UserLoginRequest(BaseModel):
+     """User login request model"""
+
+     mobile_number: str
+     password: str
+
+     @validator("mobile_number")
+     def validate_mobile_number(cls, v):
+         # Remove all non-digit characters
+         digits_only = re.sub(r"\D", "", v)
+
+         if len(digits_only) < 10 or len(digits_only) > 15:
+             raise ValueError("Invalid mobile number")
+
+         return digits_only
+
+
+ @dataclass
+ class User:
+     """User model"""
+
+     id: str
+     mobile_number: str
+     full_name: str
+     password_hash: str
+     avatar_url: Optional[str] = None
+     preferences: Optional[str] = None
+     is_active: bool = True
+     created_at: Optional[datetime] = None
+     updated_at: Optional[datetime] = None
+
+
+ @dataclass
+ class UserSession:
+     """User session model"""
+
+     session_id: str
+     user_id: str
+     mobile_number: str
+     full_name: str
+     created_at: datetime
+     expires_at: datetime
+ expires_at: datetime
112
+
113
+ @property
114
+ def is_valid(self) -> bool:
115
+ """Check if session is still valid"""
116
+ return datetime.utcnow() < self.expires_at
117
+
118
+
119
+ class UserAuth:
120
+ """User authentication utilities"""
121
+
122
+ @staticmethod
123
+ def hash_password(password: str) -> str:
124
+ """Hash password using SHA-256 with salt"""
125
+ salt = secrets.token_hex(32)
126
+ password_hash = hashlib.sha256((password + salt).encode()).hexdigest()
127
+ return f"{salt}:{password_hash}"
128
+
129
+ @staticmethod
130
+ def verify_password(password: str, password_hash: str) -> bool:
131
+ """Verify password against stored hash"""
132
+ try:
133
+ salt, stored_hash = password_hash.split(":")
134
+ password_hash_check = hashlib.sha256((password + salt).encode()).hexdigest()
135
+ return password_hash_check == stored_hash
136
+ except ValueError:
137
+ return False
138
+
139
+ @staticmethod
140
+ def generate_session_id() -> str:
141
+ """Generate secure session ID"""
142
+ return secrets.token_urlsafe(32)
143
+
144
+ @staticmethod
145
+ def generate_user_id() -> str:
146
+ """Generate unique user ID"""
147
+ return f"user_{secrets.token_hex(16)}"
148
+
149
+ @staticmethod
150
+ def format_mobile_number(mobile_number: str) -> str:
151
+ """Format mobile number for consistent storage"""
152
+ # Remove all non-digit characters
153
+ digits_only = re.sub(r"\D", "", mobile_number)
154
+
155
+ # Add + prefix if not present and format consistently
156
+ if not digits_only.startswith("+"):
157
+ # Assume it's a local number, add default country code if needed
158
+ if len(digits_only) == 10: # US format
159
+ digits_only = f"1{digits_only}"
160
+
161
+ return f"+{digits_only}"
162
+
163
+ @staticmethod
164
+ def create_session(user: User, duration_hours: int = 24) -> UserSession:
165
+ """Create a new user session"""
166
+ session_id = UserAuth.generate_session_id()
167
+ created_at = datetime.utcnow()
168
+ expires_at = created_at + timedelta(hours=duration_hours)
169
+
170
+ return UserSession(
171
+ session_id=session_id,
172
+ user_id=user.id,
173
+ mobile_number=user.mobile_number,
174
+ full_name=user.full_name,
175
+ created_at=created_at,
176
+ expires_at=expires_at,
177
+ )
178
+
179
+
180
+ # Response models
181
+ class AuthResponse(BaseModel):
182
+ """Authentication response model"""
183
+
184
+ success: bool
185
+ message: str
186
+ session_id: Optional[str] = None
187
+ user_id: Optional[str] = None
188
+ full_name: Optional[str] = None
189
+
190
+
191
+ class UserProfile(BaseModel):
192
+ """User profile response model"""
193
+
194
+ user_id: str
195
+ full_name: str
196
+ mobile_number: str # Masked for security
197
+ avatar_url: Optional[str] = None
198
+ created_at: Optional[str] = None
199
+
200
+ @staticmethod
201
+ def mask_mobile_number(mobile_number: str) -> str:
202
+ """Mask mobile number for security (show only last 4 digits)"""
203
+ if len(mobile_number) <= 4:
204
+ return "*" * len(mobile_number)
205
+ return "*" * (len(mobile_number) - 4) + mobile_number[-4:]
app/auth_interface.py ADDED
@@ -0,0 +1,361 @@
+"""
+Authentication Web Interface for OpenManus
+Mobile number + password based authentication forms
+"""
+
+import asyncio
+import sqlite3
+from typing import Tuple
+
+import gradio as gr
+
+from app.auth import UserLoginRequest, UserSignupRequest
+from app.auth_service import AuthService
+from app.logger import logger
+
+
+class AuthInterface:
+    """Authentication interface with Gradio"""
+
+    def __init__(self, db_path: str = "openmanus.db"):
+        self.db_path = db_path
+        self.current_session = None
+        self.init_database()
+
+    def init_database(self):
+        """Initialize database with schema"""
+        try:
+            conn = sqlite3.connect(self.db_path)
+
+            # Create users table with mobile auth
+            conn.execute(
+                """
+                CREATE TABLE IF NOT EXISTS users (
+                    id TEXT PRIMARY KEY,
+                    mobile_number TEXT UNIQUE NOT NULL,
+                    full_name TEXT NOT NULL,
+                    password_hash TEXT NOT NULL,
+                    avatar_url TEXT,
+                    preferences TEXT,
+                    is_active BOOLEAN DEFAULT TRUE,
+                    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
+                    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
+                )
+                """
+            )
+
+            # Create sessions table
+            conn.execute(
+                """
+                CREATE TABLE IF NOT EXISTS sessions (
+                    id TEXT PRIMARY KEY,
+                    user_id TEXT NOT NULL,
+                    title TEXT,
+                    metadata TEXT,
+                    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
+                    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
+                    expires_at DATETIME,
+                    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
+                )
+                """
+            )
+
+            conn.commit()
+            conn.close()
+            logger.info("Database initialized successfully")
+
+        except Exception as e:
+            logger.error(f"Database initialization error: {str(e)}")
+
+    def get_db_connection(self):
+        """Get database connection"""
+        return sqlite3.connect(self.db_path)
+
+    async def handle_signup(
+        self, full_name: str, mobile_number: str, password: str, confirm_password: str
+    ) -> Tuple[str, bool, dict]:
+        """Handle user signup"""
+        try:
+            # Validate input
+            if not all([full_name, mobile_number, password, confirm_password]):
+                return "All fields are required", False, gr.update(visible=True)
+
+            # Create signup request (raises ValueError on invalid input)
+            signup_data = UserSignupRequest(
+                full_name=full_name,
+                mobile_number=mobile_number,
+                password=password,
+                confirm_password=confirm_password,
+            )
+
+            # Process signup
+            db_conn = self.get_db_connection()
+            auth_service = AuthService(db_conn)
+
+            result = await auth_service.register_user(signup_data)
+            db_conn.close()
+
+            if result.success:
+                self.current_session = {
+                    "session_id": result.session_id,
+                    "user_id": result.user_id,
+                    "full_name": result.full_name,
+                }
+                return (
+                    f"Welcome {result.full_name}! Account created successfully.",
+                    True,
+                    gr.update(visible=False),
+                )
+            else:
+                return result.message, False, gr.update(visible=True)
+
+        except ValueError as e:
+            return str(e), False, gr.update(visible=True)
+        except Exception as e:
+            logger.error(f"Signup error: {str(e)}")
+            return "An error occurred during signup", False, gr.update(visible=True)
+
+    async def handle_login(
+        self, mobile_number: str, password: str
+    ) -> Tuple[str, bool, dict]:
+        """Handle user login"""
+        try:
+            # Validate input
+            if not all([mobile_number, password]):
+                return (
+                    "Mobile number and password are required",
+                    False,
+                    gr.update(visible=True),
+                )
+
+            # Create login request
+            login_data = UserLoginRequest(
+                mobile_number=mobile_number, password=password
+            )
+
+            # Process login
+            db_conn = self.get_db_connection()
+            auth_service = AuthService(db_conn)
+
+            result = await auth_service.login_user(login_data)
+            db_conn.close()
+
+            if result.success:
+                self.current_session = {
+                    "session_id": result.session_id,
+                    "user_id": result.user_id,
+                    "full_name": result.full_name,
+                }
+                return (
+                    f"Welcome back, {result.full_name}!",
+                    True,
+                    gr.update(visible=False),
+                )
+            else:
+                return result.message, False, gr.update(visible=True)
+
+        except ValueError as e:
+            return str(e), False, gr.update(visible=True)
+        except Exception as e:
+            logger.error(f"Login error: {str(e)}")
+            return "An error occurred during login", False, gr.update(visible=True)
+
+    def handle_logout(self) -> Tuple[str, bool, dict]:
+        """Handle user logout"""
+        if self.current_session:
+            # In a real app, the session would also be deleted from the database
+            self.current_session = None
+
+        return "Logged out successfully", False, gr.update(visible=True)
+
+    def create_interface(self) -> gr.Blocks:
+        """Create the authentication interface"""
+
+        with gr.Blocks(
+            title="OpenManus Authentication", theme=gr.themes.Soft()
+        ) as auth_interface:
+            gr.Markdown(
+                """
+                # 🔐 OpenManus Authentication
+                ### Secure Mobile Number + Password Login System
+                """
+            )
+
+            # Session status
+            session_status = gr.Textbox(
+                value="Not logged in", label="Status", interactive=False
+            )
+
+            # Auth forms container
+            with gr.Column(visible=True) as auth_forms:
+                with gr.Tabs():
+                    # Login Tab
+                    with gr.TabItem("🔑 Login"):
+                        gr.Markdown("### Login with your mobile number and password")
+
+                        login_mobile = gr.Textbox(
+                            label="📱 Mobile Number",
+                            placeholder="Enter your mobile number (e.g., +1234567890)",
+                            lines=1,
+                        )
+
+                        login_password = gr.Textbox(
+                            label="🔒 Password",
+                            type="password",
+                            placeholder="Enter your password",
+                            lines=1,
+                        )
+
+                        login_btn = gr.Button("🔑 Login", variant="primary", size="lg")
+                        login_result = gr.Textbox(label="Result", interactive=False)
+
+                    # Signup Tab
+                    with gr.TabItem("📝 Sign Up"):
+                        gr.Markdown("### Create your new account")
+
+                        signup_fullname = gr.Textbox(
+                            label="👤 Full Name",
+                            placeholder="Enter your full name",
+                            lines=1,
+                        )
+
+                        signup_mobile = gr.Textbox(
+                            label="📱 Mobile Number",
+                            placeholder="Enter your mobile number (e.g., +1234567890)",
+                            lines=1,
+                        )
+
+                        signup_password = gr.Textbox(
+                            label="🔒 Password",
+                            type="password",
+                            placeholder="Create a strong password (min 8 chars, include uppercase, lowercase, digit)",
+                            lines=1,
+                        )
+
+                        signup_confirm_password = gr.Textbox(
+                            label="🔒 Confirm Password",
+                            type="password",
+                            placeholder="Confirm your password",
+                            lines=1,
+                        )
+
+                        signup_btn = gr.Button(
+                            "📝 Create Account", variant="primary", size="lg"
+                        )
+                        signup_result = gr.Textbox(label="Result", interactive=False)
+
+            # Logged in section
+            with gr.Column(visible=False) as logged_in_section:
+                gr.Markdown("### ✅ You are logged in!")
+
+                user_info = gr.Markdown("Welcome!")
+
+                logout_btn = gr.Button("🚪 Logout", variant="secondary")
+                logout_result = gr.Textbox(label="Result", interactive=False)
+
+            # Password requirements info
+            with gr.Accordion("📋 Password Requirements", open=False):
+                gr.Markdown(
+                    """
+                    **Password must contain:**
+                    - At least 8 characters
+                    - At least 1 uppercase letter (A-Z)
+                    - At least 1 lowercase letter (a-z)
+                    - At least 1 digit (0-9)
+                    - Maximum 128 characters
+
+                    **Mobile Number Format:**
+                    - 10-15 digits
+                    - Can include country code
+                    - Examples: +1234567890, 1234567890, +91987654321
+                    """
+                )
+
+            # Event handlers: each click computes the full set of UI updates in
+            # one call, so no intermediate state has to flow between listeners
+            def sync_signup(full_name, mobile, password, confirm):
+                """Synchronous wrapper for signup; returns all UI updates"""
+                message, success, forms_update = asyncio.run(
+                    self.handle_signup(full_name, mobile, password, confirm)
+                )
+                name = (
+                    self.current_session["full_name"]
+                    if self.current_session
+                    else "User"
+                )
+                return (
+                    message,  # result textbox
+                    message if success else "Not logged in",  # session_status
+                    forms_update,  # auth_forms visibility
+                    gr.update(visible=success),  # logged_in_section visibility
+                    f"### 👋 {name}" if success else "Welcome!",  # user_info
+                )
+
+            def sync_login(mobile, password):
+                """Synchronous wrapper for login; returns all UI updates"""
+                message, success, forms_update = asyncio.run(
+                    self.handle_login(mobile, password)
+                )
+                name = (
+                    self.current_session["full_name"]
+                    if self.current_session
+                    else "User"
+                )
+                return (
+                    message,
+                    message if success else "Not logged in",
+                    forms_update,
+                    gr.update(visible=success),
+                    f"### 👋 {name}" if success else "Welcome!",
+                )
+
+            def sync_logout():
+                """Logout and reset the UI"""
+                message, _, forms_update = self.handle_logout()
+                return (
+                    message,
+                    "Not logged in",
+                    forms_update,
+                    gr.update(visible=False),
+                    "Welcome!",
+                )
+
+            # Login button click
+            login_btn.click(
+                fn=sync_login,
+                inputs=[login_mobile, login_password],
+                outputs=[
+                    login_result,
+                    session_status,
+                    auth_forms,
+                    logged_in_section,
+                    user_info,
+                ],
+            )
+
+            # Signup button click
+            signup_btn.click(
+                fn=sync_signup,
+                inputs=[
+                    signup_fullname,
+                    signup_mobile,
+                    signup_password,
+                    signup_confirm_password,
+                ],
+                outputs=[
+                    signup_result,
+                    session_status,
+                    auth_forms,
+                    logged_in_section,
+                    user_info,
+                ],
+            )
+
+            # Logout button click
+            logout_btn.click(
+                fn=sync_logout,
+                outputs=[
+                    logout_result,
+                    session_status,
+                    auth_forms,
+                    logged_in_section,
+                    user_info,
+                ],
+            )
+
+        return auth_interface
+
+
+# Standalone authentication app
+def create_auth_app(db_path: str = "openmanus.db") -> gr.Blocks:
+    """Create standalone authentication app"""
+    auth_interface = AuthInterface(db_path)
+    return auth_interface.create_interface()
+
+
+if __name__ == "__main__":
+    # Run standalone auth interface for testing
+    auth_app = create_auth_app()
+    auth_app.launch(server_name="0.0.0.0", server_port=7860, share=False, debug=True)
app/auth_service.py ADDED
@@ -0,0 +1,357 @@
+"""
+User authentication service for OpenManus
+Handles user registration, login, and session management with a SQLite database
+"""
+
+import json
+from datetime import datetime
+from typing import Optional
+
+from app.auth import (
+    AuthResponse,
+    User,
+    UserAuth,
+    UserLoginRequest,
+    UserProfile,
+    UserSession,
+    UserSignupRequest,
+)
+from app.logger import logger
+
+
+class AuthService:
+    """Authentication service for user management"""
+
+    def __init__(self, db_connection=None):
+        """Initialize auth service with database connection"""
+        self.db = db_connection
+        self.logger = logger
+
+    async def register_user(self, signup_data: UserSignupRequest) -> AuthResponse:
+        """Register a new user"""
+        try:
+            # Format mobile number consistently
+            formatted_mobile = UserAuth.format_mobile_number(signup_data.mobile_number)
+
+            # Check if user already exists
+            existing_user = await self.get_user_by_mobile(formatted_mobile)
+            if existing_user:
+                return AuthResponse(
+                    success=False, message="User with this mobile number already exists"
+                )
+
+            # Create new user
+            user_id = UserAuth.generate_user_id()
+            password_hash = UserAuth.hash_password(signup_data.password)
+
+            user = User(
+                id=user_id,
+                mobile_number=formatted_mobile,
+                full_name=signup_data.full_name,
+                password_hash=password_hash,
+                created_at=datetime.utcnow(),
+                updated_at=datetime.utcnow(),
+            )
+
+            # Save user to database
+            success = await self.save_user(user)
+            if not success:
+                return AuthResponse(
+                    success=False, message="Failed to create user account"
+                )
+
+            # Create session
+            session = UserAuth.create_session(user)
+            session_saved = await self.save_session(session)
+
+            if not session_saved:
+                return AuthResponse(
+                    success=False, message="User created but failed to create session"
+                )
+
+            self.logger.info(f"New user registered: {formatted_mobile}")
+
+            return AuthResponse(
+                success=True,
+                message="Account created successfully",
+                session_id=session.session_id,
+                user_id=user.id,
+                full_name=user.full_name,
+            )
+
+        except Exception as e:
+            self.logger.error(f"User registration error: {str(e)}")
+            return AuthResponse(
+                success=False, message="An error occurred during registration"
+            )
+
+    async def login_user(self, login_data: UserLoginRequest) -> AuthResponse:
+        """Authenticate user login"""
+        try:
+            # Format mobile number consistently
+            formatted_mobile = UserAuth.format_mobile_number(login_data.mobile_number)
+
+            # Get user from database
+            user = await self.get_user_by_mobile(formatted_mobile)
+            if not user:
+                return AuthResponse(
+                    success=False, message="Invalid mobile number or password"
+                )
+
+            # Verify password
+            if not UserAuth.verify_password(login_data.password, user.password_hash):
+                return AuthResponse(
+                    success=False, message="Invalid mobile number or password"
+                )
+
+            # Check if user is active
+            if not user.is_active:
+                return AuthResponse(
+                    success=False,
+                    message="Account is deactivated. Please contact support.",
+                )
+
+            # Create new session
+            session = UserAuth.create_session(user)
+            session_saved = await self.save_session(session)
+
+            if not session_saved:
+                return AuthResponse(
+                    success=False,
+                    message="Login successful but failed to create session",
+                )
+
+            self.logger.info(f"User logged in: {formatted_mobile}")
+
+            return AuthResponse(
+                success=True,
+                message="Login successful",
+                session_id=session.session_id,
+                user_id=user.id,
+                full_name=user.full_name,
+            )
+
+        except Exception as e:
+            self.logger.error(f"User login error: {str(e)}")
+            return AuthResponse(success=False, message="An error occurred during login")
+
+    async def validate_session(self, session_id: str) -> Optional[UserSession]:
+        """Validate user session"""
+        try:
+            if not self.db:
+                return None
+
+            cursor = self.db.cursor()
+            cursor.execute(
+                """
+                SELECT s.id, s.user_id, u.mobile_number, u.full_name,
+                       s.created_at, s.expires_at
+                FROM sessions s
+                JOIN users u ON s.user_id = u.id
+                WHERE s.id = ? AND u.is_active = 1
+                """,
+                (session_id,),
+            )
+
+            row = cursor.fetchone()
+            if not row:
+                return None
+
+            session = UserSession(
+                session_id=row[0],
+                user_id=row[1],
+                mobile_number=row[2],
+                full_name=row[3],
+                created_at=datetime.fromisoformat(row[4]),
+                expires_at=datetime.fromisoformat(row[5]),
+            )
+
+            # Check if session is still valid
+            if not session.is_valid:
+                # Clean up expired session
+                await self.delete_session(session_id)
+                return None
+
+            return session
+
+        except Exception as e:
+            self.logger.error(f"Session validation error: {str(e)}")
+            return None
+
+    async def logout_user(self, session_id: str) -> bool:
+        """Logout user by deleting session"""
+        return await self.delete_session(session_id)
+
+    async def get_user_profile(self, user_id: str) -> Optional[UserProfile]:
+        """Get user profile by user ID"""
+        try:
+            user = await self.get_user_by_id(user_id)
+            if not user:
+                return None
+
+            return UserProfile(
+                user_id=user.id,
+                full_name=user.full_name,
+                mobile_number=UserProfile.mask_mobile_number(user.mobile_number),
+                avatar_url=user.avatar_url,
+                created_at=user.created_at.isoformat() if user.created_at else None,
+            )
+
+        except Exception as e:
+            self.logger.error(f"Get user profile error: {str(e)}")
+            return None
+
+    # Database operations
+    async def save_user(self, user: User) -> bool:
+        """Save user to database"""
+        try:
+            if not self.db:
+                return False
+
+            cursor = self.db.cursor()
+            cursor.execute(
+                """
+                INSERT INTO users (id, mobile_number, full_name, password_hash,
+                                   avatar_url, preferences, is_active, created_at, updated_at)
+                VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
+                """,
+                (
+                    user.id,
+                    user.mobile_number,
+                    user.full_name,
+                    user.password_hash,
+                    user.avatar_url,
+                    user.preferences,
+                    user.is_active,
+                    user.created_at.isoformat() if user.created_at else None,
+                    user.updated_at.isoformat() if user.updated_at else None,
+                ),
+            )
+
+            self.db.commit()
+            return True
+
+        except Exception as e:
+            self.logger.error(f"Save user error: {str(e)}")
+            return False
+
+    async def get_user_by_mobile(self, mobile_number: str) -> Optional[User]:
+        """Get user by mobile number"""
+        try:
+            if not self.db:
+                return None
+
+            cursor = self.db.cursor()
+            cursor.execute(
+                """
+                SELECT id, mobile_number, full_name, password_hash, avatar_url,
+                       preferences, is_active, created_at, updated_at
+                FROM users
+                WHERE mobile_number = ?
+                """,
+                (mobile_number,),
+            )
+
+            row = cursor.fetchone()
+            if not row:
+                return None
+
+            return User(
+                id=row[0],
+                mobile_number=row[1],
+                full_name=row[2],
+                password_hash=row[3],
+                avatar_url=row[4],
+                preferences=row[5],
+                is_active=bool(row[6]),
+                created_at=datetime.fromisoformat(row[7]) if row[7] else None,
+                updated_at=datetime.fromisoformat(row[8]) if row[8] else None,
+            )
+
+        except Exception as e:
+            self.logger.error(f"Get user by mobile error: {str(e)}")
+            return None
+
+    async def get_user_by_id(self, user_id: str) -> Optional[User]:
+        """Get user by ID"""
+        try:
+            if not self.db:
+                return None
+
+            cursor = self.db.cursor()
+            cursor.execute(
+                """
+                SELECT id, mobile_number, full_name, password_hash, avatar_url,
+                       preferences, is_active, created_at, updated_at
+                FROM users
+                WHERE id = ? AND is_active = 1
+                """,
+                (user_id,),
+            )
+
+            row = cursor.fetchone()
+            if not row:
+                return None
+
+            return User(
+                id=row[0],
+                mobile_number=row[1],
+                full_name=row[2],
+                password_hash=row[3],
+                avatar_url=row[4],
+                preferences=row[5],
+                is_active=bool(row[6]),
+                created_at=datetime.fromisoformat(row[7]) if row[7] else None,
+                updated_at=datetime.fromisoformat(row[8]) if row[8] else None,
+            )
+
+        except Exception as e:
+            self.logger.error(f"Get user by ID error: {str(e)}")
+            return None
+
+    async def save_session(self, session: UserSession) -> bool:
+        """Save session to database"""
+        try:
+            if not self.db:
+                return False
+
+            cursor = self.db.cursor()
+            cursor.execute(
+                """
+                INSERT INTO sessions (id, user_id, title, metadata, created_at,
+                                      updated_at, expires_at)
+                VALUES (?, ?, ?, ?, ?, ?, ?)
+                """,
+                (
+                    session.session_id,
+                    session.user_id,
+                    "User Session",
+                    json.dumps({"login_type": "mobile_password"}),
+                    session.created_at.isoformat(),
+                    session.created_at.isoformat(),
+                    session.expires_at.isoformat(),
+                ),
+            )
+
+            self.db.commit()
+            return True
+
+        except Exception as e:
+            self.logger.error(f"Save session error: {str(e)}")
+            return False
+
+    async def delete_session(self, session_id: str) -> bool:
+        """Delete session from database"""
+        try:
+            if not self.db:
+                return False
+
+            cursor = self.db.cursor()
+            cursor.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
+            self.db.commit()
+            return True
+
+        except Exception as e:
+            self.logger.error(f"Delete session error: {str(e)}")
+            return False
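
Session validity in `validate_session` reduces to a timestamp comparison plus cleanup of expired rows. This standalone sketch reproduces that logic against an in-memory SQLite table (the table is trimmed to the columns the check needs, and the helper name is local to this snippet):

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory stand-in for the sessions table used by AuthService
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (id TEXT PRIMARY KEY, user_id TEXT, expires_at DATETIME)"
)

now = datetime.utcnow()
conn.execute(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    ("sess_live", "user_1", (now + timedelta(hours=24)).isoformat()),
)
conn.execute(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    ("sess_dead", "user_1", (now - timedelta(hours=1)).isoformat()),
)


def is_session_valid(session_id: str) -> bool:
    row = conn.execute(
        "SELECT expires_at FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    if row is None:
        return False
    expires_at = datetime.fromisoformat(row[0])
    if datetime.utcnow() >= expires_at:
        # Mirror validate_session: purge the expired row before reporting failure
        conn.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
        return False
    return True


print(is_session_valid("sess_live"))  # True
print(is_session_valid("sess_dead"))  # False, and the expired row is deleted
```

Deleting the expired row on first failed lookup keeps the table from accumulating stale sessions without needing a separate cleanup job.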
app/bedrock.py ADDED
@@ -0,0 +1,334 @@
+import json
+import sys
+import time
+import uuid
+from datetime import datetime
+from typing import Dict, List, Literal, Optional
+
+import boto3
+
+
+# Global variable to track the current tool use ID across function calls
+# Temporary solution
+CURRENT_TOOLUSE_ID = None
+
+
+# Class to handle OpenAI-style response formatting
+class OpenAIResponse:
+    def __init__(self, data):
+        # Recursively convert nested dicts and lists to OpenAIResponse objects
+        for key, value in data.items():
+            if isinstance(value, dict):
+                value = OpenAIResponse(value)
+            elif isinstance(value, list):
+                value = [
+                    OpenAIResponse(item) if isinstance(item, dict) else item
+                    for item in value
+                ]
+            setattr(self, key, value)
+
+    def model_dump(self, *args, **kwargs):
+        # Convert object to dict and add a timestamp
+        data = self.__dict__
+        data["created_at"] = datetime.now().isoformat()
+        return data
+
+
+# Main client class for interacting with Amazon Bedrock
+class BedrockClient:
+    def __init__(self):
+        # Initialize Bedrock client; AWS credentials must be configured first
+        try:
+            self.client = boto3.client("bedrock-runtime")
+            self.chat = Chat(self.client)
+        except Exception as e:
+            print(f"Error initializing Bedrock client: {e}")
+            sys.exit(1)
+
+
+# Chat interface class
+class Chat:
+    def __init__(self, client):
+        self.completions = ChatCompletions(client)
+
+
+# Core class handling chat completions functionality
+class ChatCompletions:
+    def __init__(self, client):
+        self.client = client
+
+    def _convert_openai_tools_to_bedrock_format(self, tools):
+        # Convert OpenAI function-calling format to the Bedrock tool format
+        bedrock_tools = []
+        for tool in tools:
+            if tool.get("type") == "function":
+                function = tool.get("function", {})
+                bedrock_tool = {
+                    "toolSpec": {
+                        "name": function.get("name", ""),
+                        "description": function.get("description", ""),
+                        "inputSchema": {
+                            "json": {
+                                "type": "object",
+                                "properties": function.get("parameters", {}).get(
+                                    "properties", {}
+                                ),
+                                "required": function.get("parameters", {}).get(
+                                    "required", []
+                                ),
+                            }
+                        },
+                    }
+                }
+                bedrock_tools.append(bedrock_tool)
+        return bedrock_tools
+
+    def _convert_openai_messages_to_bedrock_format(self, messages):
+        # Convert OpenAI message format to Bedrock message format
+        bedrock_messages = []
+        system_prompt = []
+        for message in messages:
+            if message.get("role") == "system":
+                system_prompt = [{"text": message.get("content")}]
+            elif message.get("role") == "user":
+                bedrock_message = {
+                    "role": message.get("role", "user"),
+                    "content": [{"text": message.get("content")}],
+                }
+                bedrock_messages.append(bedrock_message)
+            elif message.get("role") == "assistant":
+                bedrock_message = {
+                    "role": "assistant",
+                    "content": [{"text": message.get("content")}],
+                }
+                openai_tool_calls = message.get("tool_calls", [])
+                if openai_tool_calls:
+                    bedrock_tool_use = {
+                        "toolUseId": openai_tool_calls[0]["id"],
+                        "name": openai_tool_calls[0]["function"]["name"],
+                        "input": json.loads(
+                            openai_tool_calls[0]["function"]["arguments"]
+                        ),
+                    }
+                    bedrock_message["content"].append({"toolUse": bedrock_tool_use})
+                    global CURRENT_TOOLUSE_ID
+                    CURRENT_TOOLUSE_ID = openai_tool_calls[0]["id"]
+                bedrock_messages.append(bedrock_message)
+            elif message.get("role") == "tool":
+                bedrock_message = {
+                    "role": "user",
+                    "content": [
+                        {
+                            "toolResult": {
+                                "toolUseId": CURRENT_TOOLUSE_ID,
+                                "content": [{"text": message.get("content")}],
+                            }
+                        }
+                    ],
+                }
+                bedrock_messages.append(bedrock_message)
+            else:
+                raise ValueError(f"Invalid role: {message.get('role')}")
+        return system_prompt, bedrock_messages
+
+    def _convert_bedrock_response_to_openai_format(self, bedrock_response):
+        # Convert Bedrock response format to OpenAI format
+        content = ""
+        if bedrock_response.get("output", {}).get("message", {}).get("content"):
+            content_array = bedrock_response["output"]["message"]["content"]
+            content = "".join(item.get("text", "") for item in content_array)
+            if content == "":
+                content = "."
+
+        # Handle tool calls in the response
+        openai_tool_calls = []
+        if bedrock_response.get("output", {}).get("message", {}).get("content"):
+            for content_item in bedrock_response["output"]["message"]["content"]:
+                if content_item.get("toolUse"):
+                    bedrock_tool_use = content_item["toolUse"]
+                    global CURRENT_TOOLUSE_ID
+                    CURRENT_TOOLUSE_ID = bedrock_tool_use["toolUseId"]
+                    openai_tool_call = {
+                        "id": CURRENT_TOOLUSE_ID,
+                        "type": "function",
+                        "function": {
+                            "name": bedrock_tool_use["name"],
+                            "arguments": json.dumps(bedrock_tool_use["input"]),
+                        },
+                    }
+                    openai_tool_calls.append(openai_tool_call)
+
+        # Construct final OpenAI-format response
+        openai_format = {
+            "id": f"chatcmpl-{uuid.uuid4()}",
+            "created": int(time.time()),
+            "object": "chat.completion",
+            "system_fingerprint": None,
+            "choices": [
+                {
+                    "finish_reason": bedrock_response.get("stopReason", "end_turn"),
+                    "index": 0,
+                    "message": {
+                        "content": content,
+                        "role": bedrock_response.get("output", {})
+                        .get("message", {})
+                        .get("role", "assistant"),
+                        "tool_calls": openai_tool_calls
+                        if openai_tool_calls != []
+                        else None,
+                        "function_call": None,
+                    },
+                }
+            ],
+            "usage": {
+                "completion_tokens": bedrock_response.get("usage", {}).get(
+                    "outputTokens", 0
+                ),
+                "prompt_tokens": bedrock_response.get("usage", {}).get(
+                    "inputTokens", 0
+                ),
+                "total_tokens": bedrock_response.get("usage", {}).get("totalTokens", 0),
191
+ },
192
+ }
193
+ return OpenAIResponse(openai_format)
194
+
195
+ async def _invoke_bedrock(
196
+ self,
197
+ model: str,
198
+ messages: List[Dict[str, str]],
199
+ max_tokens: int,
200
+ temperature: float,
201
+ tools: Optional[List[dict]] = None,
202
+ tool_choice: Literal["none", "auto", "required"] = "auto",
203
+ **kwargs,
204
+ ) -> OpenAIResponse:
205
+ # Non-streaming invocation of Bedrock model
206
+ (
207
+ system_prompt,
208
+ bedrock_messages,
209
+ ) = self._convert_openai_messages_to_bedrock_format(messages)
210
+ response = self.client.converse(
211
+ modelId=model,
212
+ system=system_prompt,
213
+ messages=bedrock_messages,
214
+ inferenceConfig={"temperature": temperature, "maxTokens": max_tokens},
215
+ toolConfig={"tools": tools} if tools else None,
216
+ )
217
+ openai_response = self._convert_bedrock_response_to_openai_format(response)
218
+ return openai_response
219
+
220
+ async def _invoke_bedrock_stream(
221
+ self,
222
+ model: str,
223
+ messages: List[Dict[str, str]],
224
+ max_tokens: int,
225
+ temperature: float,
226
+ tools: Optional[List[dict]] = None,
227
+ tool_choice: Literal["none", "auto", "required"] = "auto",
228
+ **kwargs,
229
+ ) -> OpenAIResponse:
230
+ # Streaming invocation of Bedrock model
231
+ (
232
+ system_prompt,
233
+ bedrock_messages,
234
+ ) = self._convert_openai_messages_to_bedrock_format(messages)
235
+ response = self.client.converse_stream(
236
+ modelId=model,
237
+ system=system_prompt,
238
+ messages=bedrock_messages,
239
+ inferenceConfig={"temperature": temperature, "maxTokens": max_tokens},
240
+ toolConfig={"tools": tools} if tools else None,
241
+ )
242
+
243
+ # Initialize response structure
244
+ bedrock_response = {
245
+ "output": {"message": {"role": "", "content": []}},
246
+ "stopReason": "",
247
+ "usage": {},
248
+ "metrics": {},
249
+ }
250
+ bedrock_response_text = ""
251
+ bedrock_response_tool_input = ""
252
+
253
+ # Process streaming response
254
+ stream = response.get("stream")
255
+ if stream:
256
+ for event in stream:
257
+ if event.get("messageStart", {}).get("role"):
258
+ bedrock_response["output"]["message"]["role"] = event[
259
+ "messageStart"
260
+ ]["role"]
261
+ if event.get("contentBlockDelta", {}).get("delta", {}).get("text"):
262
+ bedrock_response_text += event["contentBlockDelta"]["delta"]["text"]
263
+ print(
264
+ event["contentBlockDelta"]["delta"]["text"], end="", flush=True
265
+ )
266
+ if event.get("contentBlockStop", {}).get("contentBlockIndex") == 0:
267
+ bedrock_response["output"]["message"]["content"].append(
268
+ {"text": bedrock_response_text}
269
+ )
270
+ if event.get("contentBlockStart", {}).get("start", {}).get("toolUse"):
271
+ bedrock_tool_use = event["contentBlockStart"]["start"]["toolUse"]
272
+ tool_use = {
273
+ "toolUseId": bedrock_tool_use["toolUseId"],
274
+ "name": bedrock_tool_use["name"],
275
+ }
276
+ bedrock_response["output"]["message"]["content"].append(
277
+ {"toolUse": tool_use}
278
+ )
279
+ global CURRENT_TOOLUSE_ID
280
+ CURRENT_TOOLUSE_ID = bedrock_tool_use["toolUseId"]
281
+ if event.get("contentBlockDelta", {}).get("delta", {}).get("toolUse"):
282
+ bedrock_response_tool_input += event["contentBlockDelta"]["delta"][
283
+ "toolUse"
284
+ ]["input"]
285
+ print(
286
+ event["contentBlockDelta"]["delta"]["toolUse"]["input"],
287
+ end="",
288
+ flush=True,
289
+ )
290
+ if event.get("contentBlockStop", {}).get("contentBlockIndex") == 1:
291
+ bedrock_response["output"]["message"]["content"][1]["toolUse"][
292
+ "input"
293
+ ] = json.loads(bedrock_response_tool_input)
294
+ print()
295
+ openai_response = self._convert_bedrock_response_to_openai_format(
296
+ bedrock_response
297
+ )
298
+ return openai_response
299
+
300
+ def create(
301
+ self,
302
+ model: str,
303
+ messages: List[Dict[str, str]],
304
+ max_tokens: int,
305
+ temperature: float,
306
+ stream: Optional[bool] = True,
307
+ tools: Optional[List[dict]] = None,
308
+ tool_choice: Literal["none", "auto", "required"] = "auto",
309
+ **kwargs,
310
+ ) -> OpenAIResponse:
311
+ # Main entry point for chat completion
312
+ bedrock_tools = []
313
+ if tools is not None:
314
+ bedrock_tools = self._convert_openai_tools_to_bedrock_format(tools)
315
+ if stream:
316
+ return self._invoke_bedrock_stream(
317
+ model,
318
+ messages,
319
+ max_tokens,
320
+ temperature,
321
+ bedrock_tools,
322
+ tool_choice,
323
+ **kwargs,
324
+ )
325
+ else:
326
+ return self._invoke_bedrock(
327
+ model,
328
+ messages,
329
+ max_tokens,
330
+ temperature,
331
+ bedrock_tools,
332
+ tool_choice,
333
+ **kwargs,
334
+ )
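The role-by-role conversion above can be sketched as a standalone function, reduced to the system/user/assistant text cases (no tool calls). `convert_messages` is a hypothetical free-function mirror of the class method, not part of the actual module; it only illustrates that Bedrock's converse API takes the system prompt separately and wraps each message body in a `[{"text": ...}]` content list.

```python
def convert_messages(messages):
    """Mirror of the OpenAI -> Bedrock message conversion (text-only cases)."""
    system_prompt = []
    bedrock_messages = []
    for message in messages:
        role = message.get("role")
        if role == "system":
            # Bedrock takes the system prompt as a separate argument
            system_prompt = [{"text": message.get("content")}]
        elif role in ("user", "assistant"):
            bedrock_messages.append(
                {"role": role, "content": [{"text": message.get("content")}]}
            )
        else:
            raise ValueError(f"Invalid role: {role}")
    return system_prompt, bedrock_messages


system, msgs = convert_messages(
    [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ]
)
print(system)  # [{'text': 'Be brief.'}]
print(msgs)    # [{'role': 'user', 'content': [{'text': 'Hi'}]}]
```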
app/cloudflare/__init__.py ADDED
@@ -0,0 +1,11 @@
+ """
+ Cloudflare services integration for OpenManus
+ """
+
+ from .client import CloudflareClient
+ from .d1 import D1Database
+ from .durable_objects import DurableObjects
+ from .kv import KVStorage
+ from .r2 import R2Storage
+
+ __all__ = ["CloudflareClient", "D1Database", "R2Storage", "KVStorage", "DurableObjects"]
app/cloudflare/client.py ADDED
@@ -0,0 +1,228 @@
+ """
+ Cloudflare API Client
+ Handles authentication and base HTTP operations for Cloudflare services
+ """
+
+ import asyncio
+ import json
+ from typing import Any, Dict, Optional, Union
+
+ import aiohttp
+
+ from app.logger import logger
+
+
+ class CloudflareClient:
+     """Base client for Cloudflare API operations"""
+
+     def __init__(
+         self,
+         api_token: str,
+         account_id: str,
+         worker_url: Optional[str] = None,
+         timeout: int = 30,
+     ):
+         self.api_token = api_token
+         self.account_id = account_id
+         self.worker_url = worker_url
+         self.timeout = timeout
+         self.base_url = "https://api.cloudflare.com/client/v4"
+
+         # HTTP headers for API requests
+         self.headers = {
+             "Authorization": f"Bearer {api_token}",
+             "Content-Type": "application/json",
+         }
+
+     async def _make_request(
+         self,
+         method: str,
+         url: str,
+         data: Optional[Dict[str, Any]] = None,
+         headers: Optional[Dict[str, str]] = None,
+         use_worker: bool = False,
+     ) -> Dict[str, Any]:
+         """Make HTTP request to Cloudflare API or Worker"""
+
+         # Use worker URL if specified and use_worker is True
+         if use_worker and self.worker_url:
+             full_url = f"{self.worker_url.rstrip('/')}/{url.lstrip('/')}"
+         else:
+             full_url = f"{self.base_url}/{url.lstrip('/')}"
+
+         request_headers = self.headers.copy()
+         if headers:
+             request_headers.update(headers)
+
+         timeout = aiohttp.ClientTimeout(total=self.timeout)
+
+         try:
+             async with aiohttp.ClientSession(timeout=timeout) as session:
+                 async with session.request(
+                     method=method.upper(),
+                     url=full_url,
+                     headers=request_headers,
+                     json=data if data else None,
+                 ) as response:
+                     response_text = await response.text()
+
+                     try:
+                         response_data = (
+                             json.loads(response_text) if response_text else {}
+                         )
+                     except json.JSONDecodeError:
+                         response_data = {"raw_response": response_text}
+
+                     if not response.ok:
+                         logger.error(
+                             f"Cloudflare API error: {response.status} - {response_text}"
+                         )
+                         raise CloudflareError(
+                             f"HTTP {response.status}: {response_text}",
+                             response.status,
+                             response_data,
+                         )
+
+                     return response_data
+
+         except asyncio.TimeoutError:
+             logger.error(f"Timeout making request to {full_url}")
+             raise CloudflareError(f"Request timeout after {self.timeout}s")
+         except aiohttp.ClientError as e:
+             logger.error(f"HTTP client error: {e}")
+             raise CloudflareError(f"Client error: {e}")
+
+     async def get(
+         self,
+         url: str,
+         headers: Optional[Dict[str, str]] = None,
+         use_worker: bool = False,
+     ) -> Dict[str, Any]:
+         """Make GET request"""
+         return await self._make_request(
+             "GET", url, headers=headers, use_worker=use_worker
+         )
+
+     async def post(
+         self,
+         url: str,
+         data: Optional[Dict[str, Any]] = None,
+         headers: Optional[Dict[str, str]] = None,
+         use_worker: bool = False,
+     ) -> Dict[str, Any]:
+         """Make POST request"""
+         return await self._make_request(
+             "POST", url, data=data, headers=headers, use_worker=use_worker
+         )
+
+     async def put(
+         self,
+         url: str,
+         data: Optional[Dict[str, Any]] = None,
+         headers: Optional[Dict[str, str]] = None,
+         use_worker: bool = False,
+     ) -> Dict[str, Any]:
+         """Make PUT request"""
+         return await self._make_request(
+             "PUT", url, data=data, headers=headers, use_worker=use_worker
+         )
+
+     async def delete(
+         self,
+         url: str,
+         headers: Optional[Dict[str, str]] = None,
+         use_worker: bool = False,
+     ) -> Dict[str, Any]:
+         """Make DELETE request"""
+         return await self._make_request(
+             "DELETE", url, headers=headers, use_worker=use_worker
+         )
+
+     async def upload_file(
+         self,
+         url: str,
+         file_data: bytes,
+         content_type: str = "application/octet-stream",
+         headers: Optional[Dict[str, str]] = None,
+         use_worker: bool = False,
+     ) -> Dict[str, Any]:
+         """Upload file data"""
+
+         # Use worker URL if specified and use_worker is True
+         if use_worker and self.worker_url:
+             full_url = f"{self.worker_url.rstrip('/')}/{url.lstrip('/')}"
+         else:
+             full_url = f"{self.base_url}/{url.lstrip('/')}"
+
+         upload_headers = {
+             "Authorization": f"Bearer {self.api_token}",
+             "Content-Type": content_type,
+         }
+         if headers:
+             upload_headers.update(headers)
+
+         timeout = aiohttp.ClientTimeout(
+             total=self.timeout * 2
+         )  # Longer timeout for uploads
+
+         try:
+             async with aiohttp.ClientSession(timeout=timeout) as session:
+                 async with session.put(
+                     url=full_url, headers=upload_headers, data=file_data
+                 ) as response:
+                     response_text = await response.text()
+
+                     try:
+                         response_data = (
+                             json.loads(response_text) if response_text else {}
+                         )
+                     except json.JSONDecodeError:
+                         response_data = {"raw_response": response_text}
+
+                     if not response.ok:
+                         logger.error(
+                             f"File upload error: {response.status} - {response_text}"
+                         )
+                         raise CloudflareError(
+                             f"Upload failed: HTTP {response.status}",
+                             response.status,
+                             response_data,
+                         )
+
+                     return response_data
+
+         except asyncio.TimeoutError:
+             logger.error(f"Timeout uploading file to {full_url}")
+             raise CloudflareError(f"Upload timeout after {self.timeout * 2}s")
+         except aiohttp.ClientError as e:
+             logger.error(f"Upload client error: {e}")
+             raise CloudflareError(f"Upload error: {e}")
+
+     def get_account_url(self, endpoint: str) -> str:
+         """Get URL for account-scoped endpoint"""
+         return f"accounts/{self.account_id}/{endpoint}"
+
+     def get_worker_url(self, endpoint: str) -> str:
+         """Get URL for worker endpoint"""
+         if not self.worker_url:
+             raise CloudflareError("Worker URL not configured")
+         return endpoint
+
+
+ class CloudflareError(Exception):
+     """Cloudflare API error"""
+
+     def __init__(
+         self,
+         message: str,
+         status_code: Optional[int] = None,
+         response_data: Optional[Dict[str, Any]] = None,
+     ):
+         super().__init__(message)
+         self.status_code = status_code
+         self.response_data = response_data or {}
+
+     def __str__(self) -> str:
+         if self.status_code:
+             return f"CloudflareError({self.status_code}): {super().__str__()}"
+         return f"CloudflareError: {super().__str__()}"
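The URL handling in `_make_request` and `upload_file` above relies on one small rule: strip the trailing slash from the base and the leading slash from the endpoint, so either spelling joins to the same URL. A standalone sketch of that rule (`join_url` is a hypothetical helper, not part of the class):

```python
def join_url(base: str, endpoint: str) -> str:
    """Join a base URL and an endpoint, tolerating slashes on either side."""
    return f"{base.rstrip('/')}/{endpoint.lstrip('/')}"


print(join_url("https://worker.example.com/", "/api/database/query"))
# https://worker.example.com/api/database/query
print(join_url("https://api.cloudflare.com/client/v4", "accounts/abc/d1"))
# https://api.cloudflare.com/client/v4/accounts/abc/d1
```

This makes callers' endpoint strings insensitive to whether they start with `/`, which is why the worker-route methods later in the diff can pass paths like `do/agent/...` without a leading slash.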
app/cloudflare/d1.py ADDED
@@ -0,0 +1,510 @@
+ """
+ D1 Database integration for OpenManus
+ Provides interface to Cloudflare D1 database operations
+ """
+
+ import json
+ from typing import Any, Dict, List, Optional, Union
+
+ from app.logger import logger
+
+ from .client import CloudflareClient, CloudflareError
+
+
+ class D1Database:
+     """Cloudflare D1 Database client"""
+
+     def __init__(self, client: CloudflareClient, database_id: str):
+         self.client = client
+         self.database_id = database_id
+         self.base_endpoint = f"accounts/{client.account_id}/d1/database/{database_id}"
+
+     async def execute_query(
+         self, sql: str, params: Optional[List[Any]] = None, use_worker: bool = True
+     ) -> Dict[str, Any]:
+         """Execute a SQL query"""
+
+         query_data = {"sql": sql}
+         if params:
+             query_data["params"] = params
+
+         try:
+             if use_worker:
+                 # Use worker endpoint for better performance
+                 response = await self.client.post(
+                     "api/database/query", data=query_data, use_worker=True
+                 )
+             else:
+                 # Use Cloudflare API directly
+                 response = await self.client.post(
+                     f"{self.base_endpoint}/query", data=query_data
+                 )
+             return response
+         except CloudflareError as e:
+             logger.error(f"D1 query execution failed: {e}")
+             raise
+
+     async def batch_execute(
+         self, queries: List[Dict[str, Any]], use_worker: bool = True
+     ) -> Dict[str, Any]:
+         """Execute multiple queries in a batch"""
+
+         batch_data = {"queries": queries}
+
+         try:
+             if use_worker:
+                 response = await self.client.post(
+                     "api/database/batch", data=batch_data, use_worker=True
+                 )
+             else:
+                 response = await self.client.post(
+                     f"{self.base_endpoint}/query", data=batch_data
+                 )
+             return response
+         except CloudflareError as e:
+             logger.error(f"D1 batch execution failed: {e}")
+             raise
+
+     # User management methods
+     async def create_user(
+         self,
+         user_id: str,
+         username: str,
+         email: Optional[str] = None,
+         metadata: Optional[Dict[str, Any]] = None,
+     ) -> Dict[str, Any]:
+         """Create a new user"""
+
+         sql = """
+             INSERT INTO users (id, username, email, metadata)
+             VALUES (?, ?, ?, ?)
+             ON CONFLICT(id) DO UPDATE SET
+                 username = excluded.username,
+                 email = excluded.email,
+                 metadata = excluded.metadata,
+                 updated_at = strftime('%s', 'now')
+         """
+         params = [user_id, username, email, json.dumps(metadata or {})]
+
+         return await self.execute_query(sql, params)
+
+     async def get_user(self, user_id: str) -> Optional[Dict[str, Any]]:
+         """Get user by ID"""
+
+         sql = "SELECT * FROM users WHERE id = ?"
+         params = [user_id]
+
+         result = await self.execute_query(sql, params)
+
+         # Parse response based on Cloudflare D1 format
+         if result.get("success") and result.get("result"):
+             rows = result["result"][0].get("results", [])
+             if rows:
+                 user = rows[0]
+                 if user.get("metadata"):
+                     user["metadata"] = json.loads(user["metadata"])
+                 return user
+
+         return None
+
+     async def get_user_by_username(self, username: str) -> Optional[Dict[str, Any]]:
+         """Get user by username"""
+
+         sql = "SELECT * FROM users WHERE username = ?"
+         params = [username]
+
+         result = await self.execute_query(sql, params)
+
+         if result.get("success") and result.get("result"):
+             rows = result["result"][0].get("results", [])
+             if rows:
+                 user = rows[0]
+                 if user.get("metadata"):
+                     user["metadata"] = json.loads(user["metadata"])
+                 return user
+
+         return None
+
+     # Session management methods
+     async def create_session(
+         self,
+         session_id: str,
+         user_id: str,
+         session_data: Dict[str, Any],
+         expires_at: Optional[int] = None,
+     ) -> Dict[str, Any]:
+         """Create a new session"""
+
+         sql = """
+             INSERT INTO sessions (id, user_id, session_data, expires_at)
+             VALUES (?, ?, ?, ?)
+         """
+         params = [session_id, user_id, json.dumps(session_data), expires_at]
+
+         return await self.execute_query(sql, params)
+
+     async def get_session(self, session_id: str) -> Optional[Dict[str, Any]]:
+         """Get session by ID"""
+
+         sql = """
+             SELECT * FROM sessions
+             WHERE id = ? AND (expires_at IS NULL OR expires_at > strftime('%s', 'now'))
+         """
+         params = [session_id]
+
+         result = await self.execute_query(sql, params)
+
+         if result.get("success") and result.get("result"):
+             rows = result["result"][0].get("results", [])
+             if rows:
+                 session = rows[0]
+                 if session.get("session_data"):
+                     session["session_data"] = json.loads(session["session_data"])
+                 return session
+
+         return None
+
+     async def delete_session(self, session_id: str) -> Dict[str, Any]:
+         """Delete a session"""
+
+         sql = "DELETE FROM sessions WHERE id = ?"
+         params = [session_id]
+
+         return await self.execute_query(sql, params)
+
+     # Conversation methods
+     async def create_conversation(
+         self,
+         conversation_id: str,
+         user_id: str,
+         title: Optional[str] = None,
+         messages: Optional[List[Dict[str, Any]]] = None,
+     ) -> Dict[str, Any]:
+         """Create a new conversation"""
+
+         sql = """
+             INSERT INTO conversations (id, user_id, title, messages)
+             VALUES (?, ?, ?, ?)
+         """
+         params = [conversation_id, user_id, title, json.dumps(messages or [])]
+
+         return await self.execute_query(sql, params)
+
+     async def get_conversation(self, conversation_id: str) -> Optional[Dict[str, Any]]:
+         """Get conversation by ID"""
+
+         sql = "SELECT * FROM conversations WHERE id = ?"
+         params = [conversation_id]
+
+         result = await self.execute_query(sql, params)
+
+         if result.get("success") and result.get("result"):
+             rows = result["result"][0].get("results", [])
+             if rows:
+                 conversation = rows[0]
+                 if conversation.get("messages"):
+                     conversation["messages"] = json.loads(conversation["messages"])
+                 return conversation
+
+         return None
+
+     async def update_conversation_messages(
+         self, conversation_id: str, messages: List[Dict[str, Any]]
+     ) -> Dict[str, Any]:
+         """Update conversation messages"""
+
+         sql = """
+             UPDATE conversations
+             SET messages = ?, updated_at = strftime('%s', 'now')
+             WHERE id = ?
+         """
+         params = [json.dumps(messages), conversation_id]
+
+         return await self.execute_query(sql, params)
+
+     async def get_user_conversations(
+         self, user_id: str, limit: int = 50
+     ) -> List[Dict[str, Any]]:
+         """Get user's conversations"""
+
+         sql = """
+             SELECT id, user_id, title, created_at, updated_at
+             FROM conversations
+             WHERE user_id = ?
+             ORDER BY updated_at DESC
+             LIMIT ?
+         """
+         params = [user_id, limit]
+
+         result = await self.execute_query(sql, params)
+
+         if result.get("success") and result.get("result"):
+             return result["result"][0].get("results", [])
+
+         return []
+
+     # Agent execution methods
+     async def create_agent_execution(
+         self,
+         execution_id: str,
+         user_id: str,
+         session_id: Optional[str] = None,
+         task_description: Optional[str] = None,
+         status: str = "pending",
+     ) -> Dict[str, Any]:
+         """Create a new agent execution record"""
+
+         sql = """
+             INSERT INTO agent_executions (id, user_id, session_id, task_description, status)
+             VALUES (?, ?, ?, ?, ?)
+         """
+         params = [execution_id, user_id, session_id, task_description, status]
+
+         return await self.execute_query(sql, params)
+
+     async def update_agent_execution(
+         self,
+         execution_id: str,
+         status: Optional[str] = None,
+         result: Optional[str] = None,
+         execution_time: Optional[int] = None,
+     ) -> Dict[str, Any]:
+         """Update agent execution record"""
+
+         updates = []
+         params = []
+
+         if status:
+             updates.append("status = ?")
+             params.append(status)
+
+         if result:
+             updates.append("result = ?")
+             params.append(result)
+
+         if execution_time is not None:
+             updates.append("execution_time = ?")
+             params.append(execution_time)
+
+         if status in ["completed", "failed"]:
+             updates.append("completed_at = strftime('%s', 'now')")
+
+         if not updates:
+             return {"success": True, "message": "No updates provided"}
+
+         sql = f"""
+             UPDATE agent_executions
+             SET {', '.join(updates)}
+             WHERE id = ?
+         """
+         params.append(execution_id)
+
+         return await self.execute_query(sql, params)
+
+     async def get_agent_execution(self, execution_id: str) -> Optional[Dict[str, Any]]:
+         """Get agent execution by ID"""
+
+         sql = "SELECT * FROM agent_executions WHERE id = ?"
+         params = [execution_id]
+
+         result = await self.execute_query(sql, params)
+
+         if result.get("success") and result.get("result"):
+             rows = result["result"][0].get("results", [])
+             if rows:
+                 return rows[0]
+
+         return None
+
+     async def get_user_executions(
+         self, user_id: str, limit: int = 50
+     ) -> List[Dict[str, Any]]:
+         """Get user's agent executions"""
+
+         sql = """
+             SELECT * FROM agent_executions
+             WHERE user_id = ?
+             ORDER BY created_at DESC
+             LIMIT ?
+         """
+         params = [user_id, limit]
+
+         result = await self.execute_query(sql, params)
+
+         if result.get("success") and result.get("result"):
+             return result["result"][0].get("results", [])
+
+         return []
+
+     # File record methods
+     async def create_file_record(
+         self,
+         file_id: str,
+         user_id: str,
+         filename: str,
+         file_key: str,
+         file_size: int,
+         content_type: str,
+         bucket: str = "storage",
+     ) -> Dict[str, Any]:
+         """Create a file record"""
+
+         sql = """
+             INSERT INTO files (id, user_id, filename, file_key, file_size, content_type, bucket)
+             VALUES (?, ?, ?, ?, ?, ?, ?)
+         """
+         params = [file_id, user_id, filename, file_key, file_size, content_type, bucket]
+
+         return await self.execute_query(sql, params)
+
+     async def get_file_record(self, file_id: str) -> Optional[Dict[str, Any]]:
+         """Get file record by ID"""
+
+         sql = "SELECT * FROM files WHERE id = ?"
+         params = [file_id]
+
+         result = await self.execute_query(sql, params)
+
+         if result.get("success") and result.get("result"):
+             rows = result["result"][0].get("results", [])
+             if rows:
+                 return rows[0]
+
+         return None
+
+     async def get_user_files(
+         self, user_id: str, limit: int = 100
+     ) -> List[Dict[str, Any]]:
+         """Get user's files"""
+
+         sql = """
+             SELECT * FROM files
+             WHERE user_id = ?
+             ORDER BY created_at DESC
+             LIMIT ?
+         """
+         params = [user_id, limit]
+
+         result = await self.execute_query(sql, params)
+
+         if result.get("success") and result.get("result"):
+             return result["result"][0].get("results", [])
+
+         return []
+
+     async def delete_file_record(self, file_id: str) -> Dict[str, Any]:
+         """Delete a file record"""
+
+         sql = "DELETE FROM files WHERE id = ?"
+         params = [file_id]
+
+         return await self.execute_query(sql, params)
+
+     # Schema initialization
+     async def initialize_schema(self) -> Dict[str, Any]:
+         """Initialize database schema"""
+
+         schema_queries = [
+             {
+                 "sql": """CREATE TABLE IF NOT EXISTS users (
+                     id TEXT PRIMARY KEY,
+                     username TEXT UNIQUE NOT NULL,
+                     email TEXT UNIQUE,
+                     created_at INTEGER DEFAULT (strftime('%s', 'now')),
+                     updated_at INTEGER DEFAULT (strftime('%s', 'now')),
+                     metadata TEXT
+                 )"""
+             },
+             {
+                 "sql": """CREATE TABLE IF NOT EXISTS sessions (
+                     id TEXT PRIMARY KEY,
+                     user_id TEXT NOT NULL,
+                     session_data TEXT,
+                     created_at INTEGER DEFAULT (strftime('%s', 'now')),
+                     expires_at INTEGER,
+                     FOREIGN KEY (user_id) REFERENCES users(id)
+                 )"""
+             },
+             {
+                 "sql": """CREATE TABLE IF NOT EXISTS conversations (
+                     id TEXT PRIMARY KEY,
+                     user_id TEXT NOT NULL,
+                     title TEXT,
+                     messages TEXT,
+                     created_at INTEGER DEFAULT (strftime('%s', 'now')),
+                     updated_at INTEGER DEFAULT (strftime('%s', 'now')),
+                     FOREIGN KEY (user_id) REFERENCES users(id)
+                 )"""
+             },
+             {
+                 "sql": """CREATE TABLE IF NOT EXISTS files (
+                     id TEXT PRIMARY KEY,
+                     user_id TEXT NOT NULL,
+                     filename TEXT NOT NULL,
+                     file_key TEXT NOT NULL,
+                     file_size INTEGER,
+                     content_type TEXT,
+                     bucket TEXT DEFAULT 'storage',
+                     created_at INTEGER DEFAULT (strftime('%s', 'now')),
+                     FOREIGN KEY (user_id) REFERENCES users(id)
+                 )"""
+             },
+             {
+                 "sql": """CREATE TABLE IF NOT EXISTS agent_executions (
+                     id TEXT PRIMARY KEY,
+                     user_id TEXT NOT NULL,
+                     session_id TEXT,
+                     task_description TEXT,
+                     status TEXT DEFAULT 'pending',
+                     result TEXT,
+                     execution_time INTEGER,
+                     created_at INTEGER DEFAULT (strftime('%s', 'now')),
+                     completed_at INTEGER,
+                     FOREIGN KEY (user_id) REFERENCES users(id)
+                 )"""
+             },
+         ]
+
+         # Add indexes
+         index_queries = [
+             {
+                 "sql": "CREATE INDEX IF NOT EXISTS idx_sessions_user_id ON sessions(user_id)"
+             },
+             {
+                 "sql": "CREATE INDEX IF NOT EXISTS idx_conversations_user_id ON conversations(user_id)"
+             },
+             {"sql": "CREATE INDEX IF NOT EXISTS idx_files_user_id ON files(user_id)"},
+             {
+                 "sql": "CREATE INDEX IF NOT EXISTS idx_agent_executions_user_id ON agent_executions(user_id)"
+             },
+         ]
+
+         all_queries = schema_queries + index_queries
+
+         return await self.batch_execute(all_queries)
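The dynamic UPDATE assembly in `update_agent_execution` above can be sketched as a pure function: one `col = ?` fragment per provided field, with `completed_at` stamped for terminal states. `build_update` is a hypothetical standalone mirror of the method's SQL-building logic, not part of the class, and flattens the statement onto one line for readability.

```python
def build_update(execution_id, status=None, result=None, execution_time=None):
    """Assemble a parameterized UPDATE for the agent_executions table."""
    updates, params = [], []
    if status:
        updates.append("status = ?")
        params.append(status)
    if result:
        updates.append("result = ?")
        params.append(result)
    if execution_time is not None:
        updates.append("execution_time = ?")
        params.append(execution_time)
    if status in ("completed", "failed"):
        # Terminal states also get a completion timestamp
        updates.append("completed_at = strftime('%s', 'now')")
    sql = f"UPDATE agent_executions SET {', '.join(updates)} WHERE id = ?"
    params.append(execution_id)
    return sql, params


sql, params = build_update("exec-1", status="completed", execution_time=1200)
print(sql)
# UPDATE agent_executions SET status = ?, execution_time = ?, completed_at = strftime('%s', 'now') WHERE id = ?
print(params)  # ['completed', 1200, 'exec-1']
```

Keeping the values in `params` rather than interpolating them into `sql` is what makes the statement safe to hand to D1 as a bound query.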
app/cloudflare/durable_objects.py ADDED
@@ -0,0 +1,365 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
"""
Durable Objects integration for OpenManus
Provides interface to Cloudflare Durable Objects operations
"""

import asyncio
import json
import time
from typing import Any, Dict, Optional

from app.logger import logger

from .client import CloudflareClient, CloudflareError


class DurableObjects:
    """Cloudflare Durable Objects client"""

    def __init__(self, client: CloudflareClient):
        self.client = client

    async def create_agent_session(
        self, session_id: str, user_id: str, metadata: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        """Create a new agent session"""

        session_data = {
            "sessionId": session_id,
            "userId": user_id,
            "metadata": metadata or {},
        }

        try:
            response = await self.client.post(
                f"do/agent/{session_id}/start", data=session_data, use_worker=True
            )

            return {
                "success": True,
                "session_id": session_id,
                "user_id": user_id,
                **response,
            }

        except CloudflareError as e:
            logger.error(f"Failed to create agent session: {e}")
            raise

    async def get_agent_session_status(self, session_id: str) -> Dict[str, Any]:
        """Get agent session status"""

        try:
            response = await self.client.get(
                f"do/agent/{session_id}/status?sessionId={session_id}", use_worker=True
            )

            return response

        except CloudflareError as e:
            logger.error(f"Failed to get agent session status: {e}")
            raise

    async def update_agent_session(
        self, session_id: str, updates: Dict[str, Any]
    ) -> Dict[str, Any]:
        """Update agent session"""

        update_data = {"sessionId": session_id, "updates": updates}

        try:
            response = await self.client.post(
                f"do/agent/{session_id}/update", data=update_data, use_worker=True
            )

            return {"success": True, "session_id": session_id, **response}

        except CloudflareError as e:
            logger.error(f"Failed to update agent session: {e}")
            raise

    async def stop_agent_session(self, session_id: str) -> Dict[str, Any]:
        """Stop agent session"""

        try:
            response = await self.client.post(
                f"do/agent/{session_id}/stop",
                data={"sessionId": session_id},
                use_worker=True,
            )

            return {"success": True, "session_id": session_id, **response}

        except CloudflareError as e:
            logger.error(f"Failed to stop agent session: {e}")
            raise

    async def add_agent_message(
        self, session_id: str, message: Dict[str, Any]
    ) -> Dict[str, Any]:
        """Add a message to agent session"""

        message_data = {
            "sessionId": session_id,
            "message": {"timestamp": int(time.time()), **message},
        }

        try:
            response = await self.client.post(
                f"do/agent/{session_id}/messages", data=message_data, use_worker=True
            )

            return {"success": True, "session_id": session_id, **response}

        except CloudflareError as e:
            logger.error(f"Failed to add agent message: {e}")
            raise

    async def get_agent_messages(
        self, session_id: str, limit: int = 50, offset: int = 0
    ) -> Dict[str, Any]:
        """Get agent session messages"""

        try:
            response = await self.client.get(
                f"do/agent/{session_id}/messages?sessionId={session_id}&limit={limit}&offset={offset}",
                use_worker=True,
            )

            return response

        except CloudflareError as e:
            logger.error(f"Failed to get agent messages: {e}")
            raise

    # Chat Room methods
    async def join_chat_room(
        self,
        room_id: str,
        user_id: str,
        username: str,
        room_config: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        """Join a chat room"""

        join_data = {
            "userId": user_id,
            "username": username,
            "roomConfig": room_config or {},
        }

        try:
            response = await self.client.post(
                f"do/chat/{room_id}/join", data=join_data, use_worker=True
            )

            return {"success": True, "room_id": room_id, "user_id": user_id, **response}

        except CloudflareError as e:
            logger.error(f"Failed to join chat room: {e}")
            raise

    async def leave_chat_room(self, room_id: str, user_id: str) -> Dict[str, Any]:
        """Leave a chat room"""

        leave_data = {"userId": user_id}

        try:
            response = await self.client.post(
                f"do/chat/{room_id}/leave", data=leave_data, use_worker=True
            )

            return {"success": True, "room_id": room_id, "user_id": user_id, **response}

        except CloudflareError as e:
            logger.error(f"Failed to leave chat room: {e}")
            raise

    async def get_chat_room_info(self, room_id: str) -> Dict[str, Any]:
        """Get chat room information"""

        try:
            response = await self.client.get(f"do/chat/{room_id}/info", use_worker=True)

            return response

        except CloudflareError as e:
            logger.error(f"Failed to get chat room info: {e}")
            raise

    async def send_chat_message(
        self,
        room_id: str,
        user_id: str,
        username: str,
        content: str,
        message_type: str = "text",
    ) -> Dict[str, Any]:
        """Send a message to chat room"""

        message_data = {
            "userId": user_id,
            "username": username,
            "content": content,
            "messageType": message_type,
        }

        try:
            response = await self.client.post(
                f"do/chat/{room_id}/messages", data=message_data, use_worker=True
            )

            return {"success": True, "room_id": room_id, **response}

        except CloudflareError as e:
            logger.error(f"Failed to send chat message: {e}")
            raise

    async def get_chat_messages(
        self, room_id: str, limit: int = 50, offset: int = 0
    ) -> Dict[str, Any]:
        """Get chat room messages"""

        try:
            response = await self.client.get(
                f"do/chat/{room_id}/messages?limit={limit}&offset={offset}",
                use_worker=True,
            )

            return response

        except CloudflareError as e:
            logger.error(f"Failed to get chat messages: {e}")
            raise

    async def get_chat_participants(self, room_id: str) -> Dict[str, Any]:
        """Get chat room participants"""

        try:
            response = await self.client.get(
                f"do/chat/{room_id}/participants", use_worker=True
            )

            return response

        except CloudflareError as e:
            logger.error(f"Failed to get chat participants: {e}")
            raise

    # WebSocket connection helpers
    def get_agent_websocket_url(self, session_id: str, user_id: str) -> str:
        """Get WebSocket URL for agent session"""

        if not self.client.worker_url:
            raise CloudflareError("Worker URL not configured")

        base_url = self.client.worker_url.replace("https://", "wss://").replace(
            "http://", "ws://"
        )
        return (
            f"{base_url}/do/agent/{session_id}?sessionId={session_id}&userId={user_id}"
        )

    def get_chat_websocket_url(self, room_id: str, user_id: str, username: str) -> str:
        """Get WebSocket URL for chat room"""

        if not self.client.worker_url:
            raise CloudflareError("Worker URL not configured")

        base_url = self.client.worker_url.replace("https://", "wss://").replace(
            "http://", "ws://"
        )
        return f"{base_url}/do/chat/{room_id}?userId={user_id}&username={username}"


class DurableObjectsWebSocket:
    """Helper class for WebSocket connections to Durable Objects"""

    def __init__(self, url: str):
        self.url = url
        self.websocket = None
        self.connected = False
        self.message_handlers = {}

    async def connect(self):
        """Connect to WebSocket"""
        try:
            # websockets is an optional dependency, imported lazily
            import websockets

            self.websocket = await websockets.connect(self.url)
            self.connected = True
            logger.info(f"Connected to Durable Object WebSocket: {self.url}")

            # Start message handling loop
            asyncio.create_task(self._message_loop())

        except Exception as e:
            logger.error(f"Failed to connect to WebSocket: {e}")
            raise CloudflareError(f"WebSocket connection failed: {e}")

    async def disconnect(self):
        """Disconnect from WebSocket"""
        if self.websocket and self.connected:
            await self.websocket.close()
            self.connected = False
            logger.info("Disconnected from Durable Object WebSocket")

    async def send_message(self, message_type: str, payload: Dict[str, Any]):
        """Send message via WebSocket"""
        if not self.connected or not self.websocket:
            raise CloudflareError("WebSocket not connected")

        message = {
            "type": message_type,
            "payload": payload,
            "timestamp": int(time.time()),
        }

        try:
            await self.websocket.send(json.dumps(message))
        except Exception as e:
            logger.error(f"Failed to send WebSocket message: {e}")
            raise CloudflareError(f"Failed to send message: {e}")

    def add_message_handler(self, message_type: str, handler):
        """Add a message handler for specific message types"""
        if message_type not in self.message_handlers:
            self.message_handlers[message_type] = []
        self.message_handlers[message_type].append(handler)

    async def _message_loop(self):
        """Handle incoming WebSocket messages"""
        try:
            async for message in self.websocket:
                try:
                    data = json.loads(message)
                    message_type = data.get("type")

                    if message_type in self.message_handlers:
                        for handler in self.message_handlers[message_type]:
                            try:
                                if callable(handler):
                                    if asyncio.iscoroutinefunction(handler):
                                        await handler(data)
                                    else:
                                        handler(data)
                            except Exception as e:
                                logger.error(f"Message handler error: {e}")

                except json.JSONDecodeError as e:
                    logger.error(f"Failed to parse WebSocket message: {e}")
                except Exception as e:
                    logger.error(f"WebSocket message processing error: {e}")

        except Exception as e:
            logger.error(f"WebSocket message loop error: {e}")
            self.connected = False

    # Context manager support
    async def __aenter__(self):
        await self.connect()
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        await self.disconnect()
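The WebSocket URL helpers above derive a `wss://` endpoint from the worker's HTTP URL by plain scheme substitution before appending the Durable Object route and query string. A minimal standalone sketch of that derivation (the worker URL and path below are hypothetical examples, not values from this repository):

```python
def to_websocket_url(worker_url: str, path: str) -> str:
    # Same scheme substitution used by get_agent_websocket_url /
    # get_chat_websocket_url: https -> wss, http -> ws.
    base = worker_url.replace("https://", "wss://").replace("http://", "ws://")
    return f"{base}{path}"

agent_ws = to_websocket_url(
    "https://example.workers.dev", "/do/agent/s1?sessionId=s1&userId=u1"
)
```

Because the substitution is string-based, a worker URL must start with `http://` or `https://` for the result to be a valid WebSocket URL; the client guards for a missing `worker_url` but not for an unexpected scheme.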
app/cloudflare/kv.py ADDED
@@ -0,0 +1,457 @@
"""
KV Storage integration for OpenManus
Provides interface to Cloudflare KV operations
"""

import hashlib
import json
import time
from typing import Any, Dict, List, Optional

from app.logger import logger

from .client import CloudflareClient, CloudflareError


class KVStorage:
    """Cloudflare KV Storage client"""

    def __init__(
        self,
        client: CloudflareClient,
        sessions_namespace_id: str,
        cache_namespace_id: str,
    ):
        self.client = client
        self.sessions_namespace_id = sessions_namespace_id
        self.cache_namespace_id = cache_namespace_id
        self.base_endpoint = f"accounts/{client.account_id}/storage/kv/namespaces"

    def _get_namespace_id(self, namespace_type: str) -> str:
        """Get namespace ID based on type"""
        if namespace_type == "cache":
            return self.cache_namespace_id
        return self.sessions_namespace_id

    async def set_value(
        self,
        key: str,
        value: Any,
        namespace_type: str = "sessions",
        ttl: Optional[int] = None,
        use_worker: bool = True,
    ) -> Dict[str, Any]:
        """Set a value in KV store"""

        namespace_id = self._get_namespace_id(namespace_type)

        # Serialize value to JSON (strings are stored as-is)
        if isinstance(value, str):
            serialized_value = value
        else:
            serialized_value = json.dumps(value)

        try:
            if use_worker:
                set_data = {
                    "key": key,
                    "value": serialized_value,
                    "namespace": namespace_type,
                }

                if ttl:
                    set_data["ttl"] = ttl

                response = await self.client.post(
                    "api/kv/set", data=set_data, use_worker=True
                )
            else:
                # Use KV API directly
                params = {}
                if ttl:
                    params["expiration_ttl"] = ttl

                query_string = "&".join([f"{k}={v}" for k, v in params.items()])
                endpoint = f"{self.base_endpoint}/{namespace_id}/values/{key}"
                if query_string:
                    endpoint += f"?{query_string}"

                response = await self.client.put(
                    endpoint, data={"value": serialized_value}
                )

            return {
                "success": True,
                "key": key,
                "namespace": namespace_type,
                "ttl": ttl,
                **response,
            }

        except CloudflareError as e:
            logger.error(f"KV set value failed: {e}")
            raise

    async def get_value(
        self,
        key: str,
        namespace_type: str = "sessions",
        parse_json: bool = True,
        use_worker: bool = True,
    ) -> Optional[Any]:
        """Get a value from KV store"""

        namespace_id = self._get_namespace_id(namespace_type)

        try:
            if use_worker:
                response = await self.client.get(
                    f"api/kv/get/{key}?namespace={namespace_type}", use_worker=True
                )

                if response and "value" in response:
                    value = response["value"]

                    if parse_json and isinstance(value, str):
                        try:
                            return json.loads(value)
                        except json.JSONDecodeError:
                            return value

                    return value
            else:
                response = await self.client.get(
                    f"{self.base_endpoint}/{namespace_id}/values/{key}"
                )

                # KV API returns the value directly as text
                value = (
                    response.get("result", {}).get("value")
                    if "result" in response
                    else response
                )

                if value and parse_json and isinstance(value, str):
                    try:
                        return json.loads(value)
                    except json.JSONDecodeError:
                        return value

                return value

        except CloudflareError as e:
            if e.status_code == 404:
                return None
            logger.error(f"KV get value failed: {e}")
            raise

        return None

    async def delete_value(
        self, key: str, namespace_type: str = "sessions", use_worker: bool = True
    ) -> Dict[str, Any]:
        """Delete a value from KV store"""

        namespace_id = self._get_namespace_id(namespace_type)

        try:
            if use_worker:
                response = await self.client.delete(
                    f"api/kv/delete/{key}?namespace={namespace_type}", use_worker=True
                )
            else:
                response = await self.client.delete(
                    f"{self.base_endpoint}/{namespace_id}/values/{key}"
                )

            return {
                "success": True,
                "key": key,
                "namespace": namespace_type,
                **response,
            }

        except CloudflareError as e:
            logger.error(f"KV delete value failed: {e}")
            raise

    async def list_keys(
        self,
        namespace_type: str = "sessions",
        prefix: str = "",
        limit: int = 1000,
        use_worker: bool = True,
    ) -> Dict[str, Any]:
        """List keys in KV namespace"""

        namespace_id = self._get_namespace_id(namespace_type)

        try:
            if use_worker:
                params = {"namespace": namespace_type, "prefix": prefix, "limit": limit}

                query_string = "&".join([f"{k}={v}" for k, v in params.items() if v])
                response = await self.client.get(
                    f"api/kv/list?{query_string}", use_worker=True
                )
            else:
                params = {"prefix": prefix, "limit": limit}

                query_string = "&".join([f"{k}={v}" for k, v in params.items() if v])
                response = await self.client.get(
                    f"{self.base_endpoint}/{namespace_id}/keys?{query_string}"
                )

            return {
                "namespace": namespace_type,
                "prefix": prefix,
                "keys": (
                    response.get("result", [])
                    if "result" in response
                    else response.get("keys", [])
                ),
                **response,
            }

        except CloudflareError as e:
            logger.error(f"KV list keys failed: {e}")
            raise

    # Session-specific methods
    async def set_session(
        self,
        session_id: str,
        session_data: Dict[str, Any],
        ttl: int = 86400,  # 24 hours default
    ) -> Dict[str, Any]:
        """Set session data"""

        data = {
            **session_data,
            "created_at": session_data.get("created_at", int(time.time())),
            "expires_at": int(time.time()) + ttl,
        }

        return await self.set_value(f"session:{session_id}", data, "sessions", ttl)

    async def get_session(self, session_id: str) -> Optional[Dict[str, Any]]:
        """Get session data"""

        session = await self.get_value(f"session:{session_id}", "sessions")

        if session and isinstance(session, dict):
            # Check if session is expired
            expires_at = session.get("expires_at")
            if expires_at and int(time.time()) > expires_at:
                await self.delete_session(session_id)
                return None

        return session

    async def delete_session(self, session_id: str) -> Dict[str, Any]:
        """Delete session data"""

        return await self.delete_value(f"session:{session_id}", "sessions")

    async def update_session(
        self, session_id: str, updates: Dict[str, Any], extend_ttl: Optional[int] = None
    ) -> Dict[str, Any]:
        """Update session data"""

        existing_session = await self.get_session(session_id)

        if not existing_session:
            raise CloudflareError("Session not found")

        updated_data = {**existing_session, **updates, "updated_at": int(time.time())}

        # Calculate TTL
        ttl = None
        if extend_ttl:
            ttl = extend_ttl
        elif existing_session.get("expires_at"):
            ttl = max(0, existing_session["expires_at"] - int(time.time()))

        return await self.set_session(session_id, updated_data, ttl or 86400)

    # Cache-specific methods
    async def set_cache(
        self, key: str, data: Any, ttl: int = 3600  # 1 hour default
    ) -> Dict[str, Any]:
        """Set cache data"""

        cache_data = {
            "data": data,
            "cached_at": int(time.time()),
            "expires_at": int(time.time()) + ttl,
        }

        return await self.set_value(f"cache:{key}", cache_data, "cache", ttl)

    async def get_cache(self, key: str) -> Optional[Any]:
        """Get cache data"""

        cached = await self.get_value(f"cache:{key}", "cache")

        if cached and isinstance(cached, dict):
            # Check if cache is expired
            expires_at = cached.get("expires_at")
            if expires_at and int(time.time()) > expires_at:
                await self.delete_cache(key)
                return None

            return cached.get("data")

        return cached

    async def delete_cache(self, key: str) -> Dict[str, Any]:
        """Delete cache data"""

        return await self.delete_value(f"cache:{key}", "cache")

    # User-specific methods
    async def set_user_cache(
        self, user_id: str, key: str, data: Any, ttl: int = 3600
    ) -> Dict[str, Any]:
        """Set user-specific cache"""

        user_key = f"user:{user_id}:{key}"
        return await self.set_cache(user_key, data, ttl)

    async def get_user_cache(self, user_id: str, key: str) -> Optional[Any]:
        """Get user-specific cache"""

        user_key = f"user:{user_id}:{key}"
        return await self.get_cache(user_key)

    async def delete_user_cache(self, user_id: str, key: str) -> Dict[str, Any]:
        """Delete user-specific cache"""

        user_key = f"user:{user_id}:{key}"
        return await self.delete_cache(user_key)

    async def get_user_cache_keys(self, user_id: str, limit: int = 100) -> List[str]:
        """Get all cache keys for a user"""

        result = await self.list_keys("cache", f"cache:user:{user_id}:", limit)

        keys = []
        for key_info in result.get("keys", []):
            if isinstance(key_info, dict):
                key = key_info.get("name", "")
            else:
                key = str(key_info)

            # Remove prefix to get the actual key
            if key.startswith(f"cache:user:{user_id}:"):
                clean_key = key.replace(f"cache:user:{user_id}:", "")
                keys.append(clean_key)

        return keys

    # Conversation caching
    async def cache_conversation(
        self,
        conversation_id: str,
        messages: List[Dict[str, Any]],
        ttl: int = 7200,  # 2 hours default
    ) -> Dict[str, Any]:
        """Cache conversation messages"""

        return await self.set_cache(
            f"conversation:{conversation_id}",
            {"messages": messages, "last_updated": int(time.time())},
            ttl,
        )

    async def get_cached_conversation(
        self, conversation_id: str
    ) -> Optional[Dict[str, Any]]:
        """Get cached conversation"""

        return await self.get_cache(f"conversation:{conversation_id}")

    # Agent execution caching
    async def cache_agent_execution(
        self, execution_id: str, execution_data: Dict[str, Any], ttl: int = 3600
    ) -> Dict[str, Any]:
        """Cache agent execution data"""

        return await self.set_cache(f"execution:{execution_id}", execution_data, ttl)

    async def get_cached_agent_execution(
        self, execution_id: str
    ) -> Optional[Dict[str, Any]]:
        """Get cached agent execution"""

        return await self.get_cache(f"execution:{execution_id}")

    # Batch operations
    async def set_batch(
        self,
        items: List[Dict[str, Any]],
        namespace_type: str = "cache",
        ttl: Optional[int] = None,
    ) -> Dict[str, Any]:
        """Set multiple values (simulated batch operation)"""

        results = []
        successful = 0
        failed = 0

        for item in items:
            try:
                key = item["key"]
                value = item["value"]
                item_ttl = item.get("ttl", ttl)

                result = await self.set_value(key, value, namespace_type, item_ttl)
                results.append({"key": key, "success": True, "result": result})
                successful += 1

            except Exception as e:
                results.append(
                    {"key": item.get("key"), "success": False, "error": str(e)}
                )
                failed += 1

        return {
            "success": failed == 0,
            "successful": successful,
            "failed": failed,
            "total": len(items),
            "results": results,
        }

    async def get_batch(
        self, keys: List[str], namespace_type: str = "cache"
    ) -> Dict[str, Any]:
        """Get multiple values (simulated batch operation)"""

        results = {}

        for key in keys:
            try:
                value = await self.get_value(key, namespace_type)
                results[key] = value
            except Exception as e:
                logger.error(f"Failed to get key {key}: {e}")
                results[key] = None

        return results

    def _hash_params(self, params: Dict[str, Any]) -> str:
        """Create a hash for cache keys from parameters"""

        if not params:
            return "no-params"

        # Deterministic hash for cache keys: sorted-key JSON, truncated MD5
        params_str = json.dumps(params, sort_keys=True)
        return hashlib.md5(params_str.encode()).hexdigest()[:16]
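The `_hash_params` helper builds deterministic cache keys: serialising the parameter dict as JSON with sorted keys makes the hash independent of insertion order, and truncating the MD5 digest keeps keys short. A standalone sketch of the same logic:

```python
import hashlib
import json


def hash_params(params: dict) -> str:
    # Mirrors KVStorage._hash_params: sorted-key JSON serialisation keeps the
    # hash stable across dict orderings; 16 hex chars keep cache keys short.
    if not params:
        return "no-params"
    params_str = json.dumps(params, sort_keys=True)
    return hashlib.md5(params_str.encode()).hexdigest()[:16]
```

Two dicts with the same entries in different order therefore map to the same cache key, so repeated lookups with reordered query parameters hit the same cached entry.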
app/cloudflare/r2.py ADDED
@@ -0,0 +1,434 @@
"""
R2 Storage integration for OpenManus
Provides interface to Cloudflare R2 storage operations
"""

from typing import Any, BinaryIO, Dict, Optional

from app.logger import logger

from .client import CloudflareClient, CloudflareError


class R2Storage:
    """Cloudflare R2 Storage client"""

    def __init__(
        self,
        client: CloudflareClient,
        storage_bucket: str,
        assets_bucket: Optional[str] = None,
    ):
        self.client = client
        self.storage_bucket = storage_bucket
        self.assets_bucket = assets_bucket or storage_bucket
        self.base_endpoint = f"accounts/{client.account_id}/r2/buckets"

    def _get_bucket_name(self, bucket_type: str = "storage") -> str:
        """Get bucket name based on type"""
        if bucket_type == "assets":
            return self.assets_bucket
        return self.storage_bucket

    async def upload_file(
        self,
        key: str,
        file_data: bytes,
        content_type: str = "application/octet-stream",
        bucket_type: str = "storage",
        metadata: Optional[Dict[str, str]] = None,
        use_worker: bool = True,
    ) -> Dict[str, Any]:
        """Upload a file to R2"""

        bucket_name = self._get_bucket_name(bucket_type)

        try:
            if use_worker:
                # Use worker endpoint for better performance
                form_data = {
                    "file": file_data,
                    "bucket": bucket_type,
                    "key": key,
                    "contentType": content_type,
                }

                if metadata:
                    form_data["metadata"] = metadata

                response = await self.client.post(
                    "api/files", data=form_data, use_worker=True
                )
            else:
                # Use R2 API directly
                headers = {"Content-Type": content_type}

                if metadata:
                    for k, v in metadata.items():
                        headers[f"x-amz-meta-{k}"] = v

                response = await self.client.upload_file(
                    f"{self.base_endpoint}/{bucket_name}/objects/{key}",
                    file_data,
                    content_type,
                    headers,
                )

            return {
                "success": True,
                "key": key,
                "bucket": bucket_type,
                "bucket_name": bucket_name,
                "size": len(file_data),
                "content_type": content_type,
                "url": f"/{bucket_type}/{key}",
                **response,
            }

        except CloudflareError as e:
            logger.error(f"R2 upload failed: {e}")
            raise

    async def upload_file_stream(
        self,
        key: str,
        file_stream: BinaryIO,
        content_type: str = "application/octet-stream",
        bucket_type: str = "storage",
        metadata: Optional[Dict[str, str]] = None,
    ) -> Dict[str, Any]:
        """Upload a file from stream"""

        file_data = file_stream.read()
        return await self.upload_file(
            key, file_data, content_type, bucket_type, metadata
        )

    async def get_file(
        self, key: str, bucket_type: str = "storage", use_worker: bool = True
    ) -> Optional[Dict[str, Any]]:
        """Get a file from R2"""

        bucket_name = self._get_bucket_name(bucket_type)

        try:
            if use_worker:
                response = await self.client.get(
                    f"api/files/{key}?bucket={bucket_type}", use_worker=True
                )

                if response:
                    return {
                        "key": key,
                        "bucket": bucket_type,
                        "bucket_name": bucket_name,
                        "data": response,  # Binary data would be handled by worker
                        "exists": True,
                    }
            else:
                response = await self.client.get(
                    f"{self.base_endpoint}/{bucket_name}/objects/{key}"
                )

                return {
                    "key": key,
                    "bucket": bucket_type,
                    "bucket_name": bucket_name,
                    "data": response,
                    "exists": True,
                }

        except CloudflareError as e:
            if e.status_code == 404:
                return None
            logger.error(f"R2 get file failed: {e}")
            raise

        return None

    async def delete_file(
        self, key: str, bucket_type: str = "storage", use_worker: bool = True
    ) -> Dict[str, Any]:
        """Delete a file from R2"""

        bucket_name = self._get_bucket_name(bucket_type)

        try:
            if use_worker:
                response = await self.client.delete(
                    f"api/files/{key}?bucket={bucket_type}", use_worker=True
                )
            else:
                response = await self.client.delete(
                    f"{self.base_endpoint}/{bucket_name}/objects/{key}"
                )

            return {
                "success": True,
                "key": key,
                "bucket": bucket_type,
                "bucket_name": bucket_name,
                **response,
            }

        except CloudflareError as e:
            logger.error(f"R2 delete failed: {e}")
            raise

    async def list_files(
        self,
        bucket_type: str = "storage",
        prefix: str = "",
        limit: int = 1000,
        use_worker: bool = True,
    ) -> Dict[str, Any]:
        """List files in R2 bucket"""

        bucket_name = self._get_bucket_name(bucket_type)

        try:
            if use_worker:
                params = {"bucket": bucket_type, "prefix": prefix, "limit": limit}

                query_string = "&".join([f"{k}={v}" for k, v in params.items() if v])
                response = await self.client.get(
                    f"api/files/list?{query_string}", use_worker=True
                )
            else:
                params = {"prefix": prefix, "max-keys": limit}

                query_string = "&".join([f"{k}={v}" for k, v in params.items() if v])
                response = await self.client.get(
                    f"{self.base_endpoint}/{bucket_name}/objects?{query_string}"
                )

            return {
                "bucket": bucket_type,
                "bucket_name": bucket_name,
                "prefix": prefix,
                "files": response.get("objects", []),
                "truncated": response.get("truncated", False),
                **response,
            }

        except CloudflareError as e:
            logger.error(f"R2 list files failed: {e}")
            raise

    async def get_file_metadata(
        self, key: str, bucket_type: str = "storage", use_worker: bool = True
    ) -> Optional[Dict[str, Any]]:
        """Get file metadata without downloading content"""

        bucket_name = self._get_bucket_name(bucket_type)

        try:
            if use_worker:
                response = await self.client.get(
                    f"api/files/{key}/metadata?bucket={bucket_type}", use_worker=True
                )
            else:
                # Use a minimal range request to retrieve headers only
                response = await self.client.get(
                    f"{self.base_endpoint}/{bucket_name}/objects/{key}",
                    headers={"Range": "bytes=0-0"},
                )

            if response:
                return {
                    "key": key,
                    "bucket": bucket_type,
                    "bucket_name": bucket_name,
                    **response,
                }

        except CloudflareError as e:
            if e.status_code == 404:
                return None
            logger.error(f"R2 get metadata failed: {e}")
            raise

        return None

    async def copy_file(
        self,
        source_key: str,
        destination_key: str,
        source_bucket: str = "storage",
        destination_bucket: str = "storage",
        use_worker: bool = True,
    ) -> Dict[str, Any]:
        """Copy a file within R2 or between buckets"""

        try:
            if use_worker:
                copy_data = {
                    "sourceKey": source_key,
                    "destinationKey": destination_key,
                    "sourceBucket": source_bucket,
                    "destinationBucket": destination_bucket,
                }

                response = await self.client.post(
                    "api/files/copy", data=copy_data, use_worker=True
                )
            else:
                # Get source file first
                source_file = await self.get_file(source_key, source_bucket, False)

                if not source_file:
                    raise CloudflareError(f"Source file {source_key} not found")

                # Upload to destination
                response = await self.upload_file(
                    destination_key,
                    source_file["data"],
                    bucket_type=destination_bucket,
                    use_worker=False,
                )

            return {
                "success": True,
                "source_key": source_key,
                "destination_key": destination_key,
                "source_bucket": source_bucket,
                "destination_bucket": destination_bucket,
                **response,
            }

        except CloudflareError as e:
            logger.error(f"R2 copy failed: {e}")
            raise

    async def move_file(
        self,
        source_key: str,
        destination_key: str,
        source_bucket: str = "storage",
        destination_bucket: str = "storage",
        use_worker: bool = True,
    ) -> Dict[str, Any]:
        """Move a file (copy then delete)"""

        try:
            # Copy file first
            copy_result = await self.copy_file(
                source_key,
                destination_key,
                source_bucket,
                destination_bucket,
                use_worker,
            )

            # Delete source file
            delete_result = await self.delete_file(
                source_key, source_bucket, use_worker
            )

            return {
                "success": True,
                "source_key": source_key,
                "destination_key": destination_key,
                "source_bucket": source_bucket,
                "destination_bucket": destination_bucket,
                "copy_result": copy_result,
                "delete_result": delete_result,
            }

        except CloudflareError as e:
            logger.error(f"R2 move failed: {e}")
            raise

    async def generate_presigned_url(
        self,
        key: str,
        bucket_type: str = "storage",
        expires_in: int = 3600,
        method: str = "GET",
    ) -> Dict[str, Any]:
        """Generate a presigned URL for direct access"""

        # Note: This would typically require additional R2 configuration.
        # For now, request a URL from the worker endpoint.

        try:
            url_data = {
                "key": key,
                "bucket": bucket_type,
                "expiresIn": expires_in,
                "method": method,
            }

            response = await self.client.post(
                "api/files/presigned-url", data=url_data, use_worker=True
            )

            return {
                "success": True,
                "key": key,
                "bucket": bucket_type,
                "method": method,
                "expires_in": expires_in,
                **response,
            }

        except CloudflareError as e:
            logger.error(f"R2 presigned URL generation failed: {e}")
            raise

    async def get_storage_stats(self, use_worker: bool = True) -> Dict[str, Any]:
381
+ """Get storage statistics"""
382
+
383
+ try:
384
+ if use_worker:
385
+ response = await self.client.get("api/files/stats", use_worker=True)
386
+ else:
387
+ # Get stats for both buckets
388
+ storage_list = await self.list_files("storage", use_worker=False)
389
+ assets_list = await self.list_files("assets", use_worker=False)
390
+
391
+ storage_size = sum(
392
+ file.get("size", 0) for file in storage_list.get("files", [])
393
+ )
394
+ assets_size = sum(
395
+ file.get("size", 0) for file in assets_list.get("files", [])
396
+ )
397
+
398
+ response = {
399
+ "storage": {
400
+ "file_count": len(storage_list.get("files", [])),
401
+ "total_size": storage_size,
402
+ },
403
+ "assets": {
404
+ "file_count": len(assets_list.get("files", [])),
405
+ "total_size": assets_size,
406
+ },
407
+ "total": {
408
+ "file_count": len(storage_list.get("files", []))
409
+ + len(assets_list.get("files", [])),
410
+ "total_size": storage_size + assets_size,
411
+ },
412
+ }
413
+
414
+ return response
415
+
416
+ except CloudflareError as e:
417
+ logger.error(f"R2 storage stats failed: {e}")
418
+ raise
419
+
420
+ def create_file_stream(self, data: bytes) -> io.BytesIO:
421
+ """Create a file stream from bytes"""
422
+ return io.BytesIO(data)
423
+
424
+ def get_public_url(self, key: str, bucket_type: str = "storage") -> str:
425
+ """Get public URL for a file (if bucket is configured for public access)"""
426
+ bucket_name = self._get_bucket_name(bucket_type)
427
+
428
+ # This would depend on your R2 custom domain configuration
429
+ # For now, return the worker endpoint
430
+ if self.client.worker_url:
431
+ return f"{self.client.worker_url}/api/files/{key}?bucket={bucket_type}"
432
+
433
+ # Default R2 URL format (requires public access configuration)
434
+ return f"https://pub-{bucket_name}.r2.dev/{key}"
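The Worker-or-direct fallback in `get_public_url` above can be sketched as a standalone function (a minimal sketch; the URL values below are hypothetical, and the `pub-….r2.dev` form still requires the bucket to be configured for public access):

```python
from typing import Optional


def get_public_url(key: str, bucket_name: str,
                   worker_url: Optional[str] = None,
                   bucket_type: str = "storage") -> str:
    # Prefer the Worker endpoint when one is configured...
    if worker_url:
        return f"{worker_url}/api/files/{key}?bucket={bucket_type}"
    # ...otherwise fall back to the default public r2.dev URL form.
    return f"https://pub-{bucket_name}.r2.dev/{key}"


print(get_public_url("report.pdf", "openmanus-storage",
                     worker_url="https://files.example.workers.dev"))
# → https://files.example.workers.dev/api/files/report.pdf?bucket=storage
print(get_public_url("report.pdf", "openmanus-storage"))
# → https://pub-openmanus-storage.r2.dev/report.pdf
```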
app/config.py ADDED
@@ -0,0 +1,372 @@
1
+ import json
2
+ import threading
3
+ import tomllib
4
+ from pathlib import Path
5
+ from typing import Dict, List, Optional
6
+
7
+ from pydantic import BaseModel, Field
8
+
9
+
10
+ def get_project_root() -> Path:
11
+ """Get the project root directory"""
12
+ return Path(__file__).resolve().parent.parent
13
+
14
+
15
+ PROJECT_ROOT = get_project_root()
16
+ WORKSPACE_ROOT = PROJECT_ROOT / "workspace"
17
+
18
+
19
+ class LLMSettings(BaseModel):
20
+ model: str = Field(..., description="Model name")
21
+ base_url: str = Field(..., description="API base URL")
22
+ api_key: str = Field(..., description="API key")
23
+ max_tokens: int = Field(4096, description="Maximum number of tokens per request")
24
+ max_input_tokens: Optional[int] = Field(
25
+ None,
26
+ description="Maximum input tokens to use across all requests (None for unlimited)",
27
+ )
28
+ temperature: float = Field(1.0, description="Sampling temperature")
29
+ api_type: str = Field(..., description="Azure, Openai, or Ollama")
30
+ api_version: str = Field(..., description="Azure OpenAI API version, required when api_type is Azure")
31
+
32
+
33
+ class ProxySettings(BaseModel):
34
+ server: Optional[str] = Field(None, description="Proxy server address")
35
+ username: Optional[str] = Field(None, description="Proxy username")
36
+ password: Optional[str] = Field(None, description="Proxy password")
37
+
38
+
39
+ class SearchSettings(BaseModel):
40
+ engine: str = Field(default="Google", description="Search engine for the LLM to use")
41
+ fallback_engines: List[str] = Field(
42
+ default_factory=lambda: ["DuckDuckGo", "Baidu", "Bing"],
43
+ description="Fallback search engines to try if the primary engine fails",
44
+ )
45
+ retry_delay: int = Field(
46
+ default=60,
47
+ description="Seconds to wait before retrying all engines again after they all fail",
48
+ )
49
+ max_retries: int = Field(
50
+ default=3,
51
+ description="Maximum number of times to retry all engines when all fail",
52
+ )
53
+ lang: str = Field(
54
+ default="en",
55
+ description="Language code for search results (e.g., en, zh, fr)",
56
+ )
57
+ country: str = Field(
58
+ default="us",
59
+ description="Country code for search results (e.g., us, cn, uk)",
60
+ )
61
+
62
+
63
+ class RunflowSettings(BaseModel):
64
+ use_data_analysis_agent: bool = Field(
65
+ default=False, description="Enable data analysis agent in run flow"
66
+ )
67
+
68
+
69
+ class BrowserSettings(BaseModel):
70
+ headless: bool = Field(False, description="Whether to run browser in headless mode")
71
+ disable_security: bool = Field(
72
+ True, description="Disable browser security features"
73
+ )
74
+ extra_chromium_args: List[str] = Field(
75
+ default_factory=list, description="Extra arguments to pass to the browser"
76
+ )
77
+ chrome_instance_path: Optional[str] = Field(
78
+ None, description="Path to a Chrome instance to use"
79
+ )
80
+ wss_url: Optional[str] = Field(
81
+ None, description="Connect to a browser instance via WebSocket"
82
+ )
83
+ cdp_url: Optional[str] = Field(
84
+ None, description="Connect to a browser instance via CDP"
85
+ )
86
+ proxy: Optional[ProxySettings] = Field(
87
+ None, description="Proxy settings for the browser"
88
+ )
89
+ max_content_length: int = Field(
90
+ 2000, description="Maximum length for content retrieval operations"
91
+ )
92
+
93
+
94
+ class SandboxSettings(BaseModel):
95
+ """Configuration for the execution sandbox"""
96
+
97
+ use_sandbox: bool = Field(False, description="Whether to use the sandbox")
98
+ image: str = Field("python:3.12-slim", description="Base image")
99
+ work_dir: str = Field("/workspace", description="Container working directory")
100
+ memory_limit: str = Field("512m", description="Memory limit")
101
+ cpu_limit: float = Field(1.0, description="CPU limit")
102
+ timeout: int = Field(300, description="Default command timeout (seconds)")
103
+ network_enabled: bool = Field(
104
+ False, description="Whether network access is allowed"
105
+ )
106
+
107
+
108
+ class DaytonaSettings(BaseModel):
109
+ daytona_api_key: str
110
+ daytona_server_url: Optional[str] = Field(
111
+ "https://app.daytona.io/api", description="Daytona API server URL"
112
+ )
113
+ daytona_target: Optional[str] = Field("us", description="enum ['eu', 'us']")
114
+ sandbox_image_name: Optional[str] = Field("whitezxj/sandbox:0.1.0", description="Sandbox image name")
115
+ sandbox_entrypoint: Optional[str] = Field(
116
+ "/usr/bin/supervisord -n -c /etc/supervisor/conf.d/supervisord.conf",
117
+ description="Sandbox entrypoint command",
118
+ )
119
+ # sandbox_id: Optional[str] = Field(
120
+ # None, description="ID of the daytona sandbox to use, if any"
121
+ # )
122
+ VNC_password: Optional[str] = Field(
123
+ "123456", description="VNC password for the vnc service in sandbox"
124
+ )
125
+
126
+
127
+ class MCPServerConfig(BaseModel):
128
+ """Configuration for a single MCP server"""
129
+
130
+ type: str = Field(..., description="Server connection type (sse or stdio)")
131
+ url: Optional[str] = Field(None, description="Server URL for SSE connections")
132
+ command: Optional[str] = Field(None, description="Command for stdio connections")
133
+ args: List[str] = Field(
134
+ default_factory=list, description="Arguments for stdio command"
135
+ )
136
+
137
+
138
+ class MCPSettings(BaseModel):
139
+ """Configuration for MCP (Model Context Protocol)"""
140
+
141
+ server_reference: str = Field(
142
+ "app.mcp.server", description="Module reference for the MCP server"
143
+ )
144
+ servers: Dict[str, MCPServerConfig] = Field(
145
+ default_factory=dict, description="MCP server configurations"
146
+ )
147
+
148
+ @classmethod
149
+ def load_server_config(cls) -> Dict[str, MCPServerConfig]:
150
+ """Load MCP server configuration from JSON file"""
151
+ config_path = PROJECT_ROOT / "config" / "mcp.json"
152
+
153
+ try:
154
+ config_file = config_path if config_path.exists() else None
155
+ if not config_file:
156
+ return {}
157
+
158
+ with config_file.open() as f:
159
+ data = json.load(f)
160
+ servers = {}
161
+
162
+ for server_id, server_config in data.get("mcpServers", {}).items():
163
+ servers[server_id] = MCPServerConfig(
164
+ type=server_config["type"],
165
+ url=server_config.get("url"),
166
+ command=server_config.get("command"),
167
+ args=server_config.get("args", []),
168
+ )
169
+ return servers
170
+ except Exception as e:
171
+ raise ValueError(f"Failed to load MCP server config: {e}")
172
+
173
+
174
+ class AppConfig(BaseModel):
175
+ llm: Dict[str, LLMSettings]
176
+ sandbox: Optional[SandboxSettings] = Field(
177
+ None, description="Sandbox configuration"
178
+ )
179
+ browser_config: Optional[BrowserSettings] = Field(
180
+ None, description="Browser configuration"
181
+ )
182
+ search_config: Optional[SearchSettings] = Field(
183
+ None, description="Search configuration"
184
+ )
185
+ mcp_config: Optional[MCPSettings] = Field(None, description="MCP configuration")
186
+ run_flow_config: Optional[RunflowSettings] = Field(
187
+ None, description="Run flow configuration"
188
+ )
189
+ daytona_config: Optional[DaytonaSettings] = Field(
190
+ None, description="Daytona configuration"
191
+ )
192
+
193
+ class Config:
194
+ arbitrary_types_allowed = True
195
+
196
+
197
+ class Config:
198
+ _instance = None
199
+ _lock = threading.Lock()
200
+ _initialized = False
201
+
202
+ def __new__(cls):
203
+ if cls._instance is None:
204
+ with cls._lock:
205
+ if cls._instance is None:
206
+ cls._instance = super().__new__(cls)
207
+ return cls._instance
208
+
209
+ def __init__(self):
210
+ if not self._initialized:
211
+ with self._lock:
212
+ if not self._initialized:
213
+ self._config = None
214
+ self._load_initial_config()
215
+ self._initialized = True
216
+
217
+ @staticmethod
218
+ def _get_config_path() -> Path:
219
+ root = PROJECT_ROOT
220
+ config_path = root / "config" / "config.toml"
221
+ if config_path.exists():
222
+ return config_path
223
+ example_path = root / "config" / "config.example.toml"
224
+ if example_path.exists():
225
+ return example_path
226
+ raise FileNotFoundError("No configuration file found in config directory")
227
+
228
+ def _load_config(self) -> dict:
229
+ config_path = self._get_config_path()
230
+ with config_path.open("rb") as f:
231
+ return tomllib.load(f)
232
+
233
+ def _load_initial_config(self):
234
+ raw_config = self._load_config()
235
+ base_llm = raw_config.get("llm", {})
236
+ llm_overrides = {
237
+ k: v for k, v in raw_config.get("llm", {}).items() if isinstance(v, dict)
238
+ }
239
+
240
+ default_settings = {
241
+ "model": base_llm.get("model"),
242
+ "base_url": base_llm.get("base_url"),
243
+ "api_key": base_llm.get("api_key"),
244
+ "max_tokens": base_llm.get("max_tokens", 4096),
245
+ "max_input_tokens": base_llm.get("max_input_tokens"),
246
+ "temperature": base_llm.get("temperature", 1.0),
247
+ "api_type": base_llm.get("api_type", ""),
248
+ "api_version": base_llm.get("api_version", ""),
249
+ }
250
+
251
+ # handle browser config.
252
+ browser_config = raw_config.get("browser", {})
253
+ browser_settings = None
254
+
255
+ if browser_config:
256
+ # handle proxy settings.
257
+ proxy_config = browser_config.get("proxy", {})
258
+ proxy_settings = None
259
+
260
+ if proxy_config and proxy_config.get("server"):
261
+ proxy_settings = ProxySettings(
262
+ **{
263
+ k: v
264
+ for k, v in proxy_config.items()
265
+ if k in ["server", "username", "password"] and v
266
+ }
267
+ )
268
+
269
+ # filter valid browser config parameters.
270
+ valid_browser_params = {
271
+ k: v
272
+ for k, v in browser_config.items()
273
+ if k in BrowserSettings.__annotations__ and v is not None
274
+ }
275
+
276
+ # if there is proxy settings, add it to the parameters.
277
+ if proxy_settings:
278
+ valid_browser_params["proxy"] = proxy_settings
279
+
280
+ # only create BrowserSettings when there are valid parameters.
281
+ if valid_browser_params:
282
+ browser_settings = BrowserSettings(**valid_browser_params)
283
+
284
+ search_config = raw_config.get("search", {})
285
+ search_settings = None
286
+ if search_config:
287
+ search_settings = SearchSettings(**search_config)
288
+ sandbox_config = raw_config.get("sandbox", {})
289
+ if sandbox_config:
290
+ sandbox_settings = SandboxSettings(**sandbox_config)
291
+ else:
292
+ sandbox_settings = SandboxSettings()
293
+ daytona_config = raw_config.get("daytona", {})
294
+ if daytona_config:
295
+ daytona_settings = DaytonaSettings(**daytona_config)
296
+ else:
297
+ daytona_settings = DaytonaSettings(daytona_api_key="")  # daytona_api_key has no default, so pass an empty key
298
+
299
+ mcp_config = raw_config.get("mcp", {})
300
+ mcp_settings = None
301
+ if mcp_config:
302
+ # Load server configurations from JSON
303
+ mcp_config["servers"] = MCPSettings.load_server_config()
304
+ mcp_settings = MCPSettings(**mcp_config)
305
+ else:
306
+ mcp_settings = MCPSettings(servers=MCPSettings.load_server_config())
307
+
308
+ run_flow_config = raw_config.get("runflow")
309
+ if run_flow_config:
310
+ run_flow_settings = RunflowSettings(**run_flow_config)
311
+ else:
312
+ run_flow_settings = RunflowSettings()
313
+ config_dict = {
314
+ "llm": {
315
+ "default": default_settings,
316
+ **{
317
+ name: {**default_settings, **override_config}
318
+ for name, override_config in llm_overrides.items()
319
+ },
320
+ },
321
+ "sandbox": sandbox_settings,
322
+ "browser_config": browser_settings,
323
+ "search_config": search_settings,
324
+ "mcp_config": mcp_settings,
325
+ "run_flow_config": run_flow_settings,
326
+ "daytona_config": daytona_settings,
327
+ }
328
+
329
+ self._config = AppConfig(**config_dict)
330
+
331
+ @property
332
+ def llm(self) -> Dict[str, LLMSettings]:
333
+ return self._config.llm
334
+
335
+ @property
336
+ def sandbox(self) -> SandboxSettings:
337
+ return self._config.sandbox
338
+
339
+ @property
340
+ def daytona(self) -> DaytonaSettings:
341
+ return self._config.daytona_config
342
+
343
+ @property
344
+ def browser_config(self) -> Optional[BrowserSettings]:
345
+ return self._config.browser_config
346
+
347
+ @property
348
+ def search_config(self) -> Optional[SearchSettings]:
349
+ return self._config.search_config
350
+
351
+ @property
352
+ def mcp_config(self) -> MCPSettings:
353
+ """Get the MCP configuration"""
354
+ return self._config.mcp_config
355
+
356
+ @property
357
+ def run_flow_config(self) -> RunflowSettings:
358
+ """Get the Run Flow configuration"""
359
+ return self._config.run_flow_config
360
+
361
+ @property
362
+ def workspace_root(self) -> Path:
363
+ """Get the workspace root directory"""
364
+ return WORKSPACE_ROOT
365
+
366
+ @property
367
+ def root_path(self) -> Path:
368
+ """Get the root path of the application"""
369
+ return PROJECT_ROOT
370
+
371
+
372
+ config = Config()
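The per-model override merge in `_load_initial_config` boils down to a dict spread: every named LLM config starts from the `default` settings and overlays only the keys it sets. A standalone sketch (the model names and values are hypothetical):

```python
default_settings = {"model": "gpt-4o", "max_tokens": 4096, "temperature": 1.0}
llm_overrides = {"vision": {"model": "gpt-4o-vision"}}

# Same shape as the "llm" entry built in config_dict: "default" plus one
# merged entry per override section, later keys winning.
llm = {
    "default": default_settings,
    **{
        name: {**default_settings, **override}
        for name, override in llm_overrides.items()
    },
}

print(llm["vision"]["model"])       # → gpt-4o-vision (overridden)
print(llm["vision"]["max_tokens"])  # → 4096 (inherited from default)
```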
app/config_cloudflare.py ADDED
@@ -0,0 +1,145 @@
1
+ """
2
+ Configuration extensions for Cloudflare integration
3
+ """
4
+
5
+ import os
6
+ from typing import Optional
7
+
8
+ from pydantic import BaseModel, Field
9
+
10
+
11
+ class CloudflareSettings(BaseModel):
12
+ """Cloudflare configuration settings"""
13
+
14
+ api_token: Optional[str] = Field(
15
+ default_factory=lambda: os.getenv("CLOUDFLARE_API_TOKEN"),
16
+ description="Cloudflare API token",
17
+ )
18
+
19
+ account_id: Optional[str] = Field(
20
+ default_factory=lambda: os.getenv("CLOUDFLARE_ACCOUNT_ID"),
21
+ description="Cloudflare account ID",
22
+ )
23
+
24
+ worker_url: Optional[str] = Field(
25
+ default_factory=lambda: os.getenv("CLOUDFLARE_WORKER_URL"),
26
+ description="Cloudflare Worker URL",
27
+ )
28
+
29
+ # D1 Database settings
30
+ d1_database_id: Optional[str] = Field(
31
+ default_factory=lambda: os.getenv("CLOUDFLARE_D1_DATABASE_ID"),
32
+ description="D1 database ID",
33
+ )
34
+
35
+ # KV Namespace settings
36
+ kv_sessions_id: Optional[str] = Field(
37
+ default_factory=lambda: os.getenv("CLOUDFLARE_KV_SESSIONS_ID"),
38
+ description="KV namespace ID for sessions",
39
+ )
40
+
41
+ kv_cache_id: Optional[str] = Field(
42
+ default_factory=lambda: os.getenv("CLOUDFLARE_KV_CACHE_ID"),
43
+ description="KV namespace ID for cache",
44
+ )
45
+
46
+ # R2 Bucket settings
47
+ r2_storage_bucket: str = Field(
48
+ default_factory=lambda: os.getenv(
49
+ "CLOUDFLARE_R2_STORAGE_BUCKET", "openmanus-storage"
50
+ ),
51
+ description="R2 storage bucket name",
52
+ )
53
+
54
+ r2_assets_bucket: str = Field(
55
+ default_factory=lambda: os.getenv(
56
+ "CLOUDFLARE_R2_ASSETS_BUCKET", "openmanus-assets"
57
+ ),
58
+ description="R2 assets bucket name",
59
+ )
60
+
61
+ # Connection settings
62
+ timeout: int = Field(default=30, description="Request timeout in seconds")
63
+
64
+ def is_configured(self) -> bool:
65
+ """Check if minimum Cloudflare configuration is available"""
66
+ return bool(self.api_token and self.account_id)
67
+
68
+ def has_worker(self) -> bool:
69
+ """Check if worker URL is configured"""
70
+ return bool(self.worker_url)
71
+
72
+ def has_d1(self) -> bool:
73
+ """Check if D1 database is configured"""
74
+ return bool(self.d1_database_id)
75
+
76
+ def has_kv(self) -> bool:
77
+ """Check if KV namespaces are configured"""
78
+ return bool(self.kv_sessions_id and self.kv_cache_id)
79
+
80
+
81
+ class HuggingFaceSettings(BaseModel):
82
+ """Hugging Face configuration settings"""
83
+
84
+ token: Optional[str] = Field(
85
+ default_factory=lambda: os.getenv("HUGGINGFACE_TOKEN"),
86
+ description="Hugging Face API token",
87
+ )
88
+
89
+ cache_dir: str = Field(
90
+ default_factory=lambda: os.getenv(
91
+ "HF_HOME", "/app/OpenManus/.cache/huggingface"
92
+ ),
93
+ description="Hugging Face cache directory",
94
+ )
95
+
96
+ model_cache_size: int = Field(
97
+ default=5, description="Maximum number of models to cache"
98
+ )
99
+
100
+
101
+ class DeploymentSettings(BaseModel):
102
+ """Deployment-specific settings"""
103
+
104
+ environment: str = Field(
105
+ default_factory=lambda: os.getenv("ENVIRONMENT", "development"),
106
+ description="Deployment environment",
107
+ )
108
+
109
+ debug: bool = Field(
110
+ default_factory=lambda: os.getenv("DEBUG", "false").lower() == "true",
111
+ description="Enable debug mode",
112
+ )
113
+
114
+ log_level: str = Field(
115
+ default_factory=lambda: os.getenv("LOG_LEVEL", "INFO"),
116
+ description="Logging level",
117
+ )
118
+
119
+ # Gradio settings
120
+ server_name: str = Field(
121
+ default_factory=lambda: os.getenv("GRADIO_SERVER_NAME", "0.0.0.0"),
122
+ description="Gradio server name",
123
+ )
124
+
125
+ server_port: int = Field(
126
+ default_factory=lambda: int(os.getenv("GRADIO_SERVER_PORT", "7860")),
127
+ description="Gradio server port",
128
+ )
129
+
130
+ # Security settings
131
+ secret_key: Optional[str] = Field(
132
+ default_factory=lambda: os.getenv("SECRET_KEY"),
133
+ description="Secret key for sessions",
134
+ )
135
+
136
+ jwt_secret: Optional[str] = Field(
137
+ default_factory=lambda: os.getenv("JWT_SECRET"),
138
+ description="JWT signing secret",
139
+ )
140
+
141
+
142
+ # Create global instances
143
+ cloudflare_config = CloudflareSettings()
144
+ huggingface_config = HuggingFaceSettings()
145
+ deployment_config = DeploymentSettings()
app/daytona/README.md ADDED
@@ -0,0 +1,57 @@
 
+ # Agent with Daytona sandbox
+
+ ## Prerequisites
+ - conda activate 'Your OpenManus python env'
+ - pip install daytona==0.21.8 structlog==25.4.0
+
+ ## Setup & Running
+
+ 1. Daytona config:
+ ```bash
+ cd OpenManus
+ cp config/config.example-daytona.toml config/config.toml
+ ```
+ 2. Get a Daytona API key:
+ Go to https://app.daytona.io/dashboard/keys and create your API key.
+ 3. Set your API key in config.toml:
+ ```toml
+ # daytona config
+ [daytona]
+ daytona_api_key = ""
+ #daytona_server_url = "https://app.daytona.io/api"
+ #daytona_target = "us" # Daytona is currently available in the following regions: United States (us), Europe (eu)
+ #sandbox_image_name = "whitezxj/sandbox:0.1.0" # If you don't use this default image, sandbox tools may not work
+ #sandbox_entrypoint = "/usr/bin/supervisord -n -c /etc/supervisor/conf.d/supervisord.conf" # If you change this entrypoint, the server in the sandbox may not work
+ #VNC_password = # The password used to log in to the sandbox via VNC; defaults to 123456 if unset
+ ```
+ 4. Run:
+ ```bash
+ cd OpenManus
+ python sandbox_main.py
+ ```
+ 5. Send tasks to the agent:
+ You can send tasks to the agent from the terminal; the agent will use sandbox tools to handle them.
+ 6. See the results:
+ If the agent uses the sb_browser_use tool, you can watch its operations via the VNC link printed in the terminal, e.g. https://6080-sandbox-123456.h7890.daytona.work.
+ If the agent uses the sb_shell tool, you can see the results in the sandbox terminal at https://app.daytona.io/dashboard/sandboxes.
+ The agent can use the sb_files tool to operate on files in the sandbox.
+
+ ## Example
+
+ You can send a task such as: "Using the information at https://hk.trip.com/travel-guide/guidebook/nanjing-9696/?ishideheader=true&isHideNavBar=YES&disableFontScaling=1&catalogId=514634&locale=zh-HK, put together a Nanjing travel guide and save it in the workspace as index.html"
+
+ Then you can watch the agent's browser actions via the VNC link (https://6080-sandbox-123456.h7890.proxy.daytona.work) and view the HTML produced by the agent at the website URL (https://8080-sandbox-123456.h7890.proxy.daytona.work).
+
+ ## Learn More
+
+ - [Daytona Documentation](https://www.daytona.io/docs/)
app/daytona/sandbox.py ADDED
@@ -0,0 +1,165 @@
1
+ import time
2
+
3
+ from daytona import (
4
+ CreateSandboxFromImageParams,
5
+ Daytona,
6
+ DaytonaConfig,
7
+ Resources,
8
+ Sandbox,
9
+ SandboxState,
10
+ SessionExecuteRequest,
11
+ )
12
+
13
+ from app.config import config
14
+ from app.utils.logger import logger
15
+
16
+
17
+ # load_dotenv()
18
+ daytona_settings = config.daytona
19
+ logger.info("Initializing Daytona sandbox configuration")
20
+ daytona_config = DaytonaConfig(
21
+ api_key=daytona_settings.daytona_api_key,
22
+ server_url=daytona_settings.daytona_server_url,
23
+ target=daytona_settings.daytona_target,
24
+ )
25
+
26
+ if daytona_config.api_key:
27
+ logger.info("Daytona API key configured successfully")
28
+ else:
29
+ logger.warning("No Daytona API key found in environment variables")
30
+
31
+ if daytona_config.server_url:
32
+ logger.info(f"Daytona server URL set to: {daytona_config.server_url}")
33
+ else:
34
+ logger.warning("No Daytona server URL found in environment variables")
35
+
36
+ if daytona_config.target:
37
+ logger.info(f"Daytona target set to: {daytona_config.target}")
38
+ else:
39
+ logger.warning("No Daytona target found in environment variables")
40
+
41
+ daytona = Daytona(daytona_config)
42
+ logger.info("Daytona client initialized")
43
+
44
+
45
+ async def get_or_start_sandbox(sandbox_id: str):
46
+ """Retrieve a sandbox by ID, check its state, and start it if needed."""
47
+
48
+ logger.info(f"Getting or starting sandbox with ID: {sandbox_id}")
49
+
50
+ try:
51
+ sandbox = daytona.get(sandbox_id)
52
+
53
+ # Check if sandbox needs to be started
54
+ if (
55
+ sandbox.state == SandboxState.ARCHIVED
56
+ or sandbox.state == SandboxState.STOPPED
57
+ ):
58
+ logger.info(f"Sandbox is in {sandbox.state} state. Starting...")
59
+ try:
60
+ daytona.start(sandbox)
61
+ # Wait a moment for the sandbox to initialize
62
+ # sleep(5)
63
+ # Refresh sandbox state after starting
64
+ sandbox = daytona.get(sandbox_id)
65
+
66
+ # Start supervisord in a session when restarting
67
+ start_supervisord_session(sandbox)
68
+ except Exception as e:
69
+ logger.error(f"Error starting sandbox: {e}")
70
+ raise e
71
+
72
+ logger.info(f"Sandbox {sandbox_id} is ready")
73
+ return sandbox
74
+
75
+ except Exception as e:
76
+ logger.error(f"Error retrieving or starting sandbox: {str(e)}")
77
+ raise e
78
+
79
+
80
+ def start_supervisord_session(sandbox: Sandbox):
81
+ """Start supervisord in a session."""
82
+ session_id = "supervisord-session"
83
+ try:
84
+ logger.info(f"Creating session {session_id} for supervisord")
85
+ sandbox.process.create_session(session_id)
86
+
87
+ # Execute supervisord command
88
+ sandbox.process.execute_session_command(
89
+ session_id,
90
+ SessionExecuteRequest(
91
+ command="exec /usr/bin/supervisord -n -c /etc/supervisor/conf.d/supervisord.conf",
92
+ var_async=True,
93
+ ),
94
+ )
95
+ time.sleep(25) # Wait a bit to ensure supervisord starts properly
96
+ logger.info(f"Supervisord started in session {session_id}")
97
+ except Exception as e:
98
+ logger.error(f"Error starting supervisord session: {str(e)}")
99
+ raise e
100
+
101
+
102
+ def create_sandbox(password: str, project_id: str = None):
103
+ """Create a new sandbox with all required services configured and running."""
104
+
105
+ logger.info("Creating new Daytona sandbox environment")
106
+ logger.info("Configuring sandbox with browser-use image and environment variables")
107
+
108
+ labels = None
109
+ if project_id:
110
+ logger.info(f"Using sandbox_id as label: {project_id}")
111
+ labels = {"id": project_id}
112
+
113
+ params = CreateSandboxFromImageParams(
114
+ image=daytona_settings.sandbox_image_name,
115
+ public=True,
116
+ labels=labels,
117
+ env_vars={
118
+ "CHROME_PERSISTENT_SESSION": "true",
119
+ "RESOLUTION": "1024x768x24",
120
+ "RESOLUTION_WIDTH": "1024",
121
+ "RESOLUTION_HEIGHT": "768",
122
+ "VNC_PASSWORD": password,
123
+ "ANONYMIZED_TELEMETRY": "false",
124
+ "CHROME_PATH": "",
125
+ "CHROME_USER_DATA": "",
126
+ "CHROME_DEBUGGING_PORT": "9222",
127
+ "CHROME_DEBUGGING_HOST": "localhost",
128
+ "CHROME_CDP": "",
129
+ },
130
+ resources=Resources(
131
+ cpu=2,
132
+ memory=4,
133
+ disk=5,
134
+ ),
135
+ auto_stop_interval=15,
136
+ auto_archive_interval=24 * 60,
137
+ )
138
+
139
+ # Create the sandbox
140
+ sandbox = daytona.create(params)
141
+ logger.info(f"Sandbox created with ID: {sandbox.id}")
142
+
143
+ # Start supervisord in a session for new sandbox
144
+ start_supervisord_session(sandbox)
145
+
146
+ logger.info("Sandbox environment successfully initialized")
147
+ return sandbox
148
+
149
+
150
+ async def delete_sandbox(sandbox_id: str):
151
+ """Delete a sandbox by its ID."""
152
+ logger.info(f"Deleting sandbox with ID: {sandbox_id}")
153
+
154
+ try:
155
+ # Get the sandbox
156
+ sandbox = daytona.get(sandbox_id)
157
+
158
+ # Delete the sandbox
159
+ daytona.delete(sandbox)
160
+
161
+ logger.info(f"Successfully deleted sandbox {sandbox_id}")
162
+ return True
163
+ except Exception as e:
164
+ logger.error(f"Error deleting sandbox {sandbox_id}: {str(e)}")
165
+ raise e
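The restart gate used by `get_or_start_sandbox` (and again in `SandboxToolsBase._ensure_sandbox` below) only starts sandboxes that are archived or stopped. A minimal sketch with a local stand-in enum, not the actual `daytona.SandboxState`:

```python
from enum import Enum


class SandboxState(str, Enum):
    # Local stand-in for daytona.SandboxState, limited to the states checked above.
    STARTED = "started"
    STOPPED = "stopped"
    ARCHIVED = "archived"


def needs_start(state: SandboxState) -> bool:
    # Same condition used before calling daytona.start() and
    # start_supervisord_session() on a retrieved sandbox.
    return state in (SandboxState.ARCHIVED, SandboxState.STOPPED)


print(needs_start(SandboxState.STOPPED))  # → True
print(needs_start(SandboxState.STARTED))  # → False
```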
app/daytona/tool_base.py ADDED
@@ -0,0 +1,138 @@
1
+ from dataclasses import dataclass, field
2
+ from datetime import datetime
3
+ from typing import Any, ClassVar, Dict, Optional
4
+
5
+ from daytona import Daytona, DaytonaConfig, Sandbox, SandboxState
6
+ from pydantic import Field
7
+
8
+ from app.config import config
9
+ from app.daytona.sandbox import create_sandbox, start_supervisord_session
10
+ from app.tool.base import BaseTool
11
+ from app.utils.files_utils import clean_path
12
+ from app.utils.logger import logger
13
+
14
+
15
+ # load_dotenv()
16
+ daytona_settings = config.daytona
17
+ daytona_config = DaytonaConfig(
18
+ api_key=daytona_settings.daytona_api_key,
19
+ server_url=daytona_settings.daytona_server_url,
20
+ target=daytona_settings.daytona_target,
21
+ )
22
+ daytona = Daytona(daytona_config)
23
+
24
+
25
+ @dataclass
26
+ class ThreadMessage:
27
+ """
28
+ Represents a message to be added to a thread.
29
+ """
30
+
31
+ type: str
32
+ content: Dict[str, Any]
33
+ is_llm_message: bool = False
34
+ metadata: Optional[Dict[str, Any]] = None
35
+ timestamp: Optional[float] = field(
36
+ default_factory=lambda: datetime.now().timestamp()
37
+ )
38
+
39
+ def to_dict(self) -> Dict[str, Any]:
40
+ """Convert the message to a dictionary for API calls"""
41
+ return {
42
+ "type": self.type,
43
+ "content": self.content,
44
+ "is_llm_message": self.is_llm_message,
45
+ "metadata": self.metadata or {},
46
+ "timestamp": self.timestamp,
47
+ }
48
+
49
+
50
+ class SandboxToolsBase(BaseTool):
51
+ """Base class for all sandbox tools that provides project-based sandbox access."""
52
+
53
+ # Class variable to track if sandbox URLs have been printed
54
+ _urls_printed: ClassVar[bool] = False
55
+
56
+ # Required fields
57
+ project_id: Optional[str] = None
58
+ # thread_manager: Optional[ThreadManager] = None
59
+
60
+ # Private fields (not part of the model schema)
61
+ _sandbox: Optional[Sandbox] = None
62
+ _sandbox_id: Optional[str] = None
63
+ _sandbox_pass: Optional[str] = None
64
+ workspace_path: str = Field(default="/workspace", exclude=True)
65
+ _sessions: dict[str, str] = {}
66
+
67
+ class Config:
68
+ arbitrary_types_allowed = True # Allow non-pydantic types like ThreadManager
69
+ underscore_attrs_are_private = True
70
+
71
+ async def _ensure_sandbox(self) -> Sandbox:
72
+ """Ensure we have a valid sandbox instance, retrieving it from the project if needed."""
73
+ if self._sandbox is None:
74
+ # Get or start the sandbox
75
+ try:
76
+ self._sandbox = create_sandbox(password=config.daytona.VNC_password)
77
+ # Log URLs if not already printed
78
+ if not SandboxToolsBase._urls_printed:
79
+ vnc_link = self._sandbox.get_preview_link(6080)
80
+ website_link = self._sandbox.get_preview_link(8080)
81
+
82
+ vnc_url = (
83
+ vnc_link.url if hasattr(vnc_link, "url") else str(vnc_link)
84
+ )
85
+ website_url = (
86
+ website_link.url
87
+ if hasattr(website_link, "url")
88
+ else str(website_link)
89
+ )
90
+
91
+ print("\033[95m***")
92
+ print(f"VNC URL: {vnc_url}")
93
+ print(f"Website URL: {website_url}")
94
+ print("***\033[0m")
95
+ SandboxToolsBase._urls_printed = True
96
+ except Exception as e:
97
+ logger.error(f"Error retrieving or starting sandbox: {str(e)}")
98
+ raise e
99
+ else:
100
+ if (
101
+ self._sandbox.state == SandboxState.ARCHIVED
102
+ or self._sandbox.state == SandboxState.STOPPED
103
+ ):
104
+ logger.info(f"Sandbox is in {self._sandbox.state} state. Starting...")
105
+ try:
106
+ daytona.start(self._sandbox)
107
+ # Wait a moment for the sandbox to initialize
108
+ # sleep(5)
109
+ # Refresh sandbox state after starting
110
+
111
+ # Start supervisord in a session when restarting
112
+ start_supervisord_session(self._sandbox)
113
+ except Exception as e:
114
+ logger.error(f"Error starting sandbox: {e}")
115
+ raise e
116
+ return self._sandbox
117
+
118
+ @property
119
+ def sandbox(self) -> Sandbox:
120
+ """Get the sandbox instance, ensuring it exists."""
121
+ if self._sandbox is None:
122
+ raise RuntimeError("Sandbox not initialized. Call _ensure_sandbox() first.")
123
+ return self._sandbox
124
+
125
+ @property
126
+ def sandbox_id(self) -> str:
127
+ """Get the sandbox ID, ensuring it exists."""
128
+ if self._sandbox_id is None:
129
+ raise RuntimeError(
130
+ "Sandbox ID not initialized. Call _ensure_sandbox() first."
131
+ )
132
+ return self._sandbox_id
133
+
134
+ def clean_path(self, path: str) -> str:
135
+ """Clean and normalize a path to be relative to /workspace."""
136
+ cleaned_path = clean_path(path, self.workspace_path)
137
+ logger.debug(f"Cleaned path: {path} -> {cleaned_path}")
138
+ return cleaned_path
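The `ThreadMessage` dataclass above can be exercised on its own. The sketch below is a trimmed, standalone copy for illustration (the real class lives in this module alongside `SandboxToolsBase`), showing how `to_dict` normalizes a missing `metadata` field and auto-fills the timestamp:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class ThreadMessage:
    # Trimmed copy of the class defined above, for illustration only
    type: str
    content: Dict[str, Any]
    is_llm_message: bool = False
    metadata: Optional[Dict[str, Any]] = None
    timestamp: Optional[float] = field(
        default_factory=lambda: datetime.now().timestamp()
    )

    def to_dict(self) -> Dict[str, Any]:
        """Convert the message to a dictionary for API calls."""
        return {
            "type": self.type,
            "content": self.content,
            "is_llm_message": self.is_llm_message,
            "metadata": self.metadata or {},  # None becomes {}
            "timestamp": self.timestamp,
        }


msg = ThreadMessage(type="user", content={"text": "hello"})
payload = msg.to_dict()
print(payload["metadata"])
```

Note that `metadata or {}` means callers never have to null-check the serialized dict, and the `default_factory` gives every message a creation timestamp without the caller passing one.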
app/exceptions.py ADDED
@@ -0,0 +1,13 @@
1
+ class ToolError(Exception):
2
+ """Raised when a tool encounters an error."""
3
+
4
+ def __init__(self, message):
5
+ self.message = message
6
+
7
+
8
+ class OpenManusError(Exception):
9
+ """Base exception for all OpenManus errors"""
10
+
11
+
12
+ class TokenLimitExceeded(OpenManusError):
13
+ """Exception raised when the token limit is exceeded"""
app/flow/__init__.py ADDED
File without changes
app/flow/base.py ADDED
@@ -0,0 +1,57 @@
1
+ from abc import ABC, abstractmethod
2
+ from typing import Dict, List, Optional, Union
3
+
4
+ from pydantic import BaseModel
5
+
6
+ from app.agent.base import BaseAgent
7
+
8
+
9
+ class BaseFlow(BaseModel, ABC):
10
+ """Base class for execution flows supporting multiple agents"""
11
+
12
+ agents: Dict[str, BaseAgent]
13
+ tools: Optional[List] = None
14
+ primary_agent_key: Optional[str] = None
15
+
16
+ class Config:
17
+ arbitrary_types_allowed = True
18
+
19
+ def __init__(
20
+ self, agents: Union[BaseAgent, List[BaseAgent], Dict[str, BaseAgent]], **data
21
+ ):
22
+ # Handle different ways of providing agents
23
+ if isinstance(agents, BaseAgent):
24
+ agents_dict = {"default": agents}
25
+ elif isinstance(agents, list):
26
+ agents_dict = {f"agent_{i}": agent for i, agent in enumerate(agents)}
27
+ else:
28
+ agents_dict = agents
29
+
30
+ # If primary agent not specified, use first agent
31
+ primary_key = data.get("primary_agent_key")
32
+ if not primary_key and agents_dict:
33
+ primary_key = next(iter(agents_dict))
34
+ data["primary_agent_key"] = primary_key
35
+
36
+ # Set the agents dictionary
37
+ data["agents"] = agents_dict
38
+
39
+ # Initialize using BaseModel's init
40
+ super().__init__(**data)
41
+
42
+ @property
43
+ def primary_agent(self) -> Optional[BaseAgent]:
44
+ """Get the primary agent for the flow"""
45
+ return self.agents.get(self.primary_agent_key)
46
+
47
+ def get_agent(self, key: str) -> Optional[BaseAgent]:
48
+ """Get a specific agent by key"""
49
+ return self.agents.get(key)
50
+
51
+ def add_agent(self, key: str, agent: BaseAgent) -> None:
52
+ """Add a new agent to the flow"""
53
+ self.agents[key] = agent
54
+
55
+ @abstractmethod
56
+ async def execute(self, input_text: str) -> str:
57
+ """Execute the flow with given input"""
app/flow/flow_factory.py ADDED
@@ -0,0 +1,30 @@
1
+ from enum import Enum
2
+ from typing import Dict, List, Union
3
+
4
+ from app.agent.base import BaseAgent
5
+ from app.flow.base import BaseFlow
6
+ from app.flow.planning import PlanningFlow
7
+
8
+
9
+ class FlowType(str, Enum):
10
+ PLANNING = "planning"
11
+
12
+
13
+ class FlowFactory:
14
+ """Factory for creating different types of flows with support for multiple agents"""
15
+
16
+ @staticmethod
17
+ def create_flow(
18
+ flow_type: FlowType,
19
+ agents: Union[BaseAgent, List[BaseAgent], Dict[str, BaseAgent]],
20
+ **kwargs,
21
+ ) -> BaseFlow:
22
+ flows = {
23
+ FlowType.PLANNING: PlanningFlow,
24
+ }
25
+
26
+ flow_class = flows.get(flow_type)
27
+ if not flow_class:
28
+ raise ValueError(f"Unknown flow type: {flow_type}")
29
+
30
+ return flow_class(agents, **kwargs)
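`FlowFactory` dispatches on the `FlowType` enum through a lookup dict, so registering a new flow type means adding one enum member and one dict entry. The same pattern in isolation (`StubPlanningFlow` is a stand-in for the real `PlanningFlow`):

```python
from enum import Enum


class FlowType(str, Enum):
    PLANNING = "planning"


class StubPlanningFlow:
    # Stand-in for app.flow.planning.PlanningFlow, for illustration only
    def __init__(self, agents, **kwargs):
        self.agents = agents


def create_flow(flow_type: FlowType, agents, **kwargs):
    flows = {FlowType.PLANNING: StubPlanningFlow}
    flow_class = flows.get(flow_type)
    if not flow_class:
        raise ValueError(f"Unknown flow type: {flow_type}")
    return flow_class(agents, **kwargs)


flow = create_flow(FlowType.PLANNING, {"default": "agent"})
print(type(flow).__name__)  # StubPlanningFlow
```

Because `FlowType` subclasses `str`, `FlowType("planning")` round-trips from the raw string, which is convenient when the type arrives from a JSON payload.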
app/flow/planning.py ADDED
@@ -0,0 +1,442 @@
1
+ import json
2
+ import time
3
+ from enum import Enum
4
+ from typing import Dict, List, Optional, Union
5
+
6
+ from pydantic import Field
7
+
8
+ from app.agent.base import BaseAgent
9
+ from app.flow.base import BaseFlow
10
+ from app.llm import LLM
11
+ from app.logger import logger
12
+ from app.schema import AgentState, Message, ToolChoice
13
+ from app.tool import PlanningTool
14
+
15
+
16
+ class PlanStepStatus(str, Enum):
17
+ """Enum class defining possible statuses of a plan step"""
18
+
19
+ NOT_STARTED = "not_started"
20
+ IN_PROGRESS = "in_progress"
21
+ COMPLETED = "completed"
22
+ BLOCKED = "blocked"
23
+
24
+ @classmethod
25
+ def get_all_statuses(cls) -> list[str]:
26
+ """Return a list of all possible step status values"""
27
+ return [status.value for status in cls]
28
+
29
+ @classmethod
30
+ def get_active_statuses(cls) -> list[str]:
31
+ """Return a list of values representing active statuses (not started or in progress)"""
32
+ return [cls.NOT_STARTED.value, cls.IN_PROGRESS.value]
33
+
34
+ @classmethod
35
+ def get_status_marks(cls) -> Dict[str, str]:
36
+ """Return a mapping of statuses to their marker symbols"""
37
+ return {
38
+ cls.COMPLETED.value: "[✓]",
39
+ cls.IN_PROGRESS.value: "[→]",
40
+ cls.BLOCKED.value: "[!]",
41
+ cls.NOT_STARTED.value: "[ ]",
42
+ }
43
+
44
+
45
+ class PlanningFlow(BaseFlow):
46
+ """A flow that manages planning and execution of tasks using agents."""
47
+
48
+ llm: LLM = Field(default_factory=lambda: LLM())
49
+ planning_tool: PlanningTool = Field(default_factory=PlanningTool)
50
+ executor_keys: List[str] = Field(default_factory=list)
51
+ active_plan_id: str = Field(default_factory=lambda: f"plan_{int(time.time())}")
52
+ current_step_index: Optional[int] = None
53
+
54
+ def __init__(
55
+ self, agents: Union[BaseAgent, List[BaseAgent], Dict[str, BaseAgent]], **data
56
+ ):
57
+ # Set executor keys before super().__init__
58
+ if "executors" in data:
59
+ data["executor_keys"] = data.pop("executors")
60
+
61
+ # Set plan ID if provided
62
+ if "plan_id" in data:
63
+ data["active_plan_id"] = data.pop("plan_id")
64
+
65
+ # Initialize the planning tool if not provided
66
+ if "planning_tool" not in data:
67
+ planning_tool = PlanningTool()
68
+ data["planning_tool"] = planning_tool
69
+
70
+ # Call parent's init with the processed data
71
+ super().__init__(agents, **data)
72
+
73
+ # Set executor_keys to all agent keys if not specified
74
+ if not self.executor_keys:
75
+ self.executor_keys = list(self.agents.keys())
76
+
77
+ def get_executor(self, step_type: Optional[str] = None) -> BaseAgent:
78
+ """
79
+ Get an appropriate executor agent for the current step.
80
+ Can be extended to select agents based on step type/requirements.
81
+ """
82
+ # If step type is provided and matches an agent key, use that agent
83
+ if step_type and step_type in self.agents:
84
+ return self.agents[step_type]
85
+
86
+ # Otherwise use the first available executor or fall back to primary agent
87
+ for key in self.executor_keys:
88
+ if key in self.agents:
89
+ return self.agents[key]
90
+
91
+ # Fallback to primary agent
92
+ return self.primary_agent
93
+
94
+ async def execute(self, input_text: str) -> str:
95
+ """Execute the planning flow with agents."""
96
+ try:
97
+ if not self.primary_agent:
98
+ raise ValueError("No primary agent available")
99
+
100
+ # Create initial plan if input provided
101
+ if input_text:
102
+ await self._create_initial_plan(input_text)
103
+
104
+ # Verify plan was created successfully
105
+ if self.active_plan_id not in self.planning_tool.plans:
106
+ logger.error(
107
+ f"Plan creation failed. Plan ID {self.active_plan_id} not found in planning tool."
108
+ )
109
+ return f"Failed to create plan for: {input_text}"
110
+
111
+ result = ""
112
+ while True:
113
+ # Get current step to execute
114
+ self.current_step_index, step_info = await self._get_current_step_info()
115
+
116
+ # Exit if no more steps or plan completed
117
+ if self.current_step_index is None:
118
+ result += await self._finalize_plan()
119
+ break
120
+
121
+ # Execute current step with appropriate agent
122
+ step_type = step_info.get("type") if step_info else None
123
+ executor = self.get_executor(step_type)
124
+ step_result = await self._execute_step(executor, step_info)
125
+ result += step_result + "\n"
126
+
127
+ # Check if agent wants to terminate
128
+ if hasattr(executor, "state") and executor.state == AgentState.FINISHED:
129
+ break
130
+
131
+ return result
132
+ except Exception as e:
133
+ logger.error(f"Error in PlanningFlow: {str(e)}")
134
+ return f"Execution failed: {str(e)}"
135
+
136
+ async def _create_initial_plan(self, request: str) -> None:
137
+ """Create an initial plan based on the request using the flow's LLM and PlanningTool."""
138
+ logger.info(f"Creating initial plan with ID: {self.active_plan_id}")
139
+
140
+ system_message_content = (
141
+ "You are a planning assistant. Create a concise, actionable plan with clear steps. "
142
+ "Focus on key milestones rather than detailed sub-steps. "
143
+ "Optimize for clarity and efficiency."
144
+ )
145
+ agents_description = []
146
+ for key in self.executor_keys:
147
+ if key in self.agents:
148
+ agents_description.append(
149
+ {
150
+ "name": key.upper(),
151
+ "description": self.agents[key].description,
152
+ }
153
+ )
154
+ if len(agents_description) > 1:
155
+ # Add description of agents to select
156
+ system_message_content += (
157
+ f"\nNow we have {agents_description} agents. "
158
+ f"The infomation of them are below: {json.dumps(agents_description)}\n"
159
+ "When creating steps in the planning tool, please specify the agent names using the format '[agent_name]'."
160
+ )
161
+
162
+ # Create a system message for plan creation
163
+ system_message = Message.system_message(system_message_content)
164
+
165
+ # Create a user message with the request
166
+ user_message = Message.user_message(
167
+ f"Create a reasonable plan with clear steps to accomplish the task: {request}"
168
+ )
169
+
170
+ # Call LLM with PlanningTool
171
+ response = await self.llm.ask_tool(
172
+ messages=[user_message],
173
+ system_msgs=[system_message],
174
+ tools=[self.planning_tool.to_param()],
175
+ tool_choice=ToolChoice.AUTO,
176
+ )
177
+
178
+ # Process tool calls if present
179
+ if response.tool_calls:
180
+ for tool_call in response.tool_calls:
181
+ if tool_call.function.name == "planning":
182
+ # Parse the arguments
183
+ args = tool_call.function.arguments
184
+ if isinstance(args, str):
185
+ try:
186
+ args = json.loads(args)
187
+ except json.JSONDecodeError:
188
+ logger.error(f"Failed to parse tool arguments: {args}")
189
+ continue
190
+
191
+ # Ensure plan_id is set correctly and execute the tool
192
+ args["plan_id"] = self.active_plan_id
193
+
194
+ # Execute the tool via ToolCollection instead of directly
195
+ result = await self.planning_tool.execute(**args)
196
+
197
+ logger.info(f"Plan creation result: {str(result)}")
198
+ return
199
+
200
+ # If execution reached here, create a default plan
201
+ logger.warning("Creating default plan")
202
+
203
+ # Create default plan using the ToolCollection
204
+ await self.planning_tool.execute(
205
+ **{
206
+ "command": "create",
207
+ "plan_id": self.active_plan_id,
208
+ "title": f"Plan for: {request[:50]}{'...' if len(request) > 50 else ''}",
209
+ "steps": ["Analyze request", "Execute task", "Verify results"],
210
+ }
211
+ )
212
+
213
+ async def _get_current_step_info(self) -> tuple[Optional[int], Optional[dict]]:
214
+ """
215
+ Parse the current plan to identify the first non-completed step's index and info.
216
+ Returns (None, None) if no active step is found.
217
+ """
218
+ if (
219
+ not self.active_plan_id
220
+ or self.active_plan_id not in self.planning_tool.plans
221
+ ):
222
+ logger.error(f"Plan with ID {self.active_plan_id} not found")
223
+ return None, None
224
+
225
+ try:
226
+ # Direct access to plan data from planning tool storage
227
+ plan_data = self.planning_tool.plans[self.active_plan_id]
228
+ steps = plan_data.get("steps", [])
229
+ step_statuses = plan_data.get("step_statuses", [])
230
+
231
+ # Find first non-completed step
232
+ for i, step in enumerate(steps):
233
+ if i >= len(step_statuses):
234
+ status = PlanStepStatus.NOT_STARTED.value
235
+ else:
236
+ status = step_statuses[i]
237
+
238
+ if status in PlanStepStatus.get_active_statuses():
239
+ # Extract step type/category if available
240
+ step_info = {"text": step}
241
+
242
+ # Try to extract step type from the text (e.g., [SEARCH] or [CODE])
243
+ import re
244
+
245
+ type_match = re.search(r"\[([A-Z_]+)\]", step)
246
+ if type_match:
247
+ step_info["type"] = type_match.group(1).lower()
248
+
249
+ # Mark current step as in_progress
250
+ try:
251
+ await self.planning_tool.execute(
252
+ command="mark_step",
253
+ plan_id=self.active_plan_id,
254
+ step_index=i,
255
+ step_status=PlanStepStatus.IN_PROGRESS.value,
256
+ )
257
+ except Exception as e:
258
+ logger.warning(f"Error marking step as in_progress: {e}")
259
+ # Update step status directly if needed
260
+ if i < len(step_statuses):
261
+ step_statuses[i] = PlanStepStatus.IN_PROGRESS.value
262
+ else:
263
+ while len(step_statuses) < i:
264
+ step_statuses.append(PlanStepStatus.NOT_STARTED.value)
265
+ step_statuses.append(PlanStepStatus.IN_PROGRESS.value)
266
+
267
+ plan_data["step_statuses"] = step_statuses
268
+
269
+ return i, step_info
270
+
271
+ return None, None # No active step found
272
+
273
+ except Exception as e:
274
+ logger.warning(f"Error finding current step index: {e}")
275
+ return None, None
276
+
277
+ async def _execute_step(self, executor: BaseAgent, step_info: dict) -> str:
278
+ """Execute the current step with the specified agent using agent.run()."""
279
+ # Prepare context for the agent with current plan status
280
+ plan_status = await self._get_plan_text()
281
+ step_text = step_info.get("text", f"Step {self.current_step_index}")
282
+
283
+ # Create a prompt for the agent to execute the current step
284
+ step_prompt = f"""
285
+ CURRENT PLAN STATUS:
286
+ {plan_status}
287
+
288
+ YOUR CURRENT TASK:
289
+ You are now working on step {self.current_step_index}: "{step_text}"
290
+
291
+ Please only execute this current step using the appropriate tools. When you're done, provide a summary of what you accomplished.
292
+ """
293
+
294
+ # Use agent.run() to execute the step
295
+ try:
296
+ step_result = await executor.run(step_prompt)
297
+
298
+ # Mark the step as completed after successful execution
299
+ await self._mark_step_completed()
300
+
301
+ return step_result
302
+ except Exception as e:
303
+ logger.error(f"Error executing step {self.current_step_index}: {e}")
304
+ return f"Error executing step {self.current_step_index}: {str(e)}"
305
+
306
+ async def _mark_step_completed(self) -> None:
307
+ """Mark the current step as completed."""
308
+ if self.current_step_index is None:
309
+ return
310
+
311
+ try:
312
+ # Mark the step as completed
313
+ await self.planning_tool.execute(
314
+ command="mark_step",
315
+ plan_id=self.active_plan_id,
316
+ step_index=self.current_step_index,
317
+ step_status=PlanStepStatus.COMPLETED.value,
318
+ )
319
+ logger.info(
320
+ f"Marked step {self.current_step_index} as completed in plan {self.active_plan_id}"
321
+ )
322
+ except Exception as e:
323
+ logger.warning(f"Failed to update plan status: {e}")
324
+ # Update step status directly in planning tool storage
325
+ if self.active_plan_id in self.planning_tool.plans:
326
+ plan_data = self.planning_tool.plans[self.active_plan_id]
327
+ step_statuses = plan_data.get("step_statuses", [])
328
+
329
+ # Ensure the step_statuses list is long enough
330
+ while len(step_statuses) <= self.current_step_index:
331
+ step_statuses.append(PlanStepStatus.NOT_STARTED.value)
332
+
333
+ # Update the status
334
+ step_statuses[self.current_step_index] = PlanStepStatus.COMPLETED.value
335
+ plan_data["step_statuses"] = step_statuses
336
+
337
+ async def _get_plan_text(self) -> str:
338
+ """Get the current plan as formatted text."""
339
+ try:
340
+ result = await self.planning_tool.execute(
341
+ command="get", plan_id=self.active_plan_id
342
+ )
343
+ return result.output if hasattr(result, "output") else str(result)
344
+ except Exception as e:
345
+ logger.error(f"Error getting plan: {e}")
346
+ return self._generate_plan_text_from_storage()
347
+
348
+ def _generate_plan_text_from_storage(self) -> str:
349
+ """Generate plan text directly from storage if the planning tool fails."""
350
+ try:
351
+ if self.active_plan_id not in self.planning_tool.plans:
352
+ return f"Error: Plan with ID {self.active_plan_id} not found"
353
+
354
+ plan_data = self.planning_tool.plans[self.active_plan_id]
355
+ title = plan_data.get("title", "Untitled Plan")
356
+ steps = plan_data.get("steps", [])
357
+ step_statuses = plan_data.get("step_statuses", [])
358
+ step_notes = plan_data.get("step_notes", [])
359
+
360
+ # Ensure step_statuses and step_notes match the number of steps
361
+ while len(step_statuses) < len(steps):
362
+ step_statuses.append(PlanStepStatus.NOT_STARTED.value)
363
+ while len(step_notes) < len(steps):
364
+ step_notes.append("")
365
+
366
+ # Count steps by status
367
+ status_counts = {status: 0 for status in PlanStepStatus.get_all_statuses()}
368
+
369
+ for status in step_statuses:
370
+ if status in status_counts:
371
+ status_counts[status] += 1
372
+
373
+ completed = status_counts[PlanStepStatus.COMPLETED.value]
374
+ total = len(steps)
375
+ progress = (completed / total) * 100 if total > 0 else 0
376
+
377
+ plan_text = f"Plan: {title} (ID: {self.active_plan_id})\n"
378
+ plan_text += "=" * len(plan_text) + "\n\n"
379
+
380
+ plan_text += (
381
+ f"Progress: {completed}/{total} steps completed ({progress:.1f}%)\n"
382
+ )
383
+ plan_text += f"Status: {status_counts[PlanStepStatus.COMPLETED.value]} completed, {status_counts[PlanStepStatus.IN_PROGRESS.value]} in progress, "
384
+ plan_text += f"{status_counts[PlanStepStatus.BLOCKED.value]} blocked, {status_counts[PlanStepStatus.NOT_STARTED.value]} not started\n\n"
385
+ plan_text += "Steps:\n"
386
+
387
+ status_marks = PlanStepStatus.get_status_marks()
388
+
389
+ for i, (step, status, notes) in enumerate(
390
+ zip(steps, step_statuses, step_notes)
391
+ ):
392
+ # Use status marks to indicate step status
393
+ status_mark = status_marks.get(
394
+ status, status_marks[PlanStepStatus.NOT_STARTED.value]
395
+ )
396
+
397
+ plan_text += f"{i}. {status_mark} {step}\n"
398
+ if notes:
399
+ plan_text += f" Notes: {notes}\n"
400
+
401
+ return plan_text
402
+ except Exception as e:
403
+ logger.error(f"Error generating plan text from storage: {e}")
404
+ return f"Error: Unable to retrieve plan with ID {self.active_plan_id}"
405
+
406
+ async def _finalize_plan(self) -> str:
407
+ """Finalize the plan and provide a summary using the flow's LLM directly."""
408
+ plan_text = await self._get_plan_text()
409
+
410
+ # Create a summary using the flow's LLM directly
411
+ try:
412
+ system_message = Message.system_message(
413
+ "You are a planning assistant. Your task is to summarize the completed plan."
414
+ )
415
+
416
+ user_message = Message.user_message(
417
+ f"The plan has been completed. Here is the final plan status:\n\n{plan_text}\n\nPlease provide a summary of what was accomplished and any final thoughts."
418
+ )
419
+
420
+ response = await self.llm.ask(
421
+ messages=[user_message], system_msgs=[system_message]
422
+ )
423
+
424
+ return f"Plan completed:\n\n{response}"
425
+ except Exception as e:
426
+ logger.error(f"Error finalizing plan with LLM: {e}")
427
+
428
+ # Fallback to using an agent for the summary
429
+ try:
430
+ agent = self.primary_agent
431
+ summary_prompt = f"""
432
+ The plan has been completed. Here is the final plan status:
433
+
434
+ {plan_text}
435
+
436
+ Please provide a summary of what was accomplished and any final thoughts.
437
+ """
438
+ summary = await agent.run(summary_prompt)
439
+ return f"Plan completed:\n\n{summary}"
440
+ except Exception as e2:
441
+ logger.error(f"Error finalizing plan with agent: {e2}")
442
+ return "Plan completed. Error generating summary."
app/huggingface_models.py ADDED
The diff for this file is too large to render. See raw diff
 
app/huggingface_models_backup.py ADDED
@@ -0,0 +1,2237 @@
1
+ """
2
+ Hugging Face Models Integration for OpenManus AI Agent
3
+ Comprehensive integration with Hugging Face Inference API for all model categories
4
+ """
5
+
6
+ import asyncio
7
+ import base64
8
+ import io
9
+ import json
10
+ import logging
11
+ from dataclasses import dataclass
12
+ from enum import Enum
13
+ from typing import Any, Dict, List, Optional, Union
14
+
15
+ import aiohttp
16
+ import PIL.Image
17
+ from pydantic import BaseModel
18
+
19
+ logger = logging.getLogger(__name__)
20
+
21
+
22
+ class ModelCategory(Enum):
23
+ """Categories of Hugging Face models available"""
24
+
25
+ # Core AI categories
26
+ TEXT_GENERATION = "text-generation"
27
+ TEXT_TO_IMAGE = "text-to-image"
28
+ IMAGE_TO_TEXT = "image-to-text"
29
+ AUTOMATIC_SPEECH_RECOGNITION = "automatic-speech-recognition"
30
+ TEXT_TO_SPEECH = "text-to-speech"
31
+ IMAGE_CLASSIFICATION = "image-classification"
32
+ OBJECT_DETECTION = "object-detection"
33
+ FEATURE_EXTRACTION = "feature-extraction"
34
+ SENTENCE_SIMILARITY = "sentence-similarity"
35
+ TRANSLATION = "translation"
36
+ SUMMARIZATION = "summarization"
37
+ QUESTION_ANSWERING = "question-answering"
38
+ FILL_MASK = "fill-mask"
39
+ TOKEN_CLASSIFICATION = "token-classification"
40
+ ZERO_SHOT_CLASSIFICATION = "zero-shot-classification"
41
+ AUDIO_CLASSIFICATION = "audio-classification"
42
+ CONVERSATIONAL = "conversational"
43
+
44
+ # Video and Motion
45
+ TEXT_TO_VIDEO = "text-to-video"
46
+ VIDEO_TO_TEXT = "video-to-text"
47
+ VIDEO_CLASSIFICATION = "video-classification"
48
+ VIDEO_GENERATION = "video-generation"
49
+ MOTION_GENERATION = "motion-generation"
50
+ DEEPFAKE_DETECTION = "deepfake-detection"
51
+
52
+ # Code and Development
53
+ CODE_GENERATION = "code-generation"
54
+ CODE_COMPLETION = "code-completion"
55
+ CODE_EXPLANATION = "code-explanation"
56
+ CODE_TRANSLATION = "code-translation"
57
+ CODE_REVIEW = "code-review"
58
+ APP_GENERATION = "app-generation"
59
+ API_GENERATION = "api-generation"
60
+ DATABASE_GENERATION = "database-generation"
61
+
62
+ # 3D and AR/VR
63
+ TEXT_TO_3D = "text-to-3d"
64
+ IMAGE_TO_3D = "image-to-3d"
65
+ THREE_D_GENERATION = "3d-generation"
66
+ MESH_GENERATION = "mesh-generation"
67
+ TEXTURE_GENERATION = "texture-generation"
68
+ AR_CONTENT = "ar-content"
69
+ VR_ENVIRONMENT = "vr-environment"
70
+
71
+ # Document Processing
72
+ OCR = "ocr"
73
+ DOCUMENT_ANALYSIS = "document-analysis"
74
+ PDF_PROCESSING = "pdf-processing"
75
+ LAYOUT_ANALYSIS = "layout-analysis"
76
+ TABLE_EXTRACTION = "table-extraction"
77
+ HANDWRITING_RECOGNITION = "handwriting-recognition"
78
+ FORM_PROCESSING = "form-processing"
79
+
80
+ # Multimodal AI
81
+ VISION_LANGUAGE = "vision-language"
82
+ MULTIMODAL_REASONING = "multimodal-reasoning"
83
+ CROSS_MODAL_GENERATION = "cross-modal-generation"
84
+ VISUAL_QUESTION_ANSWERING = "visual-question-answering"
85
+ IMAGE_TEXT_MATCHING = "image-text-matching"
86
+ MULTIMODAL_CHAT = "multimodal-chat"
87
+
88
+ # Specialized AI
89
+ MUSIC_GENERATION = "music-generation"
90
+ VOICE_CLONING = "voice-cloning"
91
+ STYLE_TRANSFER = "style-transfer"
92
+ SUPER_RESOLUTION = "super-resolution"
93
+ IMAGE_INPAINTING = "image-inpainting"
94
+ IMAGE_OUTPAINTING = "image-outpainting"
95
+ BACKGROUND_REMOVAL = "background-removal"
96
+ FACE_RESTORATION = "face-restoration"
97
+
98
+ # Content Creation
99
+ CREATIVE_WRITING = "creative-writing"
100
+ STORY_GENERATION = "story-generation"
101
+ SCREENPLAY_WRITING = "screenplay-writing"
102
+ POETRY_GENERATION = "poetry-generation"
103
+ BLOG_WRITING = "blog-writing"
104
+ MARKETING_COPY = "marketing-copy"
105
+
106
+ # Game Development
107
+ GAME_ASSET_GENERATION = "game-asset-generation"
108
+ CHARACTER_GENERATION = "character-generation"
109
+ LEVEL_GENERATION = "level-generation"
110
+ DIALOGUE_GENERATION = "dialogue-generation"
111
+
112
+ # Science and Research
113
+ PROTEIN_FOLDING = "protein-folding"
114
+ MOLECULE_GENERATION = "molecule-generation"
115
+ SCIENTIFIC_WRITING = "scientific-writing"
116
+ RESEARCH_ASSISTANCE = "research-assistance"
117
+ DATA_ANALYSIS = "data-analysis"
118
+
119
+ # Business and Productivity
120
+ EMAIL_GENERATION = "email-generation"
121
+ PRESENTATION_CREATION = "presentation-creation"
122
+ REPORT_GENERATION = "report-generation"
123
+ MEETING_SUMMARIZATION = "meeting-summarization"
124
+ PROJECT_PLANNING = "project-planning"
125
+
126
+ # AI Teacher and Education
127
+ AI_TUTORING = "ai-tutoring"
128
+ EDUCATIONAL_CONTENT = "educational-content"
129
+ LESSON_PLANNING = "lesson-planning"
130
+ CONCEPT_EXPLANATION = "concept-explanation"
131
+ HOMEWORK_ASSISTANCE = "homework-assistance"
132
+ QUIZ_GENERATION = "quiz-generation"
133
+ CURRICULUM_DESIGN = "curriculum-design"
134
+ LEARNING_ASSESSMENT = "learning-assessment"
135
+ ADAPTIVE_LEARNING = "adaptive-learning"
136
+ SUBJECT_TEACHING = "subject-teaching"
137
+ MATH_TUTORING = "math-tutoring"
138
+ SCIENCE_TUTORING = "science-tutoring"
139
+ LANGUAGE_TUTORING = "language-tutoring"
140
+ HISTORY_TUTORING = "history-tutoring"
141
+ CODING_INSTRUCTION = "coding-instruction"
142
+ EXAM_PREPARATION = "exam-preparation"
143
+ STUDY_GUIDE_CREATION = "study-guide-creation"
144
+ EDUCATIONAL_GAMES = "educational-games"
145
+ LEARNING_ANALYTICS = "learning-analytics"
146
+ PERSONALIZED_LEARNING = "personalized-learning"
147
+
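The category constants above are members of an Enum whose class header sits earlier in the file; a minimal self-contained sketch of how such string-valued members behave (the local `ModelCategory` here is a trimmed stand-in, not the full enum):

```python
from enum import Enum

# Stand-in mirroring a few of the ModelCategory members above;
# the real enum in this module defines many more.
class ModelCategory(Enum):
    AI_TUTORING = "ai-tutoring"
    QUIZ_GENERATION = "quiz-generation"
    TEXT_TO_3D = "text-to-3d"

# Members can be looked up from their wire value...
assert ModelCategory("ai-tutoring") is ModelCategory.AI_TUTORING
# ...and expose that value for serialization into API payloads.
assert ModelCategory.QUIZ_GENERATION.value == "quiz-generation"
```

Value-based lookup (`ModelCategory("ai-tutoring")`) is what lets a request router map an incoming task string straight onto a category member.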
148
+
149
+ @dataclass
150
+ class HFModel:
151
+ """Hugging Face model definition"""
152
+
153
+ name: str
154
+ model_id: str
155
+ category: ModelCategory
156
+ description: str
157
+ endpoint_compatible: bool = False
158
+ requires_auth: bool = False
159
+ max_tokens: Optional[int] = None
160
+ supports_streaming: bool = False
161
+
162
+
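As a sanity check on the dataclass above, a self-contained sketch of constructing one entry (the `HFModel` here is a local mirror with `category` simplified to `str` so the snippet stands alone; field order and defaults match the definition above):

```python
from dataclasses import dataclass
from typing import Optional

# Local mirror of the HFModel dataclass above (category simplified to str).
@dataclass
class HFModel:
    name: str
    model_id: str
    category: str
    description: str
    endpoint_compatible: bool = False
    requires_auth: bool = False
    max_tokens: Optional[int] = None
    supports_streaming: bool = False

m = HFModel(
    "Whisper Large v3",
    "openai/whisper-large-v3",
    "automatic-speech-recognition",
    "OpenAI's best multilingual speech recognition",
    True,
)
assert m.model_id == "openai/whisper-large-v3"
assert m.requires_auth is False  # trailing fields fall back to defaults
```

Because the registry below passes every field positionally, the human-readable `name` must always come first and the Hub repo id second — swapping them silently produces unusable `model_id` values.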
163
+ class HuggingFaceModels:
164
+ """Comprehensive collection of Hugging Face models for all categories"""
165
+
166
+ # Text Generation Models (Latest and Popular)
167
+ TEXT_GENERATION_MODELS = [
168
+ HFModel(
169
+ "MiniMax-M2",
170
+ "MiniMaxAI/MiniMax-M2",
171
+ ModelCategory.TEXT_GENERATION,
172
+ "Latest high-performance text generation model",
173
+ True,
174
+ False,
175
+ 4096,
176
+ True,
177
+ ),
178
+ HFModel(
179
+ "Kimi Linear 48B",
180
+ "moonshotai/Kimi-Linear-48B-A3B-Instruct",
181
+ ModelCategory.TEXT_GENERATION,
182
+ "Large instruction-tuned model with linear attention",
183
+ True,
184
+ False,
185
+ 8192,
186
+ True,
187
+ ),
188
+ HFModel(
189
+ "GPT-OSS 20B",
190
+ "openai/gpt-oss-20b",
191
+ ModelCategory.TEXT_GENERATION,
192
+ "Open-source GPT model by OpenAI",
193
+ True,
194
+ False,
195
+ 4096,
196
+ True,
197
+ ),
198
+ HFModel(
199
+ "GPT-OSS 120B",
200
+ "openai/gpt-oss-120b",
201
+ ModelCategory.TEXT_GENERATION,
202
+ "Large open-source GPT model",
203
+ True,
204
+ False,
205
+ 4096,
206
+ True,
207
+ ),
208
+ HFModel(
209
+ "Granite 4.0 1B",
210
+ "ibm-granite/granite-4.0-1b",
211
+ ModelCategory.TEXT_GENERATION,
212
+ "IBM's enterprise-grade small language model",
213
+ True,
214
+ False,
215
+ 2048,
216
+ True,
217
+ ),
218
+ HFModel(
219
+ "GLM-4.6",
220
+ "zai-org/GLM-4.6",
221
+ ModelCategory.TEXT_GENERATION,
222
+ "Multilingual conversational model",
223
+ True,
224
+ False,
225
+ 4096,
226
+ True,
227
+ ),
228
+ HFModel(
229
+ "Llama 3.1 8B Instruct",
230
+ "meta-llama/Llama-3.1-8B-Instruct",
231
+ ModelCategory.TEXT_GENERATION,
232
+ "Meta's instruction-tuned Llama model",
233
+ True,
234
+ True,
235
+ 8192,
236
+ True,
237
+ ),
238
+ HFModel(
239
+ "Tongyi DeepResearch 30B",
240
+ "Alibaba-NLP/Tongyi-DeepResearch-30B-A3B",
241
+ ModelCategory.TEXT_GENERATION,
242
+ "Alibaba's research-focused large language model",
243
+ True,
244
+ False,
245
+ 4096,
246
+ True,
247
+ ),
248
+ HFModel(
249
+ "EuroLLM 9B",
250
+ "utter-project/EuroLLM-9B",
251
+ ModelCategory.TEXT_GENERATION,
252
+ "European multilingual language model",
253
+ True,
254
+ False,
255
+ 4096,
256
+ True,
257
+ ),
258
+ ]
259
+
260
+ # Text-to-Image Models (Latest and Best)
261
+ TEXT_TO_IMAGE_MODELS = [
262
+ HFModel(
263
+ "FIBO",
264
+ "briaai/FIBO",
265
+ ModelCategory.TEXT_TO_IMAGE,
266
+ "Advanced text-to-image generation model",
267
+ True,
268
+ False,
269
+ ),
270
+ HFModel(
271
+ "FLUX.1 Dev",
272
+ "black-forest-labs/FLUX.1-dev",
273
+ ModelCategory.TEXT_TO_IMAGE,
274
+ "State-of-the-art image generation",
275
+ True,
276
+ False,
277
+ ),
278
+ HFModel(
279
+ "FLUX.1 Schnell",
280
+ "black-forest-labs/FLUX.1-schnell",
281
+ ModelCategory.TEXT_TO_IMAGE,
282
+ "Fast high-quality image generation",
283
+ True,
284
+ False,
285
+ ),
286
+ HFModel(
287
+ "Qwen Image",
288
+ "Qwen/Qwen-Image",
289
+ ModelCategory.TEXT_TO_IMAGE,
290
+ "Multilingual text-to-image model",
291
+ True,
292
+ False,
293
+ ),
294
+ HFModel(
295
+ "Stable Diffusion XL",
296
+ "stabilityai/stable-diffusion-xl-base-1.0",
297
+ ModelCategory.TEXT_TO_IMAGE,
298
+ "Popular high-resolution image generation",
299
+ True,
300
+ False,
301
+ ),
302
+ HFModel(
303
+ "Stable Diffusion 3.5 Large",
304
+ "stabilityai/stable-diffusion-3.5-large",
305
+ ModelCategory.TEXT_TO_IMAGE,
306
+ "Latest Stable Diffusion model",
307
+ True,
308
+ False,
309
+ ),
310
+ HFModel(
311
+ "HunyuanImage 3.0",
312
+ "tencent/HunyuanImage-3.0",
313
+ ModelCategory.TEXT_TO_IMAGE,
314
+ "Tencent's advanced image generation model",
315
+ True,
316
+ False,
317
+ ),
318
+ HFModel(
319
+ "Nitro-E",
320
+ "amd/Nitro-E",
321
+ ModelCategory.TEXT_TO_IMAGE,
322
+ "AMD's efficient image generation model",
323
+ True,
324
+ False,
325
+ ),
326
+ HFModel(
327
+ "Qwen Image Lightning",
328
+ "lightx2v/Qwen-Image-Lightning",
329
+ ModelCategory.TEXT_TO_IMAGE,
330
+ "Fast distilled image generation",
331
+ True,
332
+ False,
333
+ ),
334
+ ]
335
+
336
+ # Automatic Speech Recognition Models
337
+ ASR_MODELS = [
338
+ HFModel(
339
+ "Whisper Large v3",
340
+ "openai/whisper-large-v3",
341
+ ModelCategory.AUTOMATIC_SPEECH_RECOGNITION,
342
+ "OpenAI's best multilingual speech recognition",
343
+ True,
344
+ False,
345
+ ),
346
+ HFModel(
347
+ "Whisper Large v3 Turbo",
348
+ "openai/whisper-large-v3-turbo",
349
+ ModelCategory.AUTOMATIC_SPEECH_RECOGNITION,
350
+ "Faster version of Whisper Large v3",
351
+ True,
352
+ False,
353
+ ),
354
+ HFModel(
355
+ "Parakeet TDT 0.6B v3",
356
+ "nvidia/parakeet-tdt-0.6b-v3",
357
+ ModelCategory.AUTOMATIC_SPEECH_RECOGNITION,
358
+ "NVIDIA's multilingual ASR model",
359
+ True,
360
+ False,
361
+ ),
362
+ HFModel(
363
+ "Canary Qwen 2.5B",
364
+ "nvidia/canary-qwen-2.5b",
365
+ ModelCategory.AUTOMATIC_SPEECH_RECOGNITION,
366
+ "NVIDIA's advanced ASR with Qwen integration",
367
+ True,
368
+ False,
369
+ ),
370
+ HFModel(
371
+ "Canary 1B v2",
372
+ "nvidia/canary-1b-v2",
373
+ ModelCategory.AUTOMATIC_SPEECH_RECOGNITION,
374
+ "Compact multilingual ASR model",
375
+ True,
376
+ False,
377
+ ),
378
+ HFModel(
379
+ "Whisper Small",
380
+ "openai/whisper-small",
381
+ ModelCategory.AUTOMATIC_SPEECH_RECOGNITION,
382
+ "Lightweight multilingual ASR",
383
+ True,
384
+ False,
385
+ ),
386
+ HFModel(
387
+ "Speaker Diarization 3.1",
388
+ "pyannote/speaker-diarization-3.1",
389
+ ModelCategory.AUTOMATIC_SPEECH_RECOGNITION,
390
+ "Advanced speaker identification and diarization",
391
+ True,
392
+ False,
393
+ ),
394
+ ]
395
+
396
+ # Text-to-Speech Models
397
+ TTS_MODELS = [
398
+ HFModel(
399
+ "SoulX Podcast 1.7B",
400
+ "Soul-AILab/SoulX-Podcast-1.7B",
401
+ ModelCategory.TEXT_TO_SPEECH,
402
+ "High-quality podcast-style speech synthesis",
403
+ True,
404
+ False,
405
+ ),
406
+ HFModel(
407
+ "NeuTTS Air",
408
+ "neuphonic/neutts-air",
409
+ ModelCategory.TEXT_TO_SPEECH,
410
+ "Advanced neural text-to-speech",
411
+ True,
412
+ False,
413
+ ),
414
+ HFModel(
415
+ "Kokoro 82M",
416
+ "hexgrad/Kokoro-82M",
417
+ ModelCategory.TEXT_TO_SPEECH,
418
+ "Lightweight high-quality TTS",
419
+ True,
420
+ False,
421
+ ),
422
+ HFModel(
423
+ "Kani TTS 400M EN",
424
+ "nineninesix/kani-tts-400m-en",
425
+ ModelCategory.TEXT_TO_SPEECH,
426
+ "English-focused text-to-speech model",
427
+ True,
428
+ False,
429
+ ),
430
+ HFModel(
431
+ "XTTS v2",
432
+ "coqui/XTTS-v2",
433
+ ModelCategory.TEXT_TO_SPEECH,
434
+ "Zero-shot voice cloning TTS",
435
+ True,
436
+ False,
437
+ ),
438
+ HFModel(
439
+ "Chatterbox",
440
+ "ResembleAI/chatterbox",
441
+ ModelCategory.TEXT_TO_SPEECH,
442
+ "Multilingual voice cloning",
443
+ True,
444
+ False,
445
+ ),
446
+ HFModel(
447
+ "VibeVoice 1.5B",
448
+ "microsoft/VibeVoice-1.5B",
449
+ ModelCategory.TEXT_TO_SPEECH,
450
+ "Microsoft's advanced TTS model",
451
+ True,
452
+ False,
453
+ ),
454
+ HFModel(
455
+ "OpenAudio S1 Mini",
456
+ "fishaudio/openaudio-s1-mini",
457
+ ModelCategory.TEXT_TO_SPEECH,
458
+ "Compact multilingual TTS",
459
+ True,
460
+ False,
461
+ ),
462
+ ]
463
+
464
+ # Image Classification Models
465
+ IMAGE_CLASSIFICATION_MODELS = [
466
+ HFModel(
467
+ "NSFW Image Detection",
468
+ "Falconsai/nsfw_image_detection",
469
+ ModelCategory.IMAGE_CLASSIFICATION,
470
+ "Content safety image classification",
471
+ True,
472
+ False,
473
+ ),
474
+ HFModel(
475
+ "ViT Base Patch16",
476
+ "google/vit-base-patch16-224",
477
+ ModelCategory.IMAGE_CLASSIFICATION,
478
+ "Google's Vision Transformer",
479
+ True,
480
+ False,
481
+ ),
482
+ HFModel(
483
+ "Deepfake Detection",
484
+ "dima806/deepfake_vs_real_image_detection",
485
+ ModelCategory.IMAGE_CLASSIFICATION,
486
+ "Detect AI-generated vs real images",
487
+ True,
488
+ False,
489
+ ),
490
+ HFModel(
491
+ "Facial Emotions Detection",
492
+ "dima806/facial_emotions_image_detection",
493
+ ModelCategory.IMAGE_CLASSIFICATION,
494
+ "Recognize facial emotions",
495
+ True,
496
+ False,
497
+ ),
498
+ HFModel(
499
+ "SDXL Detector",
500
+ "Organika/sdxl-detector",
501
+ ModelCategory.IMAGE_CLASSIFICATION,
502
+ "Detect Stable Diffusion XL generated images",
503
+ True,
504
+ False,
505
+ ),
506
+ HFModel(
507
+ "ViT NSFW Detector",
508
+ "AdamCodd/vit-base-nsfw-detector",
509
+ ModelCategory.IMAGE_CLASSIFICATION,
510
+ "NSFW content detection with ViT",
511
+ True,
512
+ False,
513
+ ),
514
+ HFModel(
515
+ "ResNet 101",
516
+ "microsoft/resnet-101",
517
+ ModelCategory.IMAGE_CLASSIFICATION,
518
+ "Microsoft's ResNet for classification",
519
+ True,
520
+ False,
521
+ ),
522
+ ]
523
+
524
+ # Additional Categories
525
+ FEATURE_EXTRACTION_MODELS = [
526
+ HFModel(
527
+ "Sentence Transformers All MiniLM",
528
+ "sentence-transformers/all-MiniLM-L6-v2",
529
+ ModelCategory.FEATURE_EXTRACTION,
530
+ "Lightweight sentence embeddings",
531
+ True,
532
+ False,
533
+ ),
534
+ HFModel(
535
+ "BGE Large EN",
536
+ "BAAI/bge-large-en-v1.5",
537
+ ModelCategory.FEATURE_EXTRACTION,
538
+ "High-quality English embeddings",
539
+ True,
540
+ False,
541
+ ),
542
+ HFModel(
543
+ "E5 Large v2",
544
+ "intfloat/e5-large-v2",
545
+ ModelCategory.FEATURE_EXTRACTION,
546
+ "Multilingual text embeddings",
547
+ True,
548
+ False,
549
+ ),
550
+ ]
551
+
552
+ TRANSLATION_MODELS = [
553
+ HFModel(
554
+ "M2M100 1.2B",
555
+ "facebook/m2m100_1.2B",
556
+ ModelCategory.TRANSLATION,
557
+ "Multilingual machine translation",
558
+ True,
559
+ False,
560
+ ),
561
+ HFModel(
562
+ "NLLB 200 3.3B",
563
+ "facebook/nllb-200-3.3B",
564
+ ModelCategory.TRANSLATION,
565
+ "No Language Left Behind translation",
566
+ True,
567
+ False,
568
+ ),
569
+ HFModel(
570
+ "mBART Large 50",
571
+ "facebook/mbart-large-50-many-to-many-mmt",
572
+ ModelCategory.TRANSLATION,
573
+ "Multilingual BART for translation",
574
+ True,
575
+ False,
576
+ ),
577
+ ]
578
+
579
+ SUMMARIZATION_MODELS = [
580
+ HFModel(
581
+ "PEGASUS XSum",
582
+ "google/pegasus-xsum",
583
+ ModelCategory.SUMMARIZATION,
584
+ "Abstractive summarization model",
585
+ True,
586
+ False,
587
+ ),
588
+ HFModel(
589
+ "BART Large CNN",
590
+ "facebook/bart-large-cnn",
591
+ ModelCategory.SUMMARIZATION,
592
+ "CNN/DailyMail summarization",
593
+ True,
594
+ False,
595
+ ),
596
+ HFModel(
597
+ "T5 Base",
598
+ "t5-base",
599
+ ModelCategory.SUMMARIZATION,
600
+ "Text-to-Text Transfer Transformer",
601
+ True,
602
+ False,
603
+ ),
604
+ ]
605
+
606
+ # Video Generation and Processing Models
607
+ VIDEO_GENERATION_MODELS = [
608
+ HFModel(
609
+ "Stable Video Diffusion",
610
+ "stabilityai/stable-video-diffusion-img2vid",
611
+ ModelCategory.TEXT_TO_VIDEO,
612
+ "Image-to-video generation model",
613
+ True,
614
+ False,
615
+ ),
616
+ HFModel(
617
+ "AnimateDiff",
618
+ "guoyww/animatediff",
619
+ ModelCategory.VIDEO_GENERATION,
620
+ "Text-to-video animation generation",
621
+ True,
622
+ False,
623
+ ),
624
+ HFModel(
625
+ "VideoCrafter",
626
+ "videogen/VideoCrafter",
627
+ ModelCategory.TEXT_TO_VIDEO,
628
+ "High-quality text-to-video generation",
629
+ True,
630
+ False,
631
+ ),
632
+ HFModel(
633
+ "Video ChatGPT",
634
+ "mbzuai-oryx/Video-ChatGPT-7B",
635
+ ModelCategory.VIDEO_TO_TEXT,
636
+ "Video understanding and description",
637
+ True,
638
+ False,
639
+ ),
640
+ HFModel(
641
+ "Video-BLIP",
642
+ "salesforce/video-blip-opt-2.7b",
643
+ ModelCategory.VIDEO_CLASSIFICATION,
644
+ "Video content analysis and classification",
645
+ True,
646
+ False,
647
+ ),
648
+ ]
649
+
650
+ # Code Generation and Development Models
651
+ CODE_GENERATION_MODELS = [
652
+ HFModel(
653
+ "CodeLlama 34B Instruct",
654
+ "codellama/CodeLlama-34b-Instruct-hf",
655
+ ModelCategory.CODE_GENERATION,
656
+ "Large instruction-tuned code generation model",
657
+ True,
658
+ True,
659
+ ),
660
+ HFModel(
661
+ "StarCoder2 15B",
662
+ "bigcode/starcoder2-15b",
663
+ ModelCategory.CODE_GENERATION,
664
+ "Advanced code generation and completion",
665
+ True,
666
+ False,
667
+ ),
668
+ HFModel(
669
+ "DeepSeek Coder V2",
670
+ "deepseek-ai/deepseek-coder-6.7b-instruct",
671
+ ModelCategory.CODE_GENERATION,
672
+ "Specialized coding assistant",
673
+ True,
674
+ False,
675
+ ),
676
+ HFModel(
677
+ "WizardCoder 34B",
678
+ "WizardLM/WizardCoder-Python-34B-V1.0",
679
+ ModelCategory.CODE_GENERATION,
680
+ "Python-focused code generation",
681
+ True,
682
+ False,
683
+ ),
684
+ HFModel(
685
+ "Phind CodeLlama",
686
+ "Phind/Phind-CodeLlama-34B-v2",
687
+ ModelCategory.CODE_GENERATION,
688
+ "Optimized for code explanation and debugging",
689
+ True,
690
+ False,
691
+ ),
692
+ HFModel(
693
+ "Code T5+",
694
+ "Salesforce/codet5p-770m",
695
+ ModelCategory.CODE_COMPLETION,
696
+ "Code understanding and generation",
697
+ True,
698
+ False,
699
+ ),
700
+ HFModel(
701
+ "InCoder",
702
+ "facebook/incoder-6B",
703
+ ModelCategory.CODE_COMPLETION,
704
+ "Bidirectional code generation",
705
+ True,
706
+ False,
707
+ ),
708
+ ]
709
+
710
+ # 3D and AR/VR Content Generation Models
711
+ THREE_D_MODELS = [
712
+ HFModel(
713
+ "Shap-E",
714
+ "openai/shap-e",
715
+ ModelCategory.TEXT_TO_3D,
716
+ "Text-to-3D shape generation",
717
+ True,
718
+ False,
719
+ ),
720
+ HFModel(
721
+ "Point-E",
722
+ "openai/point-e",
723
+ ModelCategory.TEXT_TO_3D,
724
+ "Text-to-3D point cloud generation",
725
+ True,
726
+ False,
727
+ ),
728
+ HFModel(
729
+ "DreamFusion",
730
+ "google/dreamfusion",
731
+ ModelCategory.IMAGE_TO_3D,
732
+ "Image-to-3D mesh generation",
733
+ True,
734
+ False,
735
+ ),
736
+ HFModel(
737
+ "Magic3D",
738
+ "nvidia/magic3d",
739
+ ModelCategory.THREE_D_GENERATION,
740
+ "High-quality 3D content creation",
741
+ True,
742
+ False,
743
+ ),
744
+ HFModel(
745
+ "GET3D",
746
+ "nvidia/get3d",
747
+ ModelCategory.MESH_GENERATION,
748
+ "3D mesh generation from text",
749
+ True,
750
+ False,
751
+ ),
752
+ ]
753
+
754
+ # Document Processing and OCR Models
755
+ DOCUMENT_PROCESSING_MODELS = [
756
+ HFModel(
757
+ "TrOCR Large",
758
+ "microsoft/trocr-large-printed",
759
+ ModelCategory.OCR,
760
+ "Transformer-based OCR for printed text",
761
+ True,
762
+ False,
763
+ ),
764
+ HFModel(
765
+ "TrOCR Handwritten",
766
+ "microsoft/trocr-large-handwritten",
767
+ ModelCategory.HANDWRITING_RECOGNITION,
768
+ "Handwritten text recognition",
769
+ True,
770
+ False,
771
+ ),
772
+ HFModel(
773
+ "LayoutLMv3",
774
+ "microsoft/layoutlmv3-large",
775
+ ModelCategory.DOCUMENT_ANALYSIS,
776
+ "Document layout analysis and understanding",
777
+ True,
778
+ False,
779
+ ),
780
+ HFModel(
781
+ "Donut",
782
+ "naver-clova-ix/donut-base",
783
+ ModelCategory.DOCUMENT_ANALYSIS,
784
+ "OCR-free document understanding",
785
+ True,
786
+ False,
787
+ ),
788
+ HFModel(
789
+ "TableTransformer",
790
+ "microsoft/table-transformer-structure-recognition",
791
+ ModelCategory.TABLE_EXTRACTION,
792
+ "Table structure recognition",
793
+ True,
794
+ False,
795
+ ),
796
+ HFModel(
797
+ "FormNet",
798
+ "microsoft/formnet",
799
+ ModelCategory.FORM_PROCESSING,
800
+ "Form understanding and processing",
801
+ True,
802
+ False,
803
+ ),
804
+ ]
805
+
806
+ # Multimodal AI Models
807
+ MULTIMODAL_MODELS = [
808
+ HFModel(
809
+ "BLIP-2",
810
+ "Salesforce/blip2-opt-2.7b",
811
+ ModelCategory.VISION_LANGUAGE,
812
+ "Vision-language understanding and generation",
813
+ True,
814
+ False,
815
+ ),
816
+ HFModel(
817
+ "InstructBLIP",
818
+ "Salesforce/instructblip-vicuna-7b",
819
+ ModelCategory.MULTIMODAL_REASONING,
820
+ "Instruction-following multimodal model",
821
+ True,
822
+ False,
823
+ ),
824
+ HFModel(
825
+ "LLaVA",
826
+ "liuhaotian/llava-v1.5-7b",
827
+ ModelCategory.VISUAL_QUESTION_ANSWERING,
828
+ "Large Language and Vision Assistant",
829
+ True,
830
+ False,
831
+ ),
832
+ HFModel(
833
+ "GPT-4V",
834
+ "openai/gpt-4-vision-preview",
835
+ ModelCategory.MULTIMODAL_CHAT,
836
+ "Advanced multimodal conversational AI",
837
+ True,
838
+ True,
839
+ ),
840
+ HFModel(
841
+ "Flamingo",
842
+ "deepmind/flamingo-9b",
843
+ ModelCategory.CROSS_MODAL_GENERATION,
844
+ "Few-shot learning for vision and language",
845
+ True,
846
+ False,
847
+ ),
848
+ ]
849
+
850
+ # Specialized AI Models
851
+ SPECIALIZED_AI_MODELS = [
852
+ HFModel(
853
+ "MusicGen",
854
+ "facebook/musicgen-medium",
855
+ ModelCategory.MUSIC_GENERATION,
856
+ "Text-to-music generation",
857
+ True,
858
+ False,
859
+ ),
860
+ HFModel(
861
+ "AudioCraft",
862
+ "facebook/audiocraft_musicgen_melody",
863
+ ModelCategory.MUSIC_GENERATION,
864
+ "Melody-conditioned music generation",
865
+ True,
866
+ False,
867
+ ),
868
+ HFModel(
869
+ "Real-ESRGAN",
870
+ "xinntao/realesrgan-x4plus",
871
+ ModelCategory.SUPER_RESOLUTION,
872
+ "Image super-resolution",
873
+ True,
874
+ False,
875
+ ),
876
+ HFModel(
877
+ "GFPGAN",
878
+ "TencentARC/GFPGAN",
879
+ ModelCategory.FACE_RESTORATION,
880
+ "Face restoration and enhancement",
881
+ True,
882
+ False,
883
+ ),
884
+ HFModel(
885
+ "LaMa",
886
+ "advimman/lama",
887
+ ModelCategory.IMAGE_INPAINTING,
888
+ "Large Mask Inpainting",
889
+ True,
890
+ False,
891
+ ),
892
+ HFModel(
893
+ "Background Remover",
894
+ "briaai/RMBG-1.4",
895
+ ModelCategory.BACKGROUND_REMOVAL,
896
+ "Automatic background removal",
897
+ True,
898
+ False,
899
+ ),
900
+ HFModel(
901
+ "Voice Cloner",
902
+ "coqui/XTTS-v2",
903
+ ModelCategory.VOICE_CLONING,
904
+ "Multilingual voice cloning",
905
+ True,
906
+ False,
907
+ ),
908
+ ]
909
+
910
+ # Creative Content Models
911
+ CREATIVE_CONTENT_MODELS = [
912
+ HFModel(
913
+ "GPT-3.5 Creative",
914
+ "openai/gpt-3.5-turbo-instruct",
915
+ ModelCategory.CREATIVE_WRITING,
916
+ "Creative writing and storytelling",
917
+ True,
918
+ True,
919
+ ),
920
+ HFModel(
921
+ "Novel AI",
922
+ "novelai/genji-python-6b",
923
+ ModelCategory.STORY_GENERATION,
924
+ "Interactive story generation",
925
+ True,
926
+ False,
927
+ ),
928
+ HFModel(
929
+ "Poet Assistant",
930
+ "gpt2-poetry",
931
+ ModelCategory.POETRY_GENERATION,
932
+ "Poetry generation and analysis",
933
+ True,
934
+ False,
935
+ ),
936
+ HFModel(
937
+ "Blog Writer",
938
+ "google/flan-t5-large",
939
+ ModelCategory.BLOG_WRITING,
940
+ "Blog content creation",
941
+ True,
942
+ False,
943
+ ),
944
+ HFModel(
945
+ "Marketing Copy AI",
946
+ "microsoft/DialoGPT-large",
947
+ ModelCategory.MARKETING_COPY,
948
+ "Marketing content generation",
949
+ True,
950
+ False,
951
+ ),
952
+ ]
953
+
954
+ # Game Development Models
955
+ GAME_DEVELOPMENT_MODELS = [
956
+ HFModel(
957
+ "Character AI",
958
+ "character-ai/character-generator",
959
+ ModelCategory.CHARACTER_GENERATION,
960
+ "Game character generation and design",
961
+ True,
962
+ False,
963
+ ),
964
+ HFModel(
965
+ "Level Designer",
966
+ "unity/level-generator",
967
+ ModelCategory.LEVEL_GENERATION,
968
+ "Game level and environment generation",
969
+ True,
970
+ False,
971
+ ),
972
+ HFModel(
973
+ "Dialogue Writer",
974
+ "bioware/dialogue-generator",
975
+ ModelCategory.DIALOGUE_GENERATION,
976
+ "Game dialogue and narrative generation",
977
+ True,
978
+ False,
979
+ ),
980
+ HFModel(
981
+ "Asset Creator",
982
+ "epic/asset-generator",
983
+ ModelCategory.GAME_ASSET_GENERATION,
984
+ "Game asset and texture generation",
985
+ True,
986
+ False,
987
+ ),
988
+ ]
989
+
990
+ # Science and Research Models
991
+ SCIENCE_RESEARCH_MODELS = [
992
+ HFModel(
993
+ "AlphaFold",
994
+ "deepmind/alphafold2",
995
+ ModelCategory.PROTEIN_FOLDING,
996
+ "Protein structure prediction",
997
+ True,
998
+ False,
999
+ ),
1000
+ HFModel(
1001
+ "ChemBERTa",
1002
+ "DeepChem/ChemBERTa-77M-MLM",
1003
+ ModelCategory.MOLECULE_GENERATION,
1004
+ "Chemical compound analysis",
1005
+ True,
1006
+ False,
1007
+ ),
1008
+ HFModel(
1009
+ "SciBERT",
1010
+ "allenai/scibert_scivocab_uncased",
1011
+ ModelCategory.SCIENTIFIC_WRITING,
1012
+ "Scientific text understanding",
1013
+ True,
1014
+ False,
1015
+ ),
1016
+ HFModel(
1017
+ "Research Assistant",
1018
+ "microsoft/specter2",
1019
+ ModelCategory.RESEARCH_ASSISTANCE,
1020
+ "Research paper analysis and recommendations",
1021
+ True,
1022
+ False,
1023
+ ),
1024
+ HFModel(
1025
+ "Data Analyst",
1026
+ "microsoft/data-copilot",
1027
+ ModelCategory.DATA_ANALYSIS,
1028
+ "Automated data analysis and insights",
1029
+ True,
1030
+ False,
1031
+ ),
1032
+ ]
1033
+
1034
+ # Business and Productivity Models
1035
+ BUSINESS_PRODUCTIVITY_MODELS = [
1036
+ HFModel(
1037
+ "Email Assistant",
1038
+ "microsoft/email-generator",
1039
+ ModelCategory.EMAIL_GENERATION,
1040
+ "Professional email composition",
1041
+ True,
1042
+ False,
1043
+ ),
1044
+ HFModel(
1045
+ "Presentation AI",
1046
+ "gamma/presentation-generator",
1047
+ ModelCategory.PRESENTATION_CREATION,
1048
+ "Automated presentation creation",
1049
+ True,
1050
+ False,
1051
+ ),
1052
+ HFModel(
1053
+ "Report Writer",
1054
+ "openai/report-generator",
1055
+ ModelCategory.REPORT_GENERATION,
1056
+ "Business report generation",
1057
+ True,
1058
+ False,
1059
+ ),
1060
+ HFModel(
1061
+ "Meeting Summarizer",
1062
+ "microsoft/meeting-summarizer",
1063
+ ModelCategory.MEETING_SUMMARIZATION,
1064
+ "Meeting notes and action items",
1065
+ True,
1066
+ False,
1067
+ ),
1068
+ HFModel(
1069
+ "Project Planner",
1070
+ "atlassian/project-ai",
1071
+ ModelCategory.PROJECT_PLANNING,
1072
+ "Project planning and management",
1073
+ True,
1074
+ False,
1075
+ ),
1076
+ ]
1077
+
1078
+ # AI Teacher Models - Best-in-Class Educational AI System
1079
+ AI_TEACHER_MODELS = [
1080
+ # Primary AI Tutoring Models
1081
+ HFModel(
1082
+ "AI Tutor Interactive",
1083
+ "microsoft/DialoGPT-medium",
1084
+ ModelCategory.AI_TUTORING,
1085
+ "Interactive AI tutor for conversational learning",
1086
+ True,
1087
+ False,
1088
+ 2048,
1089
+ True,
1090
+ ),
1091
+ HFModel(
1092
+ "Goal-Oriented Tutor",
1093
+ "microsoft/GODEL-v1_1-large-seq2seq",
1094
+ ModelCategory.AI_TUTORING,
1095
+ "Goal-oriented conversational AI for personalized tutoring",
1096
+ True,
1097
+ False,
1098
+ 2048,
1099
+ True,
1100
+ ),
1101
+ HFModel(
1102
+ "Code Instructor AI",
1103
+ "microsoft/codebert-base",
1104
+ ModelCategory.CODING_INSTRUCTION,
1105
+ "AI coding instructor for programming education",
1106
+ True,
1107
+ False,
1108
+ 1024,
1109
+ False,
1110
+ ),
1111
+ HFModel(
1112
+ "deepmind/flamingo-base",
1113
+ "ADAPTIVE_LEARNING",
1114
+ ModelCategory.ADAPTIVE_LEARNING,
1115
+ "Multimodal AI for adaptive learning experiences",
1116
+ True,
1117
+ False,
1118
+ 1024,
1119
+ True,
1120
+ ),
1121
+ # Educational Content Generation
1122
+ HFModel(
1123
+ "gpt2-medium",
1124
+ "EDUCATIONAL_CONTENT",
1125
+ ModelCategory.EDUCATIONAL_CONTENT,
1126
+ "Educational content generation for curriculum development",
1127
+ True,
1128
+ False,
1129
+ 1024,
1130
+ True,
1131
+ ),
1132
+ HFModel(
1133
+ "facebook/bart-large-cnn",
1134
+ "LESSON_PLANNING",
1135
+ ModelCategory.LESSON_PLANNING,
1136
+ "Lesson plan generation and educational summarization",
1137
+ True,
1138
+ False,
1139
+ 1024,
1140
+ True,
1141
+ ),
1142
+ HFModel(
1143
+ "microsoft/prophetnet-large-uncased",
1144
+ "STUDY_GUIDE_CREATION",
1145
+ ModelCategory.STUDY_GUIDE_CREATION,
1146
+ "Study guide and learning material generation",
1147
+ True,
1148
+ False,
1149
+ 1024,
1150
+ True,
1151
+ ),
1152
+ HFModel(
1153
+ "bigscience/bloom-560m",
1154
+ "EDUCATIONAL_CONTENT",
1155
+ ModelCategory.EDUCATIONAL_CONTENT,
1156
+ "Multilingual educational content for global learning",
1157
+ True,
1158
+ False,
1159
+ 1024,
1160
+ True,
1161
+ ),
1162
+ # Subject-Specific Teaching Models
1163
+ HFModel(
1164
+ "microsoft/codebert-base",
1165
+ "CODING_INSTRUCTION",
1166
+ ModelCategory.CODING_INSTRUCTION,
1167
+ "Programming education and code explanation",
1168
+ True,
1169
+ False,
1170
+ 1024,
1171
+ True,
1172
+ ),
1173
+ HFModel(
1174
+ "allenai/scibert_scivocab_uncased",
1175
+ "SCIENCE_TUTORING",
1176
+ ModelCategory.SCIENCE_TUTORING,
1177
+ "Science education and scientific concept explanation",
1178
+ True,
1179
+ False,
1180
+ 1024,
1181
+ True,
1182
+ ),
1183
+ HFModel(
1184
+ "google/flan-t5-base",
1185
+ "SUBJECT_TEACHING",
1186
+ ModelCategory.SUBJECT_TEACHING,
1187
+ "Multi-subject teaching AI with instruction following",
1188
+ True,
1189
+ False,
1190
+ 1024,
1191
+ True,
1192
+ ),
1193
+ HFModel(
1194
+ "microsoft/unixcoder-base",
1195
+ "CODING_INSTRUCTION",
1196
+ ModelCategory.CODING_INSTRUCTION,
1197
+ "Advanced programming instruction and debugging help",
1198
+ True,
1199
+ False,
1200
+ 1024,
1201
+ True,
1202
+ ),
1203
+ # Math and STEM Education
1204
+ HFModel(
1205
+ "microsoft/DialoGPT-small",
1206
+ "MATH_TUTORING",
1207
+ ModelCategory.MATH_TUTORING,
1208
+ "Interactive math tutoring and problem solving",
1209
+ True,
1210
+ False,
1211
+ 1024,
1212
+ True,
1213
+ ),
1214
+ HFModel(
1215
+ "facebook/galactica-125m",
1216
+ "SCIENCE_TUTORING",
1217
+ ModelCategory.SCIENCE_TUTORING,
1218
+ "Scientific knowledge and research education",
1219
+ True,
1220
+ False,
1221
+ 1024,
1222
+ True,
1223
+ ),
1224
+ HFModel(
1225
+ "microsoft/graphcodebert-base",
1226
+ "CODING_INSTRUCTION",
1227
+ ModelCategory.CODING_INSTRUCTION,
1228
+ "Code structure and algorithm education",
1229
+ True,
1230
+ False,
1231
+ 1024,
1232
+ True,
1233
+ ),
1234
+ HFModel(
1235
+ "deepmind/mathematical-reasoning",
1236
+ "MATH_TUTORING",
1237
+ ModelCategory.MATH_TUTORING,
1238
+ "Mathematical reasoning and proof assistance",
1239
+ True,
1240
+ False,
1241
+ 1024,
1242
+ True,
1243
+ ),
1244
+ # Language and Literature Education
1245
+ HFModel(
1246
+ "microsoft/prophetnet-large-uncased-cnndm",
1247
+ "LANGUAGE_TUTORING",
1248
+ ModelCategory.LANGUAGE_TUTORING,
1249
+ "Language learning and literature analysis",
1250
+ True,
1251
+ False,
1252
+ 1024,
1253
+ True,
1254
+ ),
1255
+ HFModel(
1256
+ "facebook/mbart-large-50-many-to-many-mmt",
1257
+ "LANGUAGE_TUTORING",
1258
+ ModelCategory.LANGUAGE_TUTORING,
1259
+ "Multilingual language education and translation",
1260
+ True,
1261
+ False,
1262
+ 1024,
1263
+ True,
1264
+ ),
1265
+ HFModel(
1266
+ "google/electra-base-discriminator",
1267
+ "LANGUAGE_TUTORING",
1268
+ ModelCategory.LANGUAGE_TUTORING,
1269
+ "Language comprehension and grammar instruction",
1270
+ True,
1271
+ False,
1272
+ 1024,
1273
+ True,
1274
+ ),
1275
+ # Assessment and Testing
1276
+ HFModel(
1277
+ "microsoft/DialoGPT-large",
1278
+ "QUIZ_GENERATION",
1279
+ ModelCategory.QUIZ_GENERATION,
1280
+ "Interactive quiz and assessment generation",
1281
+ True,
1282
+ False,
1283
+ 1024,
1284
+ True,
1285
+ ),
1286
+ HFModel(
1287
+ "facebook/bart-large",
1288
+ "LEARNING_ASSESSMENT",
1289
+ ModelCategory.LEARNING_ASSESSMENT,
1290
+ "Learning progress assessment and feedback",
1291
+ True,
1292
+ False,
1293
+ 1024,
1294
+ True,
1295
+ ),
1296
+ HFModel(
1297
+ "google/t5-base",
1298
+ "QUIZ_GENERATION",
1299
+ ModelCategory.QUIZ_GENERATION,
1300
+ "Question generation for educational assessment",
1301
+ True,
1302
+ False,
1303
+ 1024,
1304
+ True,
1305
+ ),
1306
+ HFModel(
1307
+ "microsoft/unilm-base-cased",
1308
+ "EXAM_PREPARATION",
1309
+ ModelCategory.EXAM_PREPARATION,
1310
+ "Exam preparation and practice test generation",
1311
+ True,
1312
+ False,
1313
+ 1024,
1314
+ True,
1315
+ ),
1316
+ # Personalized Learning
1317
+ HFModel(
1318
+ "huggingface/distilbert-base-uncased",
1319
+ "PERSONALIZED_LEARNING",
1320
+ ModelCategory.PERSONALIZED_LEARNING,
1321
+ "Personalized learning path recommendation",
1322
+ True,
1323
+ False,
1324
+ 1024,
1325
+ True,
1326
+ ),
1327
+ HFModel(
1328
+ "microsoft/layoutlm-base-uncased",
1329
+ "LEARNING_ANALYTICS",
1330
+ ModelCategory.LEARNING_ANALYTICS,
1331
+ "Educational document analysis and insights",
1332
+ True,
1333
+ False,
1334
+ 1024,
1335
+ True,
1336
+ ),
1337
+ HFModel(
1338
+ "facebook/opt-125m",
1339
+ "ADAPTIVE_LEARNING",
1340
+ ModelCategory.ADAPTIVE_LEARNING,
1341
+ "Adaptive learning system with dynamic content",
1342
+ True,
1343
+ False,
1344
+ 1024,
1345
+ True,
1346
+ ),
1347
+ # Concept Explanation and Understanding
1348
+ HFModel(
1349
+ "microsoft/deberta-base",
1350
+ "CONCEPT_EXPLANATION",
1351
+ ModelCategory.CONCEPT_EXPLANATION,
1352
+ "Clear concept explanation and knowledge breakdown",
1353
+ True,
1354
+ False,
1355
+ 1024,
1356
+ True,
1357
+ ),
1358
+ HFModel(
1359
+ "google/pegasus-xsum",
1360
+ "CONCEPT_EXPLANATION",
1361
+            ModelCategory.CONCEPT_EXPLANATION,
+            "Concept summarization and explanation",
+            True,
+            False,
+            1024,
+            True,
+        ),
+        HFModel(
+            "facebook/bart-base",
+            "CONCEPT_EXPLANATION",
+            ModelCategory.CONCEPT_EXPLANATION,
+            "Interactive concept teaching and clarification",
+            True,
+            False,
+            1024,
+            True,
+        ),
+        # Homework and Study Assistance
+        HFModel(
+            "microsoft/codebert-base-mlm",
+            "HOMEWORK_ASSISTANCE",
+            ModelCategory.HOMEWORK_ASSISTANCE,
+            "Programming homework help and debugging",
+            True,
+            False,
+            1024,
+            True,
+        ),
+        HFModel(
+            "google/flan-t5-small",
+            "HOMEWORK_ASSISTANCE",
+            ModelCategory.HOMEWORK_ASSISTANCE,
+            "General homework assistance across subjects",
+            True,
+            False,
+            1024,
+            True,
+        ),
+        HFModel(
+            "facebook/mbart-large-cc25",
+            "HOMEWORK_ASSISTANCE",
+            ModelCategory.HOMEWORK_ASSISTANCE,
+            "Multilingual homework support and explanation",
+            True,
+            False,
+            1024,
+            True,
+        ),
+        # Curriculum Design and Planning
+        HFModel(
+            "microsoft/prophetnet-base-uncased",
+            "CURRICULUM_DESIGN",
+            ModelCategory.CURRICULUM_DESIGN,
+            "Curriculum planning and educational structure design",
+            True,
+            False,
+            1024,
+            True,
+        ),
+        HFModel(
+            "google/t5-small",
+            "LESSON_PLANNING",
+            ModelCategory.LESSON_PLANNING,
+            "Detailed lesson planning and activity design",
+            True,
+            False,
+            1024,
+            True,
+        ),
+        HFModel(
+            "facebook/bart-large-xsum",
+            "CURRICULUM_DESIGN",
+            ModelCategory.CURRICULUM_DESIGN,
+            "Educational program summarization and design",
+            True,
+            False,
+            1024,
+            True,
+        ),
+        # Educational Games and Interactive Learning
+        HFModel(
+            "microsoft/DialoGPT-base",
+            "EDUCATIONAL_GAMES",
+            ModelCategory.EDUCATIONAL_GAMES,
+            "Interactive educational games and learning activities",
+            True,
+            False,
+            1024,
+            True,
+        ),
+        HFModel(
+            "huggingface/bert-base-uncased",
+            "EDUCATIONAL_GAMES",
+            ModelCategory.EDUCATIONAL_GAMES,
+            "Educational quiz games and interactive learning",
+            True,
+            False,
+            1024,
+            True,
+        ),
+        # History and Social Studies
+        HFModel(
+            "microsoft/deberta-large",
+            "HISTORY_TUTORING",
+            ModelCategory.HISTORY_TUTORING,
+            "Historical analysis and social studies education",
+            True,
+            False,
+            1024,
+            True,
+        ),
+        HFModel(
+            "facebook/opt-350m",
+            "HISTORY_TUTORING",
+            ModelCategory.HISTORY_TUTORING,
+            "Interactive history lessons and timeline explanation",
+            True,
+            False,
+            1024,
+            True,
+        ),
+        # Advanced Educational Features
+        HFModel(
+            "microsoft/unilm-large-cased",
+            "LEARNING_ANALYTICS",
+            ModelCategory.LEARNING_ANALYTICS,
+            "Advanced learning analytics and progress tracking",
+            True,
+            False,
+            1024,
+            True,
+        ),
+        HFModel(
+            "google/electra-large-discriminator",
+            "PERSONALIZED_LEARNING",
+            ModelCategory.PERSONALIZED_LEARNING,
+            "Advanced personalized learning with AI adaptation",
+            True,
+            False,
+            1024,
+            True,
+        ),
+        HFModel(
+            "facebook/mbart-large-50",
+            "ADAPTIVE_LEARNING",
+            ModelCategory.ADAPTIVE_LEARNING,
+            "Multilingual adaptive learning system",
+            True,
+            False,
+            1024,
+            True,
+        ),
+    ]
+
+
+class HuggingFaceInference:
+    """Hugging Face Inference API integration"""
+
+    def __init__(
+        self,
+        api_token: str,
+        base_url: str = "https://api-inference.huggingface.co/models/",
+    ):
+        self.api_token = api_token
+        self.base_url = base_url
+        self.session = None
+
+    async def __aenter__(self):
+        self.session = aiohttp.ClientSession(
+            headers={"Authorization": f"Bearer {self.api_token}"},
+            timeout=aiohttp.ClientTimeout(total=300),  # 5 minutes timeout
+        )
+        return self
+
+    async def __aexit__(self, exc_type, exc_val, exc_tb):
+        if self.session:
+            await self.session.close()
+
+    async def text_generation(
+        self,
+        model_id: str,
+        prompt: str,
+        max_tokens: int = 100,
+        temperature: float = 0.7,
+        stream: bool = False,
+        **kwargs,
+    ) -> Dict[str, Any]:
+        """Generate text using a text generation model"""
+        payload = {
+            "inputs": prompt,
+            "parameters": {
+                "max_new_tokens": max_tokens,
+                "temperature": temperature,
+                "do_sample": True,
+                **kwargs,
+            },
+            "options": {"use_cache": False},
+        }
+
+        if stream:
+            # _stream_request is an async generator, so return it without
+            # awaiting; callers iterate over it with `async for`.
+            return self._stream_request(model_id, payload)
+        else:
+            return await self._request(model_id, payload)
+
+    async def text_to_image(
+        self,
+        model_id: str,
+        prompt: str,
+        negative_prompt: Optional[str] = None,
+        **kwargs,
+    ) -> bytes:
+        """Generate image from text prompt"""
+        payload = {
+            "inputs": prompt,
+            "parameters": {
+                **({"negative_prompt": negative_prompt} if negative_prompt else {}),
+                **kwargs,
+            },
+        }
+
+        response = await self._request(model_id, payload, expect_json=False)
+        return response
+
+    async def automatic_speech_recognition(
+        self, model_id: str, audio_data: bytes, **kwargs
+    ) -> Dict[str, Any]:
+        """Transcribe audio to text"""
+        # Convert audio bytes to base64 for API
+        audio_b64 = base64.b64encode(audio_data).decode()
+
+        payload = {"inputs": audio_b64, "parameters": kwargs}
+
+        return await self._request(model_id, payload)
+
+    async def text_to_speech(self, model_id: str, text: str, **kwargs) -> bytes:
+        """Convert text to speech audio"""
+        payload = {"inputs": text, "parameters": kwargs}
+
+        response = await self._request(model_id, payload, expect_json=False)
+        return response
+
+    async def image_classification(
+        self, model_id: str, image_data: bytes, **kwargs
+    ) -> Dict[str, Any]:
+        """Classify images"""
+        # Convert image to base64
+        image_b64 = base64.b64encode(image_data).decode()
+
+        payload = {"inputs": image_b64, "parameters": kwargs}
+
+        return await self._request(model_id, payload)
+
+    async def feature_extraction(
+        self, model_id: str, texts: Union[str, List[str]], **kwargs
+    ) -> Dict[str, Any]:
+        """Extract embeddings from text"""
+        payload = {"inputs": texts, "parameters": kwargs}
+
+        return await self._request(model_id, payload)
+
+    async def translation(
+        self,
+        model_id: str,
+        text: str,
+        src_lang: Optional[str] = None,
+        tgt_lang: Optional[str] = None,
+        **kwargs,
+    ) -> Dict[str, Any]:
+        """Translate text between languages"""
+        payload = {
+            "inputs": text,
+            "parameters": {
+                **({"src_lang": src_lang} if src_lang else {}),
+                **({"tgt_lang": tgt_lang} if tgt_lang else {}),
+                **kwargs,
+            },
+        }
+
+        return await self._request(model_id, payload)
+
+    async def summarization(
+        self,
+        model_id: str,
+        text: str,
+        max_length: int = 150,
+        min_length: int = 30,
+        **kwargs,
+    ) -> Dict[str, Any]:
+        """Summarize text"""
+        payload = {
+            "inputs": text,
+            "parameters": {
+                "max_length": max_length,
+                "min_length": min_length,
+                **kwargs,
+            },
+        }
+
+        return await self._request(model_id, payload)
+
+    async def question_answering(
+        self, model_id: str, question: str, context: str, **kwargs
+    ) -> Dict[str, Any]:
+        """Answer questions based on context"""
+        payload = {
+            "inputs": {"question": question, "context": context},
+            "parameters": kwargs,
+        }
+
+        return await self._request(model_id, payload)
+
+    async def zero_shot_classification(
+        self, model_id: str, text: str, candidate_labels: List[str], **kwargs
+    ) -> Dict[str, Any]:
+        """Classify text without training data"""
+        payload = {
+            "inputs": text,
+            "parameters": {"candidate_labels": candidate_labels, **kwargs},
+        }
+
+        return await self._request(model_id, payload)
+
+    async def conversational(
+        self,
+        model_id: str,
+        text: str,
+        conversation_history: Optional[List[Dict[str, str]]] = None,
+        **kwargs,
+    ) -> Dict[str, Any]:
+        """Have a conversation with a model"""
+        payload = {
+            "inputs": {
+                "text": text,
+                **(
+                    {
+                        "past_user_inputs": [
+                            h["user"] for h in conversation_history if "user" in h
+                        ]
+                    }
+                    if conversation_history
+                    else {}
+                ),
+                **(
+                    {
+                        "generated_responses": [
+                            h["bot"] for h in conversation_history if "bot" in h
+                        ]
+                    }
+                    if conversation_history
+                    else {}
+                ),
+            },
+            "parameters": kwargs,
+        }
+
+        return await self._request(model_id, payload)
+
+    async def _request(
+        self, model_id: str, payload: Dict[str, Any], expect_json: bool = True
+    ) -> Union[Dict[str, Any], bytes]:
+        """Make HTTP request to Hugging Face API"""
+        url = f"{self.base_url}{model_id}"
+
+        try:
+            async with self.session.post(url, json=payload) as response:
+                if response.status == 200:
+                    if expect_json:
+                        return await response.json()
+                    else:
+                        return await response.read()
+                elif response.status == 503:
+                    # Model is loading, wait and retry
+                    error_info = await response.json()
+                    estimated_time = error_info.get("estimated_time", 30)
+                    logger.info(
+                        f"Model {model_id} is loading, waiting {estimated_time}s"
+                    )
+                    await asyncio.sleep(min(estimated_time, 60))  # Cap at 60 seconds
+                    return await self._request(model_id, payload, expect_json)
+                else:
+                    error_text = await response.text()
+                    raise Exception(
+                        f"API request failed with status {response.status}: {error_text}"
+                    )
+
+        except Exception as e:
+            logger.error(f"Error calling Hugging Face API for {model_id}: {e}")
+            raise
+
+    async def _stream_request(self, model_id: str, payload: Dict[str, Any]):
+        """Stream response from Hugging Face API"""
+        url = f"{self.base_url}{model_id}"
+        payload["stream"] = True
+
+        try:
+            async with self.session.post(url, json=payload) as response:
+                if response.status == 200:
+                    async for chunk in response.content:
+                        if chunk:
+                            yield chunk.decode("utf-8")
+                else:
+                    error_text = await response.text()
+                    raise Exception(
+                        f"Streaming request failed with status {response.status}: {error_text}"
+                    )
+
+        except Exception as e:
+            logger.error(f"Error streaming from Hugging Face API for {model_id}: {e}")
+            raise
+
+    # New methods for expanded model categories
+
+    async def text_to_video(
+        self, model_id: str, prompt: str, **kwargs
+    ) -> Dict[str, Any]:
+        """Generate video from text prompt"""
+        payload = {
+            "inputs": prompt,
+            "parameters": {
+                "duration": kwargs.get("duration", 5),
+                "fps": kwargs.get("fps", 24),
+                "width": kwargs.get("width", 512),
+                "height": kwargs.get("height", 512),
+                **kwargs,
+            },
+        }
+        return await self._request(model_id, payload)
+
+    async def video_to_text(
+        self, model_id: str, video_data: bytes, **kwargs
+    ) -> Dict[str, Any]:
+        """Analyze video and generate text description"""
+        video_b64 = base64.b64encode(video_data).decode()
+        payload = {
+            "inputs": {"video": video_b64},
+            "parameters": kwargs,
+        }
+        return await self._request(model_id, payload)
+
+    async def code_generation(
+        self, model_id: str, prompt: str, **kwargs
+    ) -> Dict[str, Any]:
+        """Generate code from natural language prompt"""
+        payload = {
+            "inputs": prompt,
+            "parameters": {
+                "max_length": kwargs.get("max_length", 500),
+                "temperature": kwargs.get("temperature", 0.2),
+                "language": kwargs.get("language", "python"),
+                **kwargs,
+            },
+        }
+        return await self._request(model_id, payload)
+
+    async def code_completion(
+        self, model_id: str, code: str, **kwargs
+    ) -> Dict[str, Any]:
+        """Complete partial code"""
+        payload = {
+            "inputs": code,
+            "parameters": {
+                "max_length": kwargs.get("max_length", 100),
+                "temperature": kwargs.get("temperature", 0.1),
+                **kwargs,
+            },
+        }
+        return await self._request(model_id, payload)
+
+    async def text_to_3d(self, model_id: str, prompt: str, **kwargs) -> Dict[str, Any]:
+        """Generate 3D model from text description"""
+        payload = {
+            "inputs": prompt,
+            "parameters": {
+                "resolution": kwargs.get("resolution", 64),
+                "format": kwargs.get("format", "obj"),
+                **kwargs,
+            },
+        }
+        return await self._request(model_id, payload)
+
+    async def image_to_3d(
+        self, model_id: str, image_data: bytes, **kwargs
+    ) -> Dict[str, Any]:
+        """Generate 3D model from image"""
+        image_b64 = base64.b64encode(image_data).decode()
+        payload = {
+            "inputs": {"image": image_b64},
+            "parameters": kwargs,
+        }
+        return await self._request(model_id, payload)
+
+    async def ocr(self, model_id: str, image_data: bytes, **kwargs) -> Dict[str, Any]:
+        """Perform optical character recognition on image"""
+        image_b64 = base64.b64encode(image_data).decode()
+        payload = {
+            "inputs": {"image": image_b64},
+            "parameters": {"language": kwargs.get("language", "en"), **kwargs},
+        }
+        return await self._request(model_id, payload)
+
+    async def document_analysis(
+        self, model_id: str, document_data: bytes, **kwargs
+    ) -> Dict[str, Any]:
+        """Analyze document structure and content"""
+        doc_b64 = base64.b64encode(document_data).decode()
+        payload = {
+            "inputs": {"document": doc_b64},
+            "parameters": kwargs,
+        }
+        return await self._request(model_id, payload)
+
+    async def vision_language(
+        self, model_id: str, image_data: bytes, text: str, **kwargs
+    ) -> Dict[str, Any]:
+        """Process image and text together"""
+        image_b64 = base64.b64encode(image_data).decode()
+        payload = {
+            "inputs": {"image": image_b64, "text": text},
+            "parameters": kwargs,
+        }
+        return await self._request(model_id, payload)
+
+    async def multimodal_reasoning(
+        self, model_id: str, inputs: Dict[str, Any], **kwargs
+    ) -> Dict[str, Any]:
+        """Perform reasoning across multiple modalities"""
+        payload = {
+            "inputs": inputs,
+            "parameters": kwargs,
+        }
+        return await self._request(model_id, payload)
+
+    async def music_generation(
+        self, model_id: str, prompt: str, **kwargs
+    ) -> Dict[str, Any]:
+        """Generate music from text prompt"""
+        payload = {
+            "inputs": prompt,
+            "parameters": {
+                "duration": kwargs.get("duration", 30),
+                "bpm": kwargs.get("bpm", 120),
+                "genre": kwargs.get("genre", "electronic"),
+                **kwargs,
+            },
+        }
+        return await self._request(model_id, payload)
+
+    async def voice_cloning(
+        self, model_id: str, text: str, voice_sample: bytes, **kwargs
+    ) -> bytes:
+        """Clone voice and synthesize speech"""
+        voice_b64 = base64.b64encode(voice_sample).decode()
+        payload = {
+            "inputs": {"text": text, "voice_sample": voice_b64},
+            "parameters": kwargs,
+        }
+        return await self._request(model_id, payload, expect_json=False)
+
+    async def super_resolution(
+        self, model_id: str, image_data: bytes, **kwargs
+    ) -> bytes:
+        """Enhance image resolution"""
+        image_b64 = base64.b64encode(image_data).decode()
+        payload = {
+            "inputs": {"image": image_b64},
+            "parameters": {"scale_factor": kwargs.get("scale_factor", 4), **kwargs},
+        }
+        return await self._request(model_id, payload, expect_json=False)
+
+    async def background_removal(
+        self, model_id: str, image_data: bytes, **kwargs
+    ) -> bytes:
+        """Remove background from image"""
+        image_b64 = base64.b64encode(image_data).decode()
+        payload = {
+            "inputs": {"image": image_b64},
+            "parameters": kwargs,
+        }
+        return await self._request(model_id, payload, expect_json=False)
+
+    async def creative_writing(
+        self, model_id: str, prompt: str, **kwargs
+    ) -> Dict[str, Any]:
+        """Generate creative content"""
+        payload = {
+            "inputs": prompt,
+            "parameters": {
+                "max_length": kwargs.get("max_length", 1000),
+                "creativity": kwargs.get("creativity", 0.8),
+                "genre": kwargs.get("genre", "general"),
+                **kwargs,
+            },
+        }
+        return await self._request(model_id, payload)
+
+    async def business_document(
+        self, model_id: str, document_type: str, context: str, **kwargs
+    ) -> Dict[str, Any]:
+        """Generate business documents"""
+        payload = {
+            "inputs": f"Generate {document_type}: {context}",
+            "parameters": {
+                "format": kwargs.get("format", "professional"),
+                "length": kwargs.get("length", "medium"),
+                **kwargs,
+            },
+        }
+        return await self._request(model_id, payload)
+
+
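For reference, the request body that `text_generation` posts to the Inference API can be sketched as a pure function. This is a simplified mirror of the payload construction above; `build_text_generation_payload` is a hypothetical helper for illustration, not part of the class:

```python
def build_text_generation_payload(prompt, max_tokens=100, temperature=0.7, **kwargs):
    # Mirrors HuggingFaceInference.text_generation: extra keyword
    # arguments (e.g. top_p) are merged into "parameters".
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_tokens,
            "temperature": temperature,
            "do_sample": True,
            **kwargs,
        },
        "options": {"use_cache": False},
    }


payload = build_text_generation_payload("Hello", max_tokens=32, top_p=0.9)
print(payload["parameters"]["max_new_tokens"])  # 32
```

Keeping the payload shape in one place makes it easy to see which fields the API expects (`inputs`, `parameters`, `options`) regardless of which task method builds it.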
+class HuggingFaceModelManager:
+    """Manager for all Hugging Face model operations"""
+
+    def __init__(self, api_token: str):
+        self.api_token = api_token
+        self.models = HuggingFaceModels()
+
+    def get_models_by_category(self, category: ModelCategory) -> List[HFModel]:
+        """Get all models for a specific category"""
+        all_models = []
+
+        if category == ModelCategory.TEXT_GENERATION:
+            all_models = self.models.TEXT_GENERATION_MODELS
+        elif category == ModelCategory.TEXT_TO_IMAGE:
+            all_models = self.models.TEXT_TO_IMAGE_MODELS
+        elif category == ModelCategory.AUTOMATIC_SPEECH_RECOGNITION:
+            all_models = self.models.ASR_MODELS
+        elif category == ModelCategory.TEXT_TO_SPEECH:
+            all_models = self.models.TTS_MODELS
+        elif category == ModelCategory.IMAGE_CLASSIFICATION:
+            all_models = self.models.IMAGE_CLASSIFICATION_MODELS
+        elif category == ModelCategory.FEATURE_EXTRACTION:
+            all_models = self.models.FEATURE_EXTRACTION_MODELS
+        elif category == ModelCategory.TRANSLATION:
+            all_models = self.models.TRANSLATION_MODELS
+        elif category == ModelCategory.SUMMARIZATION:
+            all_models = self.models.SUMMARIZATION_MODELS
+
+        return all_models
+
+    def get_all_models(self) -> Dict[ModelCategory, List[HFModel]]:
+        """Get all available models organized by category"""
+        return {
+            # Core AI categories
+            ModelCategory.TEXT_GENERATION: self.models.TEXT_GENERATION_MODELS,
+            ModelCategory.TEXT_TO_IMAGE: self.models.TEXT_TO_IMAGE_MODELS,
+            ModelCategory.AUTOMATIC_SPEECH_RECOGNITION: self.models.ASR_MODELS,
+            ModelCategory.TEXT_TO_SPEECH: self.models.TTS_MODELS,
+            ModelCategory.IMAGE_CLASSIFICATION: self.models.IMAGE_CLASSIFICATION_MODELS,
+            ModelCategory.FEATURE_EXTRACTION: self.models.FEATURE_EXTRACTION_MODELS,
+            ModelCategory.TRANSLATION: self.models.TRANSLATION_MODELS,
+            ModelCategory.SUMMARIZATION: self.models.SUMMARIZATION_MODELS,
+            # Video and Motion
+            ModelCategory.TEXT_TO_VIDEO: self.models.VIDEO_GENERATION_MODELS,
+            ModelCategory.VIDEO_GENERATION: self.models.VIDEO_GENERATION_MODELS,
+            ModelCategory.VIDEO_TO_TEXT: self.models.VIDEO_GENERATION_MODELS,
+            ModelCategory.VIDEO_CLASSIFICATION: self.models.VIDEO_GENERATION_MODELS,
+            # Code and Development
+            ModelCategory.CODE_GENERATION: self.models.CODE_GENERATION_MODELS,
+            ModelCategory.CODE_COMPLETION: self.models.CODE_GENERATION_MODELS,
+            ModelCategory.CODE_EXPLANATION: self.models.CODE_GENERATION_MODELS,
+            ModelCategory.APP_GENERATION: self.models.CODE_GENERATION_MODELS,
+            # 3D and AR/VR
+            ModelCategory.TEXT_TO_3D: self.models.THREE_D_MODELS,
+            ModelCategory.IMAGE_TO_3D: self.models.THREE_D_MODELS,
+            ModelCategory.THREE_D_GENERATION: self.models.THREE_D_MODELS,
+            ModelCategory.MESH_GENERATION: self.models.THREE_D_MODELS,
+            # Document Processing
+            ModelCategory.OCR: self.models.DOCUMENT_PROCESSING_MODELS,
+            ModelCategory.DOCUMENT_ANALYSIS: self.models.DOCUMENT_PROCESSING_MODELS,
+            ModelCategory.HANDWRITING_RECOGNITION: self.models.DOCUMENT_PROCESSING_MODELS,
+            ModelCategory.TABLE_EXTRACTION: self.models.DOCUMENT_PROCESSING_MODELS,
+            ModelCategory.FORM_PROCESSING: self.models.DOCUMENT_PROCESSING_MODELS,
+            # Multimodal AI
+            ModelCategory.VISION_LANGUAGE: self.models.MULTIMODAL_MODELS,
+            ModelCategory.MULTIMODAL_REASONING: self.models.MULTIMODAL_MODELS,
+            ModelCategory.VISUAL_QUESTION_ANSWERING: self.models.MULTIMODAL_MODELS,
+            ModelCategory.MULTIMODAL_CHAT: self.models.MULTIMODAL_MODELS,
+            ModelCategory.CROSS_MODAL_GENERATION: self.models.MULTIMODAL_MODELS,
+            # Specialized AI
+            ModelCategory.MUSIC_GENERATION: self.models.SPECIALIZED_AI_MODELS,
+            ModelCategory.VOICE_CLONING: self.models.SPECIALIZED_AI_MODELS,
+            ModelCategory.SUPER_RESOLUTION: self.models.SPECIALIZED_AI_MODELS,
+            ModelCategory.FACE_RESTORATION: self.models.SPECIALIZED_AI_MODELS,
+            ModelCategory.IMAGE_INPAINTING: self.models.SPECIALIZED_AI_MODELS,
+            ModelCategory.BACKGROUND_REMOVAL: self.models.SPECIALIZED_AI_MODELS,
+            # Creative Content
+            ModelCategory.CREATIVE_WRITING: self.models.CREATIVE_CONTENT_MODELS,
+            ModelCategory.STORY_GENERATION: self.models.CREATIVE_CONTENT_MODELS,
+            ModelCategory.POETRY_GENERATION: self.models.CREATIVE_CONTENT_MODELS,
+            ModelCategory.BLOG_WRITING: self.models.CREATIVE_CONTENT_MODELS,
+            ModelCategory.MARKETING_COPY: self.models.CREATIVE_CONTENT_MODELS,
+            # Game Development
+            ModelCategory.GAME_ASSET_GENERATION: self.models.GAME_DEVELOPMENT_MODELS,
+            ModelCategory.CHARACTER_GENERATION: self.models.GAME_DEVELOPMENT_MODELS,
+            ModelCategory.LEVEL_GENERATION: self.models.GAME_DEVELOPMENT_MODELS,
+            ModelCategory.DIALOGUE_GENERATION: self.models.GAME_DEVELOPMENT_MODELS,
+            # Science and Research
+            ModelCategory.PROTEIN_FOLDING: self.models.SCIENCE_RESEARCH_MODELS,
+            ModelCategory.MOLECULE_GENERATION: self.models.SCIENCE_RESEARCH_MODELS,
+            ModelCategory.SCIENTIFIC_WRITING: self.models.SCIENCE_RESEARCH_MODELS,
+            ModelCategory.RESEARCH_ASSISTANCE: self.models.SCIENCE_RESEARCH_MODELS,
+            ModelCategory.DATA_ANALYSIS: self.models.SCIENCE_RESEARCH_MODELS,
+            # Business and Productivity
+            ModelCategory.EMAIL_GENERATION: self.models.BUSINESS_PRODUCTIVITY_MODELS,
+            ModelCategory.PRESENTATION_CREATION: self.models.BUSINESS_PRODUCTIVITY_MODELS,
+            ModelCategory.REPORT_GENERATION: self.models.BUSINESS_PRODUCTIVITY_MODELS,
+            ModelCategory.MEETING_SUMMARIZATION: self.models.BUSINESS_PRODUCTIVITY_MODELS,
+            ModelCategory.PROJECT_PLANNING: self.models.BUSINESS_PRODUCTIVITY_MODELS,
+        }
+
+    def get_model_by_id(self, model_id: str) -> Optional[HFModel]:
+        """Find a model by its Hugging Face model ID"""
+        for models_list in self.get_all_models().values():
+            for model in models_list:
+                if model.model_id == model_id:
+                    return model
+        return None
+
+    async def call_model(self, model_id: str, category: ModelCategory, **kwargs) -> Any:
+        """Call a Hugging Face model with the appropriate method based on category"""
+
+        async with HuggingFaceInference(self.api_token) as hf:
+            if category == ModelCategory.TEXT_GENERATION:
+                return await hf.text_generation(model_id, **kwargs)
+            elif category == ModelCategory.TEXT_TO_IMAGE:
+                return await hf.text_to_image(model_id, **kwargs)
+            elif category == ModelCategory.AUTOMATIC_SPEECH_RECOGNITION:
+                return await hf.automatic_speech_recognition(model_id, **kwargs)
+            elif category == ModelCategory.TEXT_TO_SPEECH:
+                return await hf.text_to_speech(model_id, **kwargs)
+            elif category == ModelCategory.IMAGE_CLASSIFICATION:
+                return await hf.image_classification(model_id, **kwargs)
+            elif category == ModelCategory.FEATURE_EXTRACTION:
+                return await hf.feature_extraction(model_id, **kwargs)
+            elif category == ModelCategory.TRANSLATION:
+                return await hf.translation(model_id, **kwargs)
+            elif category == ModelCategory.SUMMARIZATION:
+                return await hf.summarization(model_id, **kwargs)
+            elif category == ModelCategory.QUESTION_ANSWERING:
+                return await hf.question_answering(model_id, **kwargs)
+            elif category == ModelCategory.ZERO_SHOT_CLASSIFICATION:
+                return await hf.zero_shot_classification(model_id, **kwargs)
+            elif category == ModelCategory.CONVERSATIONAL:
+                return await hf.conversational(model_id, **kwargs)
+
+            # Video and Motion categories
+            elif category in [
+                ModelCategory.TEXT_TO_VIDEO,
+                ModelCategory.VIDEO_GENERATION,
+            ]:
+                return await hf.text_to_video(model_id, **kwargs)
+            elif category == ModelCategory.VIDEO_TO_TEXT:
+                return await hf.video_to_text(model_id, **kwargs)
+            elif category == ModelCategory.VIDEO_CLASSIFICATION:
+                return await hf.image_classification(
+                    model_id, **kwargs
+                )  # Similar to image classification
+
+            # Code and Development categories
+            elif category in [
+                ModelCategory.CODE_GENERATION,
+                ModelCategory.APP_GENERATION,
+            ]:
+                return await hf.code_generation(model_id, **kwargs)
+            elif category in [
+                ModelCategory.CODE_COMPLETION,
+                ModelCategory.CODE_EXPLANATION,
+            ]:
+                return await hf.code_completion(model_id, **kwargs)
+
+            # 3D and AR/VR categories
+            elif category in [
+                ModelCategory.TEXT_TO_3D,
+                ModelCategory.THREE_D_GENERATION,
+            ]:
+                return await hf.text_to_3d(model_id, **kwargs)
+            elif category in [ModelCategory.IMAGE_TO_3D, ModelCategory.MESH_GENERATION]:
+                return await hf.image_to_3d(model_id, **kwargs)
+
+            # Document Processing categories
+            elif category == ModelCategory.OCR:
+                return await hf.ocr(model_id, **kwargs)
+            elif category in [
+                ModelCategory.DOCUMENT_ANALYSIS,
+                ModelCategory.FORM_PROCESSING,
+                ModelCategory.TABLE_EXTRACTION,
+                ModelCategory.LAYOUT_ANALYSIS,
+            ]:
+                return await hf.document_analysis(model_id, **kwargs)
+            elif category == ModelCategory.HANDWRITING_RECOGNITION:
+                return await hf.ocr(model_id, **kwargs)  # Similar to OCR
+
+            # Multimodal AI categories
+            elif category in [
+                ModelCategory.VISION_LANGUAGE,
+                ModelCategory.VISUAL_QUESTION_ANSWERING,
+                ModelCategory.IMAGE_TEXT_MATCHING,
+            ]:
+                return await hf.vision_language(model_id, **kwargs)
+            elif category in [
+                ModelCategory.MULTIMODAL_REASONING,
+                ModelCategory.MULTIMODAL_CHAT,
+                ModelCategory.CROSS_MODAL_GENERATION,
+            ]:
+                return await hf.multimodal_reasoning(model_id, **kwargs)
+
+            # Specialized AI categories
+            elif category == ModelCategory.MUSIC_GENERATION:
+                return await hf.music_generation(model_id, **kwargs)
+            elif category == ModelCategory.VOICE_CLONING:
+                return await hf.voice_cloning(model_id, **kwargs)
+            elif category == ModelCategory.SUPER_RESOLUTION:
+                return await hf.super_resolution(model_id, **kwargs)
+            elif category in [
+                ModelCategory.FACE_RESTORATION,
+                ModelCategory.IMAGE_INPAINTING,
+                ModelCategory.IMAGE_OUTPAINTING,
+            ]:
+                return await hf.super_resolution(
+                    model_id, **kwargs
+                )  # Similar processing
+            elif category == ModelCategory.BACKGROUND_REMOVAL:
+                return await hf.background_removal(model_id, **kwargs)
+
+            # Creative Content categories
+            elif category in [
+                ModelCategory.CREATIVE_WRITING,
+                ModelCategory.STORY_GENERATION,
+                ModelCategory.POETRY_GENERATION,
+                ModelCategory.SCREENPLAY_WRITING,
+            ]:
+                return await hf.creative_writing(model_id, **kwargs)
+            elif category in [ModelCategory.BLOG_WRITING, ModelCategory.MARKETING_COPY]:
+                return await hf.text_generation(
+                    model_id, **kwargs
+                )  # Use standard text generation
+
+            # Game Development categories
+            elif category in [
+                ModelCategory.CHARACTER_GENERATION,
+                ModelCategory.LEVEL_GENERATION,
+                ModelCategory.DIALOGUE_GENERATION,
+                ModelCategory.GAME_ASSET_GENERATION,
+            ]:
+                return await hf.creative_writing(
+                    model_id, **kwargs
+                )  # Creative generation
+
+            # Science and Research categories
+            elif category in [
+                ModelCategory.PROTEIN_FOLDING,
+                ModelCategory.MOLECULE_GENERATION,
+            ]:
+                return await hf.text_generation(
+                    model_id, **kwargs
+                )  # Specialized text generation
+            elif category in [
+                ModelCategory.SCIENTIFIC_WRITING,
+                ModelCategory.RESEARCH_ASSISTANCE,
+                ModelCategory.DATA_ANALYSIS,
+            ]:
+                return await hf.text_generation(model_id, **kwargs)
+
+            # Business and Productivity categories
+            elif category in [
+                ModelCategory.EMAIL_GENERATION,
+                ModelCategory.PRESENTATION_CREATION,
+                ModelCategory.REPORT_GENERATION,
+                ModelCategory.MEETING_SUMMARIZATION,
+                ModelCategory.PROJECT_PLANNING,
+            ]:
+                return await hf.business_document(model_id, category.value, **kwargs)
+
+            else:
+                raise ValueError(f"Unsupported model category: {category}")
+
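The long if/elif chain in `call_model` maps each category onto one client method. The same routing can be sketched as a lookup table; this is a minimal standalone illustration with hypothetical names (`Category`, `DISPATCH`, `resolve`), not the module's actual API:

```python
from enum import Enum


class Category(Enum):
    TEXT_GENERATION = "text_generation"
    SUMMARIZATION = "summarization"
    CONVERSATIONAL = "conversational"


# Map each category to the client-method name it should dispatch to,
# mirroring the shape of HuggingFaceModelManager.call_model.
DISPATCH = {
    Category.TEXT_GENERATION: "text_generation",
    Category.SUMMARIZATION: "summarization",
}


def resolve(category: Category) -> str:
    """Return the handler name, raising like call_model's else branch."""
    try:
        return DISPATCH[category]
    except KeyError:
        raise ValueError(f"Unsupported model category: {category}")


print(resolve(Category.SUMMARIZATION))  # summarization
```

A table keeps the category-to-handler mapping in one place, so adding a category is a one-line change rather than a new elif branch.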
app/llm.py ADDED
@@ -0,0 +1,766 @@
+import math
+from typing import Dict, List, Optional, Union
+
+import tiktoken
+from openai import (
+    APIError,
+    AsyncAzureOpenAI,
+    AsyncOpenAI,
+    AuthenticationError,
+    OpenAIError,
+    RateLimitError,
+)
+from openai.types.chat import ChatCompletion, ChatCompletionMessage
+from tenacity import (
+    retry,
+    retry_if_exception_type,
+    stop_after_attempt,
+    wait_random_exponential,
+)
+
+from app.bedrock import BedrockClient
+from app.config import LLMSettings, config
+from app.exceptions import TokenLimitExceeded
+from app.logger import logger  # Assuming a logger is set up in your app
+from app.schema import (
+    ROLE_VALUES,
+    TOOL_CHOICE_TYPE,
+    TOOL_CHOICE_VALUES,
+    Message,
+    ToolChoice,
+)
+
+
+REASONING_MODELS = ["o1", "o3-mini"]
+MULTIMODAL_MODELS = [
+    "gpt-4-vision-preview",
+    "gpt-4o",
+    "gpt-4o-mini",
+    "claude-3-opus-20240229",
+    "claude-3-sonnet-20240229",
+    "claude-3-haiku-20240307",
+]
+
+
+class TokenCounter:
+    # Token constants
+    BASE_MESSAGE_TOKENS = 4
+    FORMAT_TOKENS = 2
+    LOW_DETAIL_IMAGE_TOKENS = 85
+    HIGH_DETAIL_TILE_TOKENS = 170
+
+    # Image processing constants
+    MAX_SIZE = 2048
+    HIGH_DETAIL_TARGET_SHORT_SIDE = 768
+    TILE_SIZE = 512
+
+    def __init__(self, tokenizer):
+        self.tokenizer = tokenizer
+
+    def count_text(self, text: str) -> int:
+        """Calculate tokens for a text string"""
+        return 0 if not text else len(self.tokenizer.encode(text))
+
+    def count_image(self, image_item: dict) -> int:
+        """
+        Calculate tokens for an image based on detail level and dimensions
+
+        For "low" detail: fixed 85 tokens
+        For "high" detail:
+        1. Scale to fit in 2048x2048 square
+        2. Scale shortest side to 768px
+        3. Count 512px tiles (170 tokens each)
+        4. Add 85 tokens
+        """
+        detail = image_item.get("detail", "medium")
+
+        # For low detail, always return fixed token count
+        if detail == "low":
+            return self.LOW_DETAIL_IMAGE_TOKENS
+
+        # For medium detail (default in OpenAI), use high detail calculation
+        # OpenAI doesn't specify a separate calculation for medium
+
+        # For high detail, calculate based on dimensions if available
+        if detail == "high" or detail == "medium":
+            # If dimensions are provided in the image_item
+            if "dimensions" in image_item:
+                width, height = image_item["dimensions"]
+                return self._calculate_high_detail_tokens(width, height)
+
+        return (
+            self._calculate_high_detail_tokens(1024, 1024) if detail == "high" else 1024
+        )
+
+    def _calculate_high_detail_tokens(self, width: int, height: int) -> int:
+        """Calculate tokens for high detail images based on dimensions"""
+        # Step 1: Scale to fit in MAX_SIZE x MAX_SIZE square
+        if width > self.MAX_SIZE or height > self.MAX_SIZE:
+            scale = self.MAX_SIZE / max(width, height)
+            width = int(width * scale)
+            height = int(height * scale)
+
+        # Step 2: Scale so shortest side is HIGH_DETAIL_TARGET_SHORT_SIDE
+        scale = self.HIGH_DETAIL_TARGET_SHORT_SIDE / min(width, height)
+        scaled_width = int(width * scale)
+        scaled_height = int(height * scale)
+
+        # Step 3: Count number of 512px tiles
+        tiles_x = math.ceil(scaled_width / self.TILE_SIZE)
+        tiles_y = math.ceil(scaled_height / self.TILE_SIZE)
+        total_tiles = tiles_x * tiles_y
+
+        # Step 4: Calculate final token count
+        return (
+            total_tiles * self.HIGH_DETAIL_TILE_TOKENS
+        ) + self.LOW_DETAIL_IMAGE_TOKENS
+
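The four-step tile arithmetic can be sanity-checked in isolation. The sketch below is a hypothetical standalone re-derivation of the same steps (constants copied from `TokenCounter`; the function name is illustrative):

```python
import math

MAX_SIZE, TARGET_SHORT_SIDE, TILE_SIZE = 2048, 768, 512
TILE_TOKENS, BASE_TOKENS = 170, 85

def high_detail_tokens(width: int, height: int) -> int:
    # Step 1: fit within a 2048x2048 square
    if max(width, height) > MAX_SIZE:
        scale = MAX_SIZE / max(width, height)
        width, height = int(width * scale), int(height * scale)
    # Step 2: scale so the shortest side is 768px
    scale = TARGET_SHORT_SIDE / min(width, height)
    width, height = int(width * scale), int(height * scale)
    # Steps 3-4: count 512px tiles at 170 tokens each, plus an 85-token base
    tiles = math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)
    return tiles * TILE_TOKENS + BASE_TOKENS

# A 1024x1024 image scales to 768x768 -> 2x2 tiles -> 4*170 + 85 = 765
print(high_detail_tokens(1024, 1024))  # -> 765
```

A 2048x4096 image works out to 1105 tokens by the same steps (halved to 1024x2048, then 768x1536, giving 2x3 tiles).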
    def count_content(self, content: Union[str, List[Union[str, dict]]]) -> int:
        """Calculate tokens for message content"""
        if not content:
            return 0

        if isinstance(content, str):
            return self.count_text(content)

        token_count = 0
        for item in content:
            if isinstance(item, str):
                token_count += self.count_text(item)
            elif isinstance(item, dict):
                if "text" in item:
                    token_count += self.count_text(item["text"])
                elif "image_url" in item:
                    token_count += self.count_image(item)
        return token_count

    def count_tool_calls(self, tool_calls: List[dict]) -> int:
        """Calculate tokens for tool calls"""
        token_count = 0
        for tool_call in tool_calls:
            if "function" in tool_call:
                function = tool_call["function"]
                token_count += self.count_text(function.get("name", ""))
                token_count += self.count_text(function.get("arguments", ""))
        return token_count

    def count_message_tokens(self, messages: List[dict]) -> int:
        """Calculate the total number of tokens in a message list"""
        total_tokens = self.FORMAT_TOKENS  # Base format tokens

        for message in messages:
            tokens = self.BASE_MESSAGE_TOKENS  # Base tokens per message

            # Add role tokens
            tokens += self.count_text(message.get("role", ""))

            # Add content tokens
            if "content" in message:
                tokens += self.count_content(message["content"])

            # Add tool call tokens
            if "tool_calls" in message:
                tokens += self.count_tool_calls(message["tool_calls"])

            # Add name and tool_call_id tokens
            tokens += self.count_text(message.get("name", ""))
            tokens += self.count_text(message.get("tool_call_id", ""))

            total_tokens += tokens

        return total_tokens
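`TokenCounter` only needs an object with an `encode` method, so the per-message accounting (2 format tokens, plus 4 base tokens and the role/content tokens per message) can be exercised with a stub tokenizer. This is a simplified, hypothetical re-implementation for illustration, not the class itself:

```python
class StubTokenizer:
    """Counts whitespace-separated words instead of real BPE tokens."""
    def encode(self, text: str):
        return text.split()

BASE_MESSAGE_TOKENS, FORMAT_TOKENS = 4, 2

def count_message_tokens(messages, tokenizer=StubTokenizer()):
    total = FORMAT_TOKENS
    for m in messages:
        total += BASE_MESSAGE_TOKENS
        total += len(tokenizer.encode(m.get("role", "")))
        if isinstance(m.get("content"), str):
            total += len(tokenizer.encode(m["content"]))
    return total

msgs = [{"role": "user", "content": "hello there"}]
# 2 (format) + 4 (base) + 1 ("user") + 2 ("hello there") = 9
print(count_message_tokens(msgs))  # -> 9
```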
class LLM:
    _instances: Dict[str, "LLM"] = {}

    def __new__(
        cls, config_name: str = "default", llm_config: Optional[LLMSettings] = None
    ):
        if config_name not in cls._instances:
            instance = super().__new__(cls)
            instance.__init__(config_name, llm_config)
            cls._instances[config_name] = instance
        return cls._instances[config_name]
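`__new__` above caches one `LLM` instance per config name. The pattern in isolation looks like this (illustrative class name, not part of the codebase):

```python
class PerNameSingleton:
    _instances = {}

    def __new__(cls, name: str = "default"):
        # Reuse the cached instance for this name, if any
        if name not in cls._instances:
            cls._instances[name] = super().__new__(cls)
        return cls._instances[name]

a = PerNameSingleton("default")
b = PerNameSingleton("default")
c = PerNameSingleton("vision")
print(a is b, a is c)  # -> True False
```

Note that Python still calls `__init__` on every construction, which is why the original calls `__init__` explicitly inside `__new__` and guards re-initialization with the `hasattr(self, "client")` check.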
    def __init__(
        self, config_name: str = "default", llm_config: Optional[LLMSettings] = None
    ):
        if not hasattr(self, "client"):  # Only initialize if not already initialized
            llm_config = llm_config or config.llm
            llm_config = llm_config.get(config_name, llm_config["default"])
            self.model = llm_config.model
            self.max_tokens = llm_config.max_tokens
            self.temperature = llm_config.temperature
            self.api_type = llm_config.api_type
            self.api_key = llm_config.api_key
            self.api_version = llm_config.api_version
            self.base_url = llm_config.base_url

            # Token counting related attributes
            self.total_input_tokens = 0
            self.total_completion_tokens = 0
            self.max_input_tokens = (
                llm_config.max_input_tokens
                if hasattr(llm_config, "max_input_tokens")
                else None
            )

            # Initialize tokenizer
            try:
                self.tokenizer = tiktoken.encoding_for_model(self.model)
            except KeyError:
                # If the model is not in tiktoken's presets, use cl100k_base as default
                self.tokenizer = tiktoken.get_encoding("cl100k_base")

            if self.api_type == "azure":
                self.client = AsyncAzureOpenAI(
                    base_url=self.base_url,
                    api_key=self.api_key,
                    api_version=self.api_version,
                )
            elif self.api_type == "aws":
                self.client = BedrockClient()
            else:
                self.client = AsyncOpenAI(api_key=self.api_key, base_url=self.base_url)

            self.token_counter = TokenCounter(self.tokenizer)

    def count_tokens(self, text: str) -> int:
        """Calculate the number of tokens in a text"""
        if not text:
            return 0
        return len(self.tokenizer.encode(text))

    def count_message_tokens(self, messages: List[dict]) -> int:
        return self.token_counter.count_message_tokens(messages)

    def update_token_count(self, input_tokens: int, completion_tokens: int = 0) -> None:
        """Update cumulative token counts and log the usage"""
        self.total_input_tokens += input_tokens
        self.total_completion_tokens += completion_tokens
        logger.info(
            f"Token usage: Input={input_tokens}, Completion={completion_tokens}, "
            f"Cumulative Input={self.total_input_tokens}, Cumulative Completion={self.total_completion_tokens}, "
            f"Total={input_tokens + completion_tokens}, Cumulative Total={self.total_input_tokens + self.total_completion_tokens}"
        )

    def check_token_limit(self, input_tokens: int) -> bool:
        """Check whether the request would stay within the input token limit"""
        if self.max_input_tokens is not None:
            return (self.total_input_tokens + input_tokens) <= self.max_input_tokens
        # If max_input_tokens is not set, always return True
        return True

    def get_limit_error_message(self, input_tokens: int) -> str:
        """Generate an error message for an exceeded token limit"""
        if (
            self.max_input_tokens is not None
            and (self.total_input_tokens + input_tokens) > self.max_input_tokens
        ):
            return f"Request may exceed input token limit (Current: {self.total_input_tokens}, Needed: {input_tokens}, Max: {self.max_input_tokens})"

        return "Token limit exceeded"

    @staticmethod
    def format_messages(
        messages: List[Union[dict, Message]], supports_images: bool = False
    ) -> List[dict]:
        """
        Format messages for the LLM by converting them to the OpenAI message format.

        Args:
            messages: List of messages that can be either dict or Message objects
            supports_images: Flag indicating if the target model supports image inputs

        Returns:
            List[dict]: List of formatted messages in OpenAI format

        Raises:
            ValueError: If messages are invalid or missing required fields
            TypeError: If unsupported message types are provided

        Examples:
            >>> msgs = [
            ...     Message.system_message("You are a helpful assistant"),
            ...     {"role": "user", "content": "Hello"},
            ...     Message.user_message("How are you?")
            ... ]
            >>> formatted = LLM.format_messages(msgs)
        """
        formatted_messages = []

        for message in messages:
            # Convert Message objects to dictionaries
            if isinstance(message, Message):
                message = message.to_dict()

            if isinstance(message, dict):
                # If the message is a dict, ensure it has the required fields
                if "role" not in message:
                    raise ValueError("Message dict must contain 'role' field")

                # Process base64 images if present and the model supports images
                if supports_images and message.get("base64_image"):
                    # Initialize or convert content to the appropriate format
                    if not message.get("content"):
                        message["content"] = []
                    elif isinstance(message["content"], str):
                        message["content"] = [
                            {"type": "text", "text": message["content"]}
                        ]
                    elif isinstance(message["content"], list):
                        # Convert string items to proper text objects
                        message["content"] = [
                            (
                                {"type": "text", "text": item}
                                if isinstance(item, str)
                                else item
                            )
                            for item in message["content"]
                        ]

                    # Add the image to content
                    message["content"].append(
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": f"data:image/jpeg;base64,{message['base64_image']}"
                            },
                        }
                    )

                    # Remove the base64_image field
                    del message["base64_image"]
                # If the model doesn't support images, drop the image but keep the text
                elif not supports_images and message.get("base64_image"):
                    del message["base64_image"]

                if "content" in message or "tool_calls" in message:
                    formatted_messages.append(message)
                # else: do not include the message
            else:
                raise TypeError(f"Unsupported message type: {type(message)}")

        # Validate that all messages have a valid role
        for msg in formatted_messages:
            if msg["role"] not in ROLE_VALUES:
                raise ValueError(f"Invalid role: {msg['role']}")

        return formatted_messages
    @retry(
        wait=wait_random_exponential(min=1, max=60),
        stop=stop_after_attempt(6),
        retry=retry_if_exception_type(
            (OpenAIError, Exception, ValueError)
        ),  # Don't retry TokenLimitExceeded
    )
    async def ask(
        self,
        messages: List[Union[dict, Message]],
        system_msgs: Optional[List[Union[dict, Message]]] = None,
        stream: bool = True,
        temperature: Optional[float] = None,
    ) -> str:
        """
        Send a prompt to the LLM and get the response.

        Args:
            messages: List of conversation messages
            system_msgs: Optional system messages to prepend
            stream (bool): Whether to stream the response
            temperature (float): Sampling temperature for the response

        Returns:
            str: The generated response

        Raises:
            TokenLimitExceeded: If token limits are exceeded
            ValueError: If messages are invalid or response is empty
            OpenAIError: If API call fails after retries
            Exception: For unexpected errors
        """
        try:
            # Check if the model supports images
            supports_images = self.model in MULTIMODAL_MODELS

            # Format system and user messages with image support check
            if system_msgs:
                system_msgs = self.format_messages(system_msgs, supports_images)
                messages = system_msgs + self.format_messages(messages, supports_images)
            else:
                messages = self.format_messages(messages, supports_images)

            # Calculate input token count
            input_tokens = self.count_message_tokens(messages)

            # Check if token limits are exceeded
            if not self.check_token_limit(input_tokens):
                error_message = self.get_limit_error_message(input_tokens)
                # Raise a special exception that won't be retried
                raise TokenLimitExceeded(error_message)

            params = {
                "model": self.model,
                "messages": messages,
            }

            if self.model in REASONING_MODELS:
                params["max_completion_tokens"] = self.max_tokens
            else:
                params["max_tokens"] = self.max_tokens
                params["temperature"] = (
                    temperature if temperature is not None else self.temperature
                )

            if not stream:
                # Non-streaming request
                response = await self.client.chat.completions.create(
                    **params, stream=False
                )

                if not response.choices or not response.choices[0].message.content:
                    raise ValueError("Empty or invalid response from LLM")

                # Update token counts
                self.update_token_count(
                    response.usage.prompt_tokens, response.usage.completion_tokens
                )

                return response.choices[0].message.content

            # Streaming request: record the estimated input token count up front
            self.update_token_count(input_tokens)

            response = await self.client.chat.completions.create(**params, stream=True)

            collected_messages = []
            completion_text = ""
            async for chunk in response:
                chunk_message = chunk.choices[0].delta.content or ""
                collected_messages.append(chunk_message)
                completion_text += chunk_message
                print(chunk_message, end="", flush=True)

            print()  # Newline after streaming
            full_response = "".join(collected_messages).strip()
            if not full_response:
                raise ValueError("Empty response from streaming LLM")

            # Estimate completion tokens for the streaming response
            completion_tokens = self.count_tokens(completion_text)
            logger.info(
                f"Estimated completion tokens for streaming response: {completion_tokens}"
            )
            self.total_completion_tokens += completion_tokens

            return full_response

        except TokenLimitExceeded:
            # Re-raise token limit errors without logging
            raise
        except ValueError:
            logger.exception("Validation error")
            raise
        except OpenAIError as oe:
            logger.exception("OpenAI API error")
            if isinstance(oe, AuthenticationError):
                logger.error("Authentication failed. Check API key.")
            elif isinstance(oe, RateLimitError):
                logger.error("Rate limit exceeded. Consider increasing retry attempts.")
            elif isinstance(oe, APIError):
                logger.error(f"API error: {oe}")
            raise
        except Exception:
            logger.exception("Unexpected error in ask")
            raise
    @retry(
        wait=wait_random_exponential(min=1, max=60),
        stop=stop_after_attempt(6),
        retry=retry_if_exception_type(
            (OpenAIError, Exception, ValueError)
        ),  # Don't retry TokenLimitExceeded
    )
    async def ask_with_images(
        self,
        messages: List[Union[dict, Message]],
        images: List[Union[str, dict]],
        system_msgs: Optional[List[Union[dict, Message]]] = None,
        stream: bool = False,
        temperature: Optional[float] = None,
    ) -> str:
        """
        Send a prompt with images to the LLM and get the response.

        Args:
            messages: List of conversation messages
            images: List of image URLs or image data dictionaries
            system_msgs: Optional system messages to prepend
            stream (bool): Whether to stream the response
            temperature (float): Sampling temperature for the response

        Returns:
            str: The generated response

        Raises:
            TokenLimitExceeded: If token limits are exceeded
            ValueError: If messages are invalid or response is empty
            OpenAIError: If API call fails after retries
            Exception: For unexpected errors
        """
        try:
            # This method should only be called with models that support images
            if self.model not in MULTIMODAL_MODELS:
                raise ValueError(
                    f"Model {self.model} does not support images. Use a model from {MULTIMODAL_MODELS}"
                )

            # Format messages with image support
            formatted_messages = self.format_messages(messages, supports_images=True)

            # Ensure the last message is from the user to attach images
            if not formatted_messages or formatted_messages[-1]["role"] != "user":
                raise ValueError(
                    "The last message must be from the user to attach images"
                )

            # Process the last user message to include images
            last_message = formatted_messages[-1]

            # Convert content to multimodal format if needed
            content = last_message["content"]
            multimodal_content = (
                [{"type": "text", "text": content}]
                if isinstance(content, str)
                else content if isinstance(content, list) else []
            )

            # Add images to content
            for image in images:
                if isinstance(image, str):
                    multimodal_content.append(
                        {"type": "image_url", "image_url": {"url": image}}
                    )
                elif isinstance(image, dict) and "url" in image:
                    multimodal_content.append({"type": "image_url", "image_url": image})
                elif isinstance(image, dict) and "image_url" in image:
                    multimodal_content.append(image)
                else:
                    raise ValueError(f"Unsupported image format: {image}")

            # Update the message with multimodal content
            last_message["content"] = multimodal_content

            # Add system messages if provided
            if system_msgs:
                all_messages = (
                    self.format_messages(system_msgs, supports_images=True)
                    + formatted_messages
                )
            else:
                all_messages = formatted_messages

            # Calculate tokens and check limits
            input_tokens = self.count_message_tokens(all_messages)
            if not self.check_token_limit(input_tokens):
                raise TokenLimitExceeded(self.get_limit_error_message(input_tokens))

            # Set up API parameters
            params = {
                "model": self.model,
                "messages": all_messages,
                "stream": stream,
            }

            # Add model-specific parameters
            if self.model in REASONING_MODELS:
                params["max_completion_tokens"] = self.max_tokens
            else:
                params["max_tokens"] = self.max_tokens
                params["temperature"] = (
                    temperature if temperature is not None else self.temperature
                )

            # Handle non-streaming request
            if not stream:
                response = await self.client.chat.completions.create(**params)

                if not response.choices or not response.choices[0].message.content:
                    raise ValueError("Empty or invalid response from LLM")

                self.update_token_count(
                    response.usage.prompt_tokens, response.usage.completion_tokens
                )
                return response.choices[0].message.content

            # Handle streaming request
            self.update_token_count(input_tokens)
            response = await self.client.chat.completions.create(**params)

            collected_messages = []
            async for chunk in response:
                chunk_message = chunk.choices[0].delta.content or ""
                collected_messages.append(chunk_message)
                print(chunk_message, end="", flush=True)

            print()  # Newline after streaming
            full_response = "".join(collected_messages).strip()

            if not full_response:
                raise ValueError("Empty response from streaming LLM")

            return full_response

        except TokenLimitExceeded:
            raise
        except ValueError as ve:
            logger.error(f"Validation error in ask_with_images: {ve}")
            raise
        except OpenAIError as oe:
            logger.error(f"OpenAI API error: {oe}")
            if isinstance(oe, AuthenticationError):
                logger.error("Authentication failed. Check API key.")
            elif isinstance(oe, RateLimitError):
                logger.error("Rate limit exceeded. Consider increasing retry attempts.")
            elif isinstance(oe, APIError):
                logger.error(f"API error: {oe}")
            raise
        except Exception as e:
            logger.error(f"Unexpected error in ask_with_images: {e}")
            raise
    @retry(
        wait=wait_random_exponential(min=1, max=60),
        stop=stop_after_attempt(6),
        retry=retry_if_exception_type(
            (OpenAIError, Exception, ValueError)
        ),  # Don't retry TokenLimitExceeded
    )
    async def ask_tool(
        self,
        messages: List[Union[dict, Message]],
        system_msgs: Optional[List[Union[dict, Message]]] = None,
        timeout: int = 300,
        tools: Optional[List[dict]] = None,
        tool_choice: TOOL_CHOICE_TYPE = ToolChoice.AUTO,  # type: ignore
        temperature: Optional[float] = None,
        **kwargs,
    ) -> ChatCompletionMessage | None:
        """
        Ask the LLM using functions/tools and return the response.

        Args:
            messages: List of conversation messages
            system_msgs: Optional system messages to prepend
            timeout: Request timeout in seconds
            tools: List of tools to use
            tool_choice: Tool choice strategy
            temperature: Sampling temperature for the response
            **kwargs: Additional completion arguments

        Returns:
            ChatCompletionMessage: The model's response

        Raises:
            TokenLimitExceeded: If token limits are exceeded
            ValueError: If tools, tool_choice, or messages are invalid
            OpenAIError: If API call fails after retries
            Exception: For unexpected errors
        """
        try:
            # Validate tool_choice
            if tool_choice not in TOOL_CHOICE_VALUES:
                raise ValueError(f"Invalid tool_choice: {tool_choice}")

            # Check if the model supports images
            supports_images = self.model in MULTIMODAL_MODELS

            # Format messages
            if system_msgs:
                system_msgs = self.format_messages(system_msgs, supports_images)
                messages = system_msgs + self.format_messages(messages, supports_images)
            else:
                messages = self.format_messages(messages, supports_images)

            # Calculate input token count
            input_tokens = self.count_message_tokens(messages)

            # If there are tools, add the token count of the tool descriptions
            tools_tokens = 0
            if tools:
                for tool in tools:
                    tools_tokens += self.count_tokens(str(tool))

            input_tokens += tools_tokens

            # Check if token limits are exceeded
            if not self.check_token_limit(input_tokens):
                error_message = self.get_limit_error_message(input_tokens)
                # Raise a special exception that won't be retried
                raise TokenLimitExceeded(error_message)

            # Validate tools if provided
            if tools:
                for tool in tools:
                    if not isinstance(tool, dict) or "type" not in tool:
                        raise ValueError("Each tool must be a dict with 'type' field")

            # Set up the completion request
            params = {
                "model": self.model,
                "messages": messages,
                "tools": tools,
                "tool_choice": tool_choice,
                "timeout": timeout,
                **kwargs,
            }

            if self.model in REASONING_MODELS:
                params["max_completion_tokens"] = self.max_tokens
            else:
                params["max_tokens"] = self.max_tokens
                params["temperature"] = (
                    temperature if temperature is not None else self.temperature
                )

            params["stream"] = False  # Always use non-streaming for tool requests
            response: ChatCompletion = await self.client.chat.completions.create(
                **params
            )

            # Check if the response is valid
            if not response.choices or not response.choices[0].message:
                print(response)
                # raise ValueError("Invalid or empty response from LLM")
                return None

            # Update token counts
            self.update_token_count(
                response.usage.prompt_tokens, response.usage.completion_tokens
            )

            return response.choices[0].message

        except TokenLimitExceeded:
            # Re-raise token limit errors without logging
            raise
        except ValueError as ve:
            logger.error(f"Validation error in ask_tool: {ve}")
            raise
        except OpenAIError as oe:
            logger.error(f"OpenAI API error: {oe}")
            if isinstance(oe, AuthenticationError):
                logger.error("Authentication failed. Check API key.")
            elif isinstance(oe, RateLimitError):
                logger.error("Rate limit exceeded. Consider increasing retry attempts.")
            elif isinstance(oe, APIError):
                logger.error(f"API error: {oe}")
            raise
        except Exception as e:
            logger.error(f"Unexpected error in ask_tool: {e}")
            raise
app/logger.py ADDED
@@ -0,0 +1,42 @@
import sys
from datetime import datetime

from loguru import logger as _logger

from app.config import PROJECT_ROOT


_print_level = "INFO"


def define_log_level(print_level="INFO", logfile_level="DEBUG", name: str = None):
    """Adjust the log levels for the stderr and file sinks"""
    global _print_level
    _print_level = print_level

    current_date = datetime.now()
    formatted_date = current_date.strftime("%Y%m%d%H%M%S")
    log_name = (
        f"{name}_{formatted_date}" if name else formatted_date
    )  # name the log file with an optional prefix

    _logger.remove()
    _logger.add(sys.stderr, level=print_level)
    _logger.add(PROJECT_ROOT / f"logs/{log_name}.log", level=logfile_level)
    return _logger


logger = define_log_level()


if __name__ == "__main__":
    logger.info("Starting application")
    logger.debug("Debug message")
    logger.warning("Warning message")
    logger.error("Error message")
    logger.critical("Critical message")

    try:
        raise ValueError("Test error")
    except Exception as e:
        logger.exception(f"An error occurred: {e}")
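The same two-sink setup (stderr plus a timestamped log file name) can be sketched with the standard library when loguru is unavailable. This is a rough, hypothetical analogue for illustration; it builds the file name but does not attach a file handler, to avoid side effects:

```python
import logging
import sys
from datetime import datetime

def define_log_level(print_level="INFO", name=None):
    """Configure an stderr logger and compute a timestamped log file path."""
    stamp = datetime.now().strftime("%Y%m%d%H%M%S")
    log_name = f"{name}_{stamp}" if name else stamp

    log = logging.getLogger("app")
    log.setLevel(logging.DEBUG)
    log.handlers.clear()

    stderr_handler = logging.StreamHandler(sys.stderr)
    stderr_handler.setLevel(print_level)
    log.addHandler(stderr_handler)
    return log, f"logs/{log_name}.log"

log, logfile = define_log_level(name="manus")
log.info("Starting application")
print(logfile)
```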
app/mcp/__init__.py ADDED
File without changes
app/mcp/server.py ADDED
@@ -0,0 +1,180 @@
1
+ import logging
2
+ import sys
3
+
4
+
5
+ logging.basicConfig(level=logging.INFO, handlers=[logging.StreamHandler(sys.stderr)])
6
+
7
+ import argparse
8
+ import asyncio
9
+ import atexit
10
+ import json
11
+ from inspect import Parameter, Signature
12
+ from typing import Any, Dict, Optional
13
+
14
+ from mcp.server.fastmcp import FastMCP
15
+
16
+ from app.logger import logger
17
+ from app.tool.base import BaseTool
18
+ from app.tool.bash import Bash
19
+ from app.tool.browser_use_tool import BrowserUseTool
20
+ from app.tool.str_replace_editor import StrReplaceEditor
21
+ from app.tool.terminate import Terminate
22
+
23
+
24
+ class MCPServer:
25
+ """MCP Server implementation with tool registration and management."""
26
+
27
+ def __init__(self, name: str = "openmanus"):
28
+ self.server = FastMCP(name)
29
+ self.tools: Dict[str, BaseTool] = {}
30
+
31
+ # Initialize standard tools
32
+ self.tools["bash"] = Bash()
33
+ self.tools["browser"] = BrowserUseTool()
34
+ self.tools["editor"] = StrReplaceEditor()
35
+ self.tools["terminate"] = Terminate()
36
+
37
+ def register_tool(self, tool: BaseTool, method_name: Optional[str] = None) -> None:
38
+ """Register a tool with parameter validation and documentation."""
39
+ tool_name = method_name or tool.name
40
+ tool_param = tool.to_param()
41
+ tool_function = tool_param["function"]
42
+
43
+ # Define the async function to be registered
44
+ async def tool_method(**kwargs):
45
+ logger.info(f"Executing {tool_name}: {kwargs}")
46
+ result = await tool.execute(**kwargs)
47
+
48
+ logger.info(f"Result of {tool_name}: {result}")
49
+
50
+ # Handle different types of results (match original logic)
51
+ if hasattr(result, "model_dump"):
52
+ return json.dumps(result.model_dump())
53
+ elif isinstance(result, dict):
54
+ return json.dumps(result)
55
+ return result
56
+
57
+ # Set method metadata
58
+ tool_method.__name__ = tool_name
59
+ tool_method.__doc__ = self._build_docstring(tool_function)
60
+ tool_method.__signature__ = self._build_signature(tool_function)
61
+
62
+ # Store parameter schema (important for tools that access it programmatically)
63
+ param_props = tool_function.get("parameters", {}).get("properties", {})
64
+ required_params = tool_function.get("parameters", {}).get("required", [])
65
+ tool_method._parameter_schema = {
66
+ param_name: {
67
+ "description": param_details.get("description", ""),
68
+ "type": param_details.get("type", "any"),
69
+ "required": param_name in required_params,
70
+ }
71
+ for param_name, param_details in param_props.items()
72
+ }
73
+
74
+ # Register with server
75
+ self.server.tool()(tool_method)
76
+ logger.info(f"Registered tool: {tool_name}")
77
+
78
+ def _build_docstring(self, tool_function: dict) -> str:
79
+ """Build a formatted docstring from tool function metadata."""
80
+ description = tool_function.get("description", "")
81
+ param_props = tool_function.get("parameters", {}).get("properties", {})
82
+ required_params = tool_function.get("parameters", {}).get("required", [])
83
+
84
+ # Build docstring (match original format)
85
+ docstring = description
86
+ if param_props:
87
+ docstring += "\n\nParameters:\n"
88
+ for param_name, param_details in param_props.items():
89
+ required_str = (
90
+ "(required)" if param_name in required_params else "(optional)"
91
+ )
92
+ param_type = param_details.get("type", "any")
93
+ param_desc = param_details.get("description", "")
94
+ docstring += (
95
+ f" {param_name} ({param_type}) {required_str}: {param_desc}\n"
96
+ )
97
+
98
+         return docstring
+ 
+     def _build_signature(self, tool_function: dict) -> Signature:
+         """Build a function signature from tool function metadata."""
+         param_props = tool_function.get("parameters", {}).get("properties", {})
+         required_params = tool_function.get("parameters", {}).get("required", [])
+ 
+         parameters = []
+ 
+         # Follow original type mapping
+         for param_name, param_details in param_props.items():
+             param_type = param_details.get("type", "")
+             default = Parameter.empty if param_name in required_params else None
+ 
+             # Map JSON Schema types to Python types (same as original)
+             annotation = Any
+             if param_type == "string":
+                 annotation = str
+             elif param_type == "integer":
+                 annotation = int
+             elif param_type == "number":
+                 annotation = float
+             elif param_type == "boolean":
+                 annotation = bool
+             elif param_type == "object":
+                 annotation = dict
+             elif param_type == "array":
+                 annotation = list
+ 
+             # Create parameter with same structure as original
+             param = Parameter(
+                 name=param_name,
+                 kind=Parameter.KEYWORD_ONLY,
+                 default=default,
+                 annotation=annotation,
+             )
+             parameters.append(param)
+ 
+         return Signature(parameters=parameters)
+ 
+     async def cleanup(self) -> None:
+         """Clean up server resources."""
+         logger.info("Cleaning up resources")
+         # Follow original cleanup logic - only clean browser tool
+         if "browser" in self.tools and hasattr(self.tools["browser"], "cleanup"):
+             await self.tools["browser"].cleanup()
+ 
+     def register_all_tools(self) -> None:
+         """Register all tools with the server."""
+         for tool in self.tools.values():
+             self.register_tool(tool)
+ 
+     def run(self, transport: str = "stdio") -> None:
+         """Run the MCP server."""
+         # Register all tools
+         self.register_all_tools()
+ 
+         # Register cleanup function (match original behavior)
+         atexit.register(lambda: asyncio.run(self.cleanup()))
+ 
+         # Start server (with same logging as original)
+         logger.info(f"Starting OpenManus server ({transport} mode)")
+         self.server.run(transport=transport)
+ 
+ 
+ def parse_args() -> argparse.Namespace:
+     """Parse command line arguments."""
+     parser = argparse.ArgumentParser(description="OpenManus MCP Server")
+     parser.add_argument(
+         "--transport",
+         choices=["stdio"],
+         default="stdio",
+         help="Communication method: stdio (default: stdio)",
+     )
+     return parser.parse_args()
+ 
+ 
+ if __name__ == "__main__":
+     args = parse_args()
+ 
+     # Create and run server (maintaining original flow)
+     server = MCPServer()
+     server.run(transport=args.transport)
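The JSON-Schema-to-Python type mapping inside `_build_signature` can be sketched as a standalone function. This is an illustrative rewrite, not the server's actual code: `build_signature` and the sample tool payload are hypothetical, but the `inspect.Parameter`/`Signature` usage mirrors the mapping above.

```python
from inspect import Parameter, Signature
from typing import Any

# Same JSON Schema -> Python type mapping as the if/elif chain above.
TYPE_MAP = {"string": str, "integer": int, "number": float,
            "boolean": bool, "object": dict, "array": list}


def build_signature(tool_function: dict) -> Signature:
    """Build a keyword-only signature from tool metadata (illustrative)."""
    props = tool_function.get("parameters", {}).get("properties", {})
    required = tool_function.get("parameters", {}).get("required", [])
    params = []
    for name, details in props.items():
        annotation = TYPE_MAP.get(details.get("type", ""), Any)
        # Required params get no default; optional ones default to None.
        default = Parameter.empty if name in required else None
        params.append(Parameter(name=name, kind=Parameter.KEYWORD_ONLY,
                                default=default, annotation=annotation))
    return Signature(parameters=params)


# Hypothetical tool metadata in the shape the server receives.
sig = build_signature({
    "parameters": {
        "properties": {"url": {"type": "string"}, "timeout": {"type": "integer"}},
        "required": ["url"],
    }
})
print(sig)
```

Printing the signature shows `url` as a required keyword-only `str` and `timeout` as an optional `int` defaulting to `None`, which is exactly what FastMCP-style tool registration needs for introspection.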
app/production_config.py ADDED
@@ -0,0 +1,363 @@
+ """
+ Complete Configuration for OpenManus Production Deployment
+ Includes: All model configurations, agent settings, category mappings, and service configurations
+ """
+ 
+ import os
+ from typing import Dict, List, Optional, Any
+ from dataclasses import dataclass
+ 
+ 
+ @dataclass
+ class ModelConfig:
+     """Configuration for individual AI models"""
+ 
+     name: str
+     category: str
+     api_endpoint: str
+     max_tokens: int = 4096
+     temperature: float = 0.7
+     supported_formats: Optional[List[str]] = None
+     special_parameters: Optional[Dict[str, Any]] = None
+     rate_limit: int = 100  # requests per minute
+ 
+ 
+ class CategoryConfig:
+     """Configuration for model categories"""
+ 
+     # Core AI Models - Text Generation (Qwen, DeepSeek, etc.)
+     TEXT_GENERATION_MODELS = {
+         # Qwen Models (35 models)
+         "qwen/qwen-2.5-72b-instruct": ModelConfig(
+             name="Qwen 2.5 72B Instruct",
+             category="text-generation",
+             api_endpoint="https://api-inference.huggingface.co/models/Qwen/Qwen2.5-72B-Instruct",
+             max_tokens=8192,
+             temperature=0.7,
+         ),
+         "qwen/qwen-2.5-32b-instruct": ModelConfig(
+             name="Qwen 2.5 32B Instruct",
+             category="text-generation",
+             api_endpoint="https://api-inference.huggingface.co/models/Qwen/Qwen2.5-32B-Instruct",
+             max_tokens=8192,
+         ),
+         "qwen/qwen-2.5-14b-instruct": ModelConfig(
+             name="Qwen 2.5 14B Instruct",
+             category="text-generation",
+             api_endpoint="https://api-inference.huggingface.co/models/Qwen/Qwen2.5-14B-Instruct",
+             max_tokens=8192,
+         ),
+         "qwen/qwen-2.5-7b-instruct": ModelConfig(
+             name="Qwen 2.5 7B Instruct",
+             category="text-generation",
+             api_endpoint="https://api-inference.huggingface.co/models/Qwen/Qwen2.5-7B-Instruct",
+         ),
+         "qwen/qwen-2.5-3b-instruct": ModelConfig(
+             name="Qwen 2.5 3B Instruct",
+             category="text-generation",
+             api_endpoint="https://api-inference.huggingface.co/models/Qwen/Qwen2.5-3B-Instruct",
+         ),
+         "qwen/qwen-2.5-1.5b-instruct": ModelConfig(
+             name="Qwen 2.5 1.5B Instruct",
+             category="text-generation",
+             api_endpoint="https://api-inference.huggingface.co/models/Qwen/Qwen2.5-1.5B-Instruct",
+         ),
+         "qwen/qwen-2.5-0.5b-instruct": ModelConfig(
+             name="Qwen 2.5 0.5B Instruct",
+             category="text-generation",
+             api_endpoint="https://api-inference.huggingface.co/models/Qwen/Qwen2.5-0.5B-Instruct",
+         ),
+         # ... (Add all 35 Qwen models)
+         # DeepSeek Models (17 models)
+         "deepseek-ai/deepseek-coder-33b-instruct": ModelConfig(
+             name="DeepSeek Coder 33B Instruct",
+             category="code-generation",
+             api_endpoint="https://api-inference.huggingface.co/models/deepseek-ai/deepseek-coder-33b-instruct",
+             max_tokens=8192,
+             special_parameters={"code_focused": True},
+         ),
+         "deepseek-ai/deepseek-coder-6.7b-instruct": ModelConfig(
+             name="DeepSeek Coder 6.7B Instruct",
+             category="code-generation",
+             api_endpoint="https://api-inference.huggingface.co/models/deepseek-ai/deepseek-coder-6.7b-instruct",
+         ),
+         # ... (Add all 17 DeepSeek models)
+     }
+ 
+     # Image Editing Models (10 models)
+     IMAGE_EDITING_MODELS = {
+         "stabilityai/stable-diffusion-xl-refiner-1.0": ModelConfig(
+             name="SDXL Refiner 1.0",
+             category="image-editing",
+             api_endpoint="https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-xl-refiner-1.0",
+             supported_formats=["image/png", "image/jpeg"],
+         ),
+         "runwayml/stable-diffusion-inpainting": ModelConfig(
+             name="Stable Diffusion Inpainting",
+             category="image-inpainting",
+             api_endpoint="https://api-inference.huggingface.co/models/runwayml/stable-diffusion-inpainting",
+             supported_formats=["image/png", "image/jpeg"],
+         ),
+         # ... (Add all 10 image editing models)
+     }
+ 
+     # TTS/STT Models (15 models)
+     SPEECH_MODELS = {
+         "microsoft/speecht5_tts": ModelConfig(
+             name="SpeechT5 TTS",
+             category="text-to-speech",
+             api_endpoint="https://api-inference.huggingface.co/models/microsoft/speecht5_tts",
+             supported_formats=["audio/wav", "audio/mp3"],
+         ),
+         "openai/whisper-large-v3": ModelConfig(
+             name="Whisper Large v3",
+             category="automatic-speech-recognition",
+             api_endpoint="https://api-inference.huggingface.co/models/openai/whisper-large-v3",
+             supported_formats=["audio/wav", "audio/mp3", "audio/flac"],
+         ),
+         # ... (Add all 15 speech models)
+     }
+ 
+     # Face Swap Models (6 models)
+     FACE_SWAP_MODELS = {
+         "deepinsight/insightface": ModelConfig(
+             name="InsightFace",
+             category="face-swap",
+             api_endpoint="https://api-inference.huggingface.co/models/deepinsight/insightface",
+             supported_formats=["image/png", "image/jpeg"],
+         ),
+         # ... (Add all 6 face swap models)
+     }
+ 
+     # Talking Avatar Models (9 models)
+     AVATAR_MODELS = {
+         "microsoft/DiT-XL-2-512": ModelConfig(
+             name="DiT Avatar Generator",
+             category="talking-avatar",
+             api_endpoint="https://api-inference.huggingface.co/models/microsoft/DiT-XL-2-512",
+             supported_formats=["video/mp4", "image/png"],
+         ),
+         # ... (Add all 9 avatar models)
+     }
+ 
+     # Arabic-English Interactive Models (12 models)
+     ARABIC_ENGLISH_MODELS = {
+         "aubmindlab/bert-base-arabertv02": ModelConfig(
+             name="AraBERT v02",
+             category="arabic-text",
+             api_endpoint="https://api-inference.huggingface.co/models/aubmindlab/bert-base-arabertv02",
+             special_parameters={"language": "ar-en"},
+         ),
+         "UBC-NLP/MARBERT": ModelConfig(
+             name="MARBERT",
+             category="arabic-text",
+             api_endpoint="https://api-inference.huggingface.co/models/UBC-NLP/MARBERT",
+             special_parameters={"language": "ar-en"},
+         ),
+         # ... (Add all 12 Arabic-English models)
+     }
+ 
+ 
+ class AgentConfig:
+     """Configuration for AI Agents"""
+ 
+     # Manus Agent Configuration
+     MANUS_AGENT = {
+         "name": "Manus",
+         "description": "Versatile AI agent with 200+ models",
+         "max_steps": 20,
+         "max_observe": 10000,
+         "system_prompt_template": """You are Manus, an advanced AI agent with access to 200+ specialized models.
+ 
+ Available categories:
+ - Text Generation (Qwen, DeepSeek, etc.)
+ - Image Editing & Generation
+ - Speech (TTS/STT)
+ - Face Swap & Avatar Generation
+ - Arabic-English Interactive Models
+ - Code Generation & Review
+ - Multimodal AI
+ - Document Processing
+ - 3D Generation
+ - Video Processing
+ 
+ User workspace: {directory}""",
+         "tools": [
+             "PythonExecute",
+             "BrowserUseTool",
+             "StrReplaceEditor",
+             "AskHuman",
+             "Terminate",
+             "HuggingFaceModels",
+         ],
+         "model_preferences": {
+             "text": "qwen/qwen-2.5-72b-instruct",
+             "code": "deepseek-ai/deepseek-coder-33b-instruct",
+             "image": "stabilityai/stable-diffusion-xl-refiner-1.0",
+             "speech": "microsoft/speecht5_tts",
+             "arabic": "aubmindlab/bert-base-arabertv02",
+         },
+     }
+ 
+ 
+ class ServiceConfig:
+     """Configuration for all services"""
+ 
+     # Cloudflare Services
+     CLOUDFLARE_CONFIG = {
+         "d1_database": {
+             "enabled": True,
+             "tables": ["users", "sessions", "agent_interactions", "model_usage"],
+             "auto_migrate": True,
+         },
+         "r2_storage": {
+             "enabled": True,
+             "buckets": ["user-files", "generated-content", "model-cache"],
+             "max_file_size": "100MB",
+         },
+         "kv_storage": {
+             "enabled": True,
+             "namespaces": ["sessions", "model-cache", "user-preferences"],
+             "ttl": 86400,  # 24 hours
+         },
+         "durable_objects": {
+             "enabled": True,
+             "classes": ["ChatSession", "ModelRouter", "UserContext"],
+         },
+     }
+ 
+     # Authentication Configuration
+     AUTH_CONFIG = {
+         "method": "mobile_password",
+         "password_min_length": 8,
+         "session_duration": 86400,  # 24 hours
+         "max_concurrent_sessions": 5,
+         "mobile_validation": {
+             "international": True,
+             "formats": ["+1234567890", "01234567890"],
+         },
+     }
+ 
+     # Model Usage Configuration
+     MODEL_CONFIG = {
+         "rate_limits": {
+             "free_tier": 100,  # requests per day
+             "premium_tier": 1000,
+             "enterprise_tier": 10000,
+         },
+         "fallback_models": {
+             "text": ["qwen/qwen-2.5-7b-instruct", "qwen/qwen-2.5-3b-instruct"],
+             "image": ["runwayml/stable-diffusion-v1-5"],
+             "code": ["deepseek-ai/deepseek-coder-6.7b-instruct"],
+         },
+         "cache_settings": {"enabled": True, "ttl": 3600, "max_size": "1GB"},  # ttl: 1 hour
+     }
+ 
+ 
+ class EnvironmentConfig:
+     """Environment-specific configurations"""
+ 
+     @staticmethod
+     def get_production_config():
+         """Get production environment configuration"""
+         return {
+             "environment": "production",
+             "debug": False,
+             "log_level": "INFO",
+             "server": {"host": "0.0.0.0", "port": 7860, "workers": 4},
+             "database": {"type": "sqlite", "url": "auth.db", "pool_size": 10},
+             "security": {
+                 "secret_key": os.getenv("SECRET_KEY", "your-secret-key"),
+                 "cors_origins": ["*"],
+                 "rate_limiting": True,
+             },
+             "monitoring": {"metrics": True, "logging": True, "health_checks": True},
+         }
+ 
+     @staticmethod
+     def get_development_config():
+         """Get development environment configuration"""
+         return {
+             "environment": "development",
+             "debug": True,
+             "log_level": "DEBUG",
+             "server": {"host": "127.0.0.1", "port": 7860, "workers": 1},
+             "database": {"type": "sqlite", "url": "auth_dev.db", "pool_size": 2},
+             "security": {
+                 "secret_key": "dev-secret-key",
+                 "cors_origins": ["http://localhost:*"],
+                 "rate_limiting": False,
+             },
+         }
+ 
+ 
+ # Global configuration instance
+ class OpenManusConfig:
+     """Main configuration class for OpenManus"""
+ 
+     def __init__(self, environment: str = "production"):
+         self.environment = environment
+         self.categories = CategoryConfig()
+         self.agent = AgentConfig()
+         self.services = ServiceConfig()
+ 
+         if environment == "production":
+             self.env_config = EnvironmentConfig.get_production_config()
+         else:
+             self.env_config = EnvironmentConfig.get_development_config()
+ 
+     def get_model_config(self, model_id: str) -> Optional[ModelConfig]:
+         """Get configuration for a specific model"""
+         all_models = {
+             **self.categories.TEXT_GENERATION_MODELS,
+             **self.categories.IMAGE_EDITING_MODELS,
+             **self.categories.SPEECH_MODELS,
+             **self.categories.FACE_SWAP_MODELS,
+             **self.categories.AVATAR_MODELS,
+             **self.categories.ARABIC_ENGLISH_MODELS,
+         }
+         return all_models.get(model_id)
+ 
+     def get_category_models(self, category: str) -> Dict[str, ModelConfig]:
+         """Get all models in a category"""
+         if category == "text-generation":
+             return self.categories.TEXT_GENERATION_MODELS
+         elif category == "image-editing":
+             return self.categories.IMAGE_EDITING_MODELS
+         elif category in ["text-to-speech", "automatic-speech-recognition"]:
+             return self.categories.SPEECH_MODELS
+         elif category == "face-swap":
+             return self.categories.FACE_SWAP_MODELS
+         elif category == "talking-avatar":
+             return self.categories.AVATAR_MODELS
+         elif category == "arabic-text":
+             return self.categories.ARABIC_ENGLISH_MODELS
+         else:
+             return {}
+ 
+     def validate_config(self) -> bool:
+         """Validate the configuration"""
+         try:
+             # Check required environment variables
+             required_env = (
+                 ["CLOUDFLARE_API_TOKEN", "HF_TOKEN"]
+                 if self.environment == "production"
+                 else []
+             )
+             missing_env = [var for var in required_env if not os.getenv(var)]
+ 
+             if missing_env:
+                 print(f"Missing required environment variables: {missing_env}")
+                 return False
+ 
+             print(f"Configuration validated for {self.environment} environment")
+             return True
+ 
+         except Exception as e:
+             print(f"Configuration validation failed: {e}")
+             return False
+ 
+ 
+ # Create global config instance
+ config = OpenManusConfig(environment=os.getenv("ENVIRONMENT", "production"))
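The lookup that `get_model_config` performs (merge every category dict, then `.get`) combined with the `fallback_models` chain from `MODEL_CONFIG` can be sketched in isolation. This is a toy reconstruction with a cut-down `ModelConfig` and only two registry entries; the `resolve` helper is hypothetical and not part of the config module:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:
    name: str
    category: str
    max_tokens: int = 4096

# Toy stand-ins for the per-category registries above.
TEXT_MODELS = {
    "qwen/qwen-2.5-72b-instruct": ModelConfig("Qwen 2.5 72B Instruct", "text-generation", 8192),
}
CODE_MODELS = {
    "deepseek-ai/deepseek-coder-6.7b-instruct": ModelConfig("DeepSeek Coder 6.7B Instruct", "code-generation"),
}
# Mirrors MODEL_CONFIG["fallback_models"], shortened for the example.
FALLBACKS = {"text": ["qwen/qwen-2.5-7b-instruct", "qwen/qwen-2.5-72b-instruct"]}


def get_model_config(model_id: str) -> Optional[ModelConfig]:
    # Merge category dicts; later dicts win on key clashes, as in the class above.
    all_models = {**TEXT_MODELS, **CODE_MODELS}
    return all_models.get(model_id)


def resolve(model_id: str, kind: str) -> Optional[ModelConfig]:
    """Try the requested model first, then walk the fallback chain (hypothetical helper)."""
    for candidate in [model_id, *FALLBACKS.get(kind, [])]:
        cfg = get_model_config(candidate)
        if cfg is not None:
            return cfg
    return None


cfg = resolve("qwen/qwen-2.5-7b-instruct", "text")
print(cfg.name)  # the 7B entry is absent here, so resolution lands on the 72B fallback
```

In the real module the merge spans six category dicts, but the mechanics are the same: a flat `dict.get` after a `**`-merge, so duplicate model IDs across categories would silently shadow each other.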
app/prompt/__init__.py ADDED
File without changes
app/prompt/browser.py ADDED
@@ -0,0 +1,94 @@
+ SYSTEM_PROMPT = """\
+ You are an AI agent designed to automate browser tasks. Your goal is to accomplish the ultimate task following the rules.
+ 
+ # Input Format
+ Task
+ Previous steps
+ Current URL
+ Open Tabs
+ Interactive Elements
+ [index]<type>text</type>
+ - index: Numeric identifier for interaction
+ - type: HTML element type (button, input, etc.)
+ - text: Element description
+ Example:
+ [33]<button>Submit Form</button>
+ 
+ - Only elements with numeric indexes in [] are interactive
+ - Elements without [] provide only context
+ 
+ # Response Rules
+ 1. RESPONSE FORMAT: You must ALWAYS respond with valid JSON in this exact format:
+ {{"current_state": {{"evaluation_previous_goal": "Success|Failed|Unknown - Analyze the current elements and the image to check if the previous goals/actions are successful like intended by the task. Mention if something unexpected happened. Shortly state why/why not",
+ "memory": "Description of what has been done and what you need to remember. Be very specific. Count here ALWAYS how many times you have done something and how many remain. E.g. 0 out of 10 websites analyzed. Continue with abc and xyz",
+ "next_goal": "What needs to be done with the next immediate action"}},
+ "action":[{{"one_action_name": {{// action-specific parameter}}}}, // ... more actions in sequence]}}
+ 
+ 2. ACTIONS: You can specify multiple actions in the list to be executed in sequence. But always specify only one action name per item. Use maximum {{max_actions}} actions per sequence.
+ Common action sequences:
+ - Form filling: [{{"input_text": {{"index": 1, "text": "username"}}}}, {{"input_text": {{"index": 2, "text": "password"}}}}, {{"click_element": {{"index": 3}}}}]
+ - Navigation and extraction: [{{"go_to_url": {{"url": "https://example.com"}}}}, {{"extract_content": {{"goal": "extract the names"}}}}]
+ - Actions are executed in the given order
+ - If the page changes after an action, the sequence is interrupted and you get the new state.
+ - Only provide the action sequence until an action which changes the page state significantly.
+ - Try to be efficient, e.g. fill forms at once, or chain actions where nothing changes on the page
+ - Only use multiple actions if it makes sense.
+ 
+ 3. ELEMENT INTERACTION:
+ - Only use indexes of the interactive elements
+ - Elements marked with "[]Non-interactive text" are non-interactive
+ 
+ 4. NAVIGATION & ERROR HANDLING:
+ - If no suitable elements exist, use other functions to complete the task
+ - If stuck, try alternative approaches - like going back to a previous page, new search, new tab etc.
+ - Handle popups/cookies by accepting or closing them
+ - Use scroll to find elements you are looking for
+ - If you want to research something, open a new tab instead of using the current tab
+ - If a captcha pops up, try to solve it - else try a different approach
+ - If the page is not fully loaded, use the wait action
+ 
+ 5. TASK COMPLETION:
+ - Use the done action as the last action as soon as the ultimate task is complete
+ - Don't use "done" before you have completed everything the user asked for, unless you reach the last step of max_steps.
+ - If you reach your last step, use the done action even if the task is not fully finished. Provide all the information you have gathered so far. If the ultimate task is completely finished, set success to true. If not everything the user asked for is completed, set success in done to false!
+ - If you have to do something repeatedly, for example the task says "for each", "for all", or "x times", always count inside "memory" how many times you have done it and how many remain. Don't stop until you have completed everything the task asked of you. Only call done after the last step.
+ - Don't hallucinate actions
+ - Make sure you include everything you found out for the ultimate task in the done text parameter. Do not just say you are done, but include the requested information of the task.
+ 
+ 6. VISUAL CONTEXT:
+ - When an image is provided, use it to understand the page layout
+ - Bounding boxes with labels on their top right corner correspond to element indexes
+ 
+ 7. Form filling:
+ - If you fill an input field and your action sequence is interrupted, most often something changed, e.g. suggestions popped up under the field.
+ 
+ 8. Long tasks:
+ - Keep track of the status and subresults in the memory.
+ 
+ 9. Extraction:
+ - If your task is to find information - call extract_content on the specific pages to get and store the information.
+ Your responses must always be JSON in the specified format.
+ """
+ 
+ NEXT_STEP_PROMPT = """
+ What should I do next to achieve my goal?
+ 
+ When you see [Current state starts here], focus on the following:
+ - Current URL and page title{url_placeholder}
+ - Available tabs{tabs_placeholder}
+ - Interactive elements and their indices
+ - Content above{content_above_placeholder} or below{content_below_placeholder} the viewport (if indicated)
+ - Any action results or errors{results_placeholder}
+ 
+ For browser interactions:
+ - To navigate: browser_use with action="go_to_url", url="..."
+ - To click: browser_use with action="click_element", index=N
+ - To type: browser_use with action="input_text", index=N, text="..."
+ - To extract: browser_use with action="extract_content", goal="..."
+ - To scroll: browser_use with action="scroll_down" or "scroll_up"
+ 
+ Consider both what's visible and what might be beyond the current viewport.
+ Be methodical - remember your progress and what you've learned so far.
+ 
+ If you want to stop the interaction at any point, use the `terminate` tool/function call.
+ """
app/prompt/manus.py ADDED
@@ -0,0 +1,10 @@
+ SYSTEM_PROMPT = (
+     "You are OpenManus, an all-capable AI assistant, aimed at solving any task presented by the user. You have various tools at your disposal that you can call upon to efficiently complete complex requests. Whether it's programming, information retrieval, file processing, web browsing, or human interaction (only for extreme cases), you can handle it all. "
+     "The initial directory is: {directory}"
+ )
+ 
+ NEXT_STEP_PROMPT = """
+ Based on user needs, proactively select the most appropriate tool or combination of tools. For complex tasks, you can break down the problem and use different tools step by step to solve it. After using each tool, clearly explain the execution results and suggest the next steps.
+ 
+ If you want to stop the interaction at any point, use the `terminate` tool/function call.
+ """
app/prompt/mcp.py ADDED
@@ -0,0 +1,43 @@
+ """Prompts for the MCP Agent."""
+ 
+ SYSTEM_PROMPT = """You are an AI assistant with access to a Model Context Protocol (MCP) server.
+ You can use the tools provided by the MCP server to complete tasks.
+ The MCP server will dynamically expose tools that you can use - always check the available tools first.
+ 
+ When using an MCP tool:
+ 1. Choose the appropriate tool based on your task requirements
+ 2. Provide properly formatted arguments as required by the tool
+ 3. Observe the results and use them to determine next steps
+ 4. Tools may change during operation - new tools might appear or existing ones might disappear
+ 
+ Follow these guidelines:
+ - Call tools with valid parameters as documented in their schemas
+ - Handle errors gracefully by understanding what went wrong and trying again with corrected parameters
+ - For multimedia responses (like images), you'll receive a description of the content
+ - Complete user requests step by step, using the most appropriate tools
+ - If multiple tools need to be called in sequence, make one call at a time and wait for results
+ 
+ Remember to clearly explain your reasoning and actions to the user.
+ """
+ 
+ NEXT_STEP_PROMPT = """Based on the current state and available tools, what should be done next?
+ Think step by step about the problem and identify which MCP tool would be most helpful for the current stage.
+ If you've already made progress, consider what additional information you need or what actions would move you closer to completing the task.
+ """
+ 
+ # Additional specialized prompts
+ TOOL_ERROR_PROMPT = """You encountered an error with the tool '{tool_name}'.
+ Try to understand what went wrong and correct your approach.
+ Common issues include:
+ - Missing or incorrect parameters
+ - Invalid parameter formats
+ - Using a tool that's no longer available
+ - Attempting an operation that's not supported
+ 
+ Please check the tool specifications and try again with corrected parameters.
+ """
+ 
+ MULTIMEDIA_RESPONSE_PROMPT = """You've received a multimedia response (image, audio, etc.) from the tool '{tool_name}'.
+ This content has been processed and described for you.
+ Use this information to continue the task or provide insights to the user.
+ """
app/prompt/planning.py ADDED
@@ -0,0 +1,27 @@
+ PLANNING_SYSTEM_PROMPT = """
+ You are an expert Planning Agent tasked with solving problems efficiently through structured plans.
+ Your job is:
+ 1. Analyze requests to understand the task scope
+ 2. Create a clear, actionable plan that makes meaningful progress with the `planning` tool
+ 3. Execute steps using available tools as needed
+ 4. Track progress and adapt plans when necessary
+ 5. Use `finish` to conclude immediately when the task is complete
+ 
+ 
+ Available tools will vary by task but may include:
+ - `planning`: Create, update, and track plans (commands: create, update, mark_step, etc.)
+ - `finish`: End the task when complete
+ Break tasks into logical steps with clear outcomes. Avoid excessive detail or sub-steps.
+ Think about dependencies and verification methods.
+ Know when to conclude - don't continue thinking once objectives are met.
+ """
+ 
+ NEXT_STEP_PROMPT = """
+ Based on the current state, what's your next action?
+ Choose the most efficient path forward:
+ 1. Is the plan sufficient, or does it need refinement?
+ 2. Can you execute the next step immediately?
+ 3. Is the task complete? If so, use `finish` right away.
+ 
+ Be concise in your reasoning, then select the appropriate tool or action.
+ """
app/prompt/swe.py ADDED
@@ -0,0 +1,22 @@
+ SYSTEM_PROMPT = """SETTING: You are an autonomous programmer, and you're working directly in the command line with a special interface.
+ 
+ The special interface consists of a file editor that shows you {{WINDOW}} lines of a file at a time.
+ In addition to typical bash commands, you can also use specific commands to help you navigate and edit files.
+ To call a command, you need to invoke it with a function call/tool call.
+ 
+ Please note that THE EDIT COMMAND REQUIRES PROPER INDENTATION.
+ If you'd like to add the line '        print(x)' you must fully write that out, with all those spaces before the code! Indentation is important and code that is not indented correctly will fail and require fixing before it can be run.
+ 
+ RESPONSE FORMAT:
+ Your shell prompt is formatted as follows:
+ (Open file: <path>)
+ (Current directory: <cwd>)
+ bash-$
+ 
+ First, you should _always_ include a general thought about what you're going to do next.
+ Then, for every response, you must include exactly _ONE_ tool call/function call.
+ 
+ Remember, you should always include a _SINGLE_ tool call/function call and then wait for a response from the shell before continuing with more discussion and commands. Everything you include in the DISCUSSION section will be saved for future reference.
+ If you'd like to issue two commands at once, PLEASE DO NOT DO THAT! Please instead first submit just the first tool call, and then after receiving a response you'll be able to issue the second tool call.
+ Note that the environment does NOT support interactive session commands (e.g. python, vim), so please do not invoke them.
+ """
app/prompt/toolcall.py ADDED
@@ -0,0 +1,5 @@
+ SYSTEM_PROMPT = "You are an agent that can execute tool calls"
+ 
+ NEXT_STEP_PROMPT = (
+     "If you want to stop interaction, use `terminate` tool/function call."
+ )
app/prompt/visualization.py ADDED
@@ -0,0 +1,10 @@
+ SYSTEM_PROMPT = """You are an AI agent designed for data analysis / visualization tasks. You have various tools at your disposal that you can call upon to efficiently complete complex requests.
+ # Note:
+ 1. The workspace directory is: {directory}; read / write files in the workspace
+ 2. Generate an analysis conclusion report at the end"""
+ 
+ NEXT_STEP_PROMPT = """Based on user needs, break down the problem and use different tools step by step to solve it.
+ # Note
+ 1. At each step, proactively select the most appropriate tool (ONLY ONE).
+ 2. After using each tool, clearly explain the execution results and suggest the next steps.
+ 3. When an observation contains an error, review and fix it."""