ishandutta committed on
Commit ba9f34e · verified · 1 Parent(s): 4335697

Create app.py

Files changed (1)
  1. app.py +822 -0
app.py ADDED
@@ -0,0 +1,822 @@
"""
Smart Product Cataloger - Gradio App
Multimodal AI for E-commerce Product Analysis

Google Colab - https://colab.research.google.com/drive/1eFNaidx5TPEhXgzdY9hh7EhDcVZm4GMS?usp=sharing

This app analyzes product images and generates metadata for e-commerce
listings using CLIP and BLIP models.

OVERVIEW:
---------
This application combines zero-shot classification with visual question answering
to analyze product images for e-commerce. It uses:
- CLIP for zero-shot product category classification
- BLIP for image captioning and product description generation
- BLIP VQA for answering specific questions about product attributes

FEATURES:
---------
1. AI-powered product category classification using CLIP
2. Automatic product description generation using BLIP
3. Category-specific attribute extraction via visual Q&A
4. Upload and analyze your own product images
5. Professional e-commerce metadata generation

MODELS USED:
------------
- openai/clip-vit-base-patch32: Zero-shot image classification
- Salesforce/blip-image-captioning-base: Image captioning
- Salesforce/blip-vqa-base: Visual question answering

REQUIREMENTS:
-------------
- torch
- gradio
- transformers
- PIL (Pillow)
- requests

HOW TO RUN:
-----------
1. Install dependencies:
   pip install torch gradio transformers pillow requests

2. Run the application:
   python app.py

3. Open your browser and navigate to:
   http://localhost:7860

4. Follow the app instructions:
   - Click "Load Models" first (required)
   - Upload product images or use sample URLs
   - Get automatic category classification and metadata

USAGE EXAMPLES:
---------------
Product Categories Supported:
- "clothing" - shirts, dresses, pants, etc.
- "shoes" - sneakers, boots, dress shoes, etc.
- "electronics" - phones, laptops, gadgets, etc.
- "furniture" - chairs, tables, sofas, etc.
- "books" - novels, textbooks, magazines, etc.
- "toys" - games, dolls, educational toys, etc.

The app will automatically classify products and generate relevant
e-commerce metadata including descriptions and category-specific attributes.
"""

import warnings
from typing import List

import gradio as gr
import requests
import torch
from PIL import Image
from transformers import (
    BlipForConditionalGeneration,
    BlipForQuestionAnswering,
    BlipProcessor,
    CLIPModel,
    CLIPProcessor,
)

# Suppress warnings for cleaner output
warnings.filterwarnings("ignore")


class SmartProductCataloger:
    """
    Main class for analyzing product images and generating e-commerce metadata.

    This class integrates CLIP for classification and BLIP for captioning/VQA
    to create a complete product analysis pipeline for e-commerce applications.

    Attributes:
        device (str): Computing device ('cuda', 'mps', or 'cpu')
        dtype (torch.dtype): Data type for model optimization
        clip_model: CLIP model for zero-shot classification
        clip_processor: CLIP processor for input preprocessing
        blip_caption_model: BLIP model for image captioning
        blip_caption_processor: BLIP processor for captioning
        blip_vqa_model: BLIP model for visual question answering
        blip_vqa_processor: BLIP processor for VQA
        models_loaded (bool): Flag to track if models are loaded
    """

    def __init__(self):
        """Initialize the SmartProductCataloger with device setup and model placeholders."""
        # Automatically detect the best available device for AI computation
        self.device, self.dtype = self.setup_device()

        # Initialize model placeholders - models loaded separately for better UX
        self.clip_model = None  # CLIP classification model
        self.clip_processor = None  # CLIP input processor
        self.blip_caption_model = None  # BLIP captioning model
        self.blip_caption_processor = None  # BLIP captioning processor
        self.blip_vqa_model = None  # BLIP VQA model
        self.blip_vqa_processor = None  # BLIP VQA processor
        self.models_loaded = False  # Track model loading status

    def setup_device(self):
        """
        Set up the optimal computing device and data type for AI models.

        Priority order: CUDA GPU > Apple Silicon MPS > CPU
        Uses float16 for CUDA (memory efficiency) and float32 for others (stability).

        Returns:
            tuple: (device_name, torch_dtype) for model optimization
        """
        if torch.cuda.is_available():
            # NVIDIA GPU available - use CUDA with float16 for memory efficiency
            return "cuda", torch.float16
        elif torch.backends.mps.is_available():
            # Apple Silicon Mac - use Metal Performance Shaders with float32
            return "mps", torch.float32
        else:
            # Fallback to CPU with float32 for compatibility
            return "cpu", torch.float32

    def load_models(self):
        """
        Load all required AI models for product analysis.

        Downloads and initializes:
        1. CLIP for zero-shot product classification
        2. BLIP for image captioning and product descriptions
        3. BLIP VQA for answering specific product attribute questions

        Returns:
            str: Status message indicating success or failure
        """
        # Check if models are already loaded to avoid redundant loading
        if self.models_loaded:
            return "✅ Models already loaded!"

        try:
            print("📦 Loading models...")

            # Load CLIP model for zero-shot product classification
            # Model: openai/clip-vit-base-patch32 (versatile, well-trained model)
            print("📦 Loading CLIP model...")
            self.clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
            self.clip_processor = CLIPProcessor.from_pretrained(
                "openai/clip-vit-base-patch32"
            )

            # Load BLIP model for image captioning and product descriptions
            # Model: Salesforce/blip-image-captioning-base (specialized for descriptions)
            print("📦 Loading BLIP caption model...")
            self.blip_caption_model = BlipForConditionalGeneration.from_pretrained(
                "Salesforce/blip-image-captioning-base"
            )
            self.blip_caption_processor = BlipProcessor.from_pretrained(
                "Salesforce/blip-image-captioning-base"
            )

            # Load BLIP VQA model for answering specific product questions
            # Model: Salesforce/blip-vqa-base (specialized for visual Q&A)
            print("📦 Loading BLIP VQA model...")
            self.blip_vqa_model = BlipForQuestionAnswering.from_pretrained(
                "Salesforce/blip-vqa-base"
            )
            self.blip_vqa_processor = BlipProcessor.from_pretrained(
                "Salesforce/blip-vqa-base"
            )

            # Set models to evaluation mode for inference (disables dropout, etc.)
            self.blip_caption_model.eval()
            self.blip_vqa_model.eval()

            # Mark models as successfully loaded
            self.models_loaded = True
            return "✅ All models loaded successfully!"

        except Exception as e:
            return f"❌ Error loading models: {str(e)}"

    def load_image_from_url(self, url: str):
        """
        Load an image from a URL with error handling.

        Args:
            url (str): URL of the image to load

        Returns:
            PIL.Image or None: Loaded image in RGB format, or None if failed
        """
        try:
            # Use requests to fetch image data with streaming for efficiency
            response = requests.get(url, stream=True)
            response.raise_for_status()  # Raise exception for bad status codes

            # Create PIL Image from response and ensure RGB format
            image = Image.open(response.raw).convert("RGB")
            return image

        except Exception as e:
            print(f"❌ Error loading image: {e}")
            return None

    def classify_product_image(self, image: Image.Image, candidate_labels: List[str]):
        """
        Classify a product image using CLIP zero-shot classification.

        Args:
            image (PIL.Image): Product image to classify
            candidate_labels (List[str]): List of possible product categories

        Returns:
            List[Dict]: Classification results with labels and confidence scores
        """
        if not self.models_loaded:
            return [{"label": "error", "score": 0.0}]

        try:
            print("🔍 Classifying product category...")

            # Use our already-loaded CLIP model directly instead of a pipeline
            # Process image and text labels through CLIP processor
            inputs = self.clip_processor(
                text=candidate_labels,  # List of category labels
                images=image,  # PIL Image
                return_tensors="pt",  # Return PyTorch tensors
                padding=True,  # Pad text inputs to same length
            )

            # Get predictions from CLIP model
            with torch.no_grad():  # Disable gradients for inference
                outputs = self.clip_model(**inputs)

            # Calculate probabilities using softmax on logits
            logits_per_image = outputs.logits_per_image  # Image-text similarity scores
            probs = torch.softmax(logits_per_image, dim=-1)  # Convert to probabilities

            # Format results to match pipeline output format
            results = []
            for i, label in enumerate(candidate_labels):
                results.append(
                    {
                        "label": label,
                        "score": float(probs[0][i]),  # Convert tensor to float
                    }
                )

            # Sort by confidence score (highest first) to match pipeline behavior
            results.sort(key=lambda x: x["score"], reverse=True)

            return results

        except Exception as e:
            print(f"❌ Classification error: {e}")
            return [{"label": "error", "score": 0.0}]
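    # For reference only (a hedged sketch, not used by this app): the same
    # zero-shot step could be written with the transformers pipeline API,
    # letting the pipeline handle preprocessing and softmax internally:
    #
    #     clf = transformers.pipeline(
    #         "zero-shot-image-classification",
    #         model="openai/clip-vit-base-patch32",
    #     )
    #     clf(image, candidate_labels=candidate_labels)
    #
    # Loading CLIP once in load_models() and calling it directly avoids
    # re-instantiating a pipeline on every request.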

    def generate_product_caption(self, image: Image.Image):
        """
        Generate a descriptive caption for a product image using BLIP.

        Args:
            image (PIL.Image): Product image to describe

        Returns:
            str: Generated product description
        """
        if not self.models_loaded:
            return "❌ Models not loaded."

        try:
            print("📝 Generating product description...")

            # Process image through BLIP captioning processor
            inputs = self.blip_caption_processor(image, return_tensors="pt")

            # Generate caption using BLIP model with beam search for quality
            with torch.no_grad():  # Disable gradients for inference efficiency
                out = self.blip_caption_model.generate(
                    **inputs,
                    max_length=50,  # Maximum description length
                    num_beams=5,  # Beam search for better quality
                    early_stopping=True,  # Stop when end token is generated
                )

            # Decode generated tokens back to readable text
            caption = self.blip_caption_processor.decode(
                out[0], skip_special_tokens=True
            )
            return caption

        except Exception as e:
            return f"❌ Error generating caption: {str(e)}"

    def ask_about_product(self, image: Image.Image, question: str):
        """
        Answer specific questions about a product using BLIP Visual Question Answering.

        Args:
            image (PIL.Image): Product image to analyze
            question (str): Question to ask about the product

        Returns:
            str: Answer to the question or error message
        """
        if not self.models_loaded:
            return "❌ Models not loaded."

        try:
            # Process both image and question together through BLIP VQA processor
            inputs = self.blip_vqa_processor(image, question, return_tensors="pt")

            # Generate answer using BLIP VQA model
            with torch.no_grad():  # Disable gradients for inference
                out = self.blip_vqa_model.generate(
                    **inputs,
                    max_length=20,  # Answers are typically short
                    num_beams=5,  # Beam search for better quality
                    early_stopping=True,  # Stop when end token is generated
                )

            # Decode generated tokens to get the final answer
            answer = self.blip_vqa_processor.decode(out[0], skip_special_tokens=True)
            return answer.strip()  # Remove extra whitespace

        except Exception as e:
            return f"❌ Error: {str(e)}"

    def get_category_questions(self, category: str):
        """
        Get relevant questions for specific product categories.

        Each category has tailored questions to extract the most useful
        e-commerce metadata and product attributes.

        Args:
            category (str): Product category name

        Returns:
            List[str]: List of relevant questions for the category
        """
        # Comprehensive mapping of categories to relevant e-commerce questions
        question_map = {
            "shoes": [
                "What color are these shoes?",
                "What type of shoes are these?",
                "What brand are these shoes?",
                "What material are these shoes made of?",
                "Are these sneakers?",
            ],
            "clothing": [
                "What color is this clothing?",
                "What type of clothing is this?",
                "What material is this clothing made of?",
                "What size is this clothing?",
                "Is this formal or casual wear?",
            ],
            "electronics": [
                "What type of device is this?",
                "What brand is this device?",
                "What color is this device?",
                "Is this a smartphone or tablet?",
                "Does this have a screen?",
            ],
            "furniture": [
                "What type of furniture is this?",
                "What color is this furniture?",
                "What material is this furniture made of?",
                "Is this indoor or outdoor furniture?",
                "How many people can use this?",
            ],
            "books": [
                "What type of book is this?",
                "What color is the book cover?",
                "Is this a hardcover or paperback?",
                "Does this book have text on the cover?",
                "Is this a fiction or non-fiction book?",
            ],
            "toys": [
                "What type of toy is this?",
                "What color is this toy?",
                "Is this toy for children or adults?",
                "What material is this toy made of?",
                "Is this an educational toy?",
            ],
        }

        # Return category-specific questions or default generic questions
        return question_map.get(
            category,
            [
                "What color is this?",
                "What type of item is this?",
                "What is this made of?",
            ],
        )

    def analyze_product_complete(self, image: Image.Image):
        """
        Complete end-to-end product analysis pipeline.

        This method combines all analysis steps:
        1. Classify product category using CLIP
        2. Generate product description using BLIP
        3. Ask category-specific questions using BLIP VQA
        4. Compile results into structured e-commerce metadata

        Args:
            image (PIL.Image): Product image to analyze

        Returns:
            Dict: Complete analysis results with category, description, and attributes
        """
        if not self.models_loaded:
            return {"error": "Models not loaded", "status": "failed"}

        if image is None:
            return {"error": "No image provided", "status": "failed"}

        try:
            print("🚀 Starting complete product analysis...")

            # Step 1: Classify product category using CLIP zero-shot classification
            product_categories = [
                "clothing",
                "shoes",
                "electronics",
                "furniture",
                "books",
                "toys",
            ]
            classification_results = self.classify_product_image(
                image, product_categories
            )

            if classification_results[0]["label"] == "error":
                return {"error": "Classification failed", "status": "failed"}

            top_category = classification_results[0]  # Highest confidence category

            # Step 2: Generate product description using BLIP captioning
            description = self.generate_product_caption(image)

            # Step 3: Get category-specific questions and ask them using VQA
            category = top_category["label"]
            questions = self.get_category_questions(category)

            # Ask each question and collect answers for product attributes
            qa_results = {}
            for question in questions:
                answer = self.ask_about_product(image, question)
                qa_results[question] = answer

            # Step 4: Compile everything into structured e-commerce metadata
            result = {
                "category": {"name": category, "confidence": top_category["score"]},
                "description": description,
                "attributes": qa_results,
                "all_categories": classification_results,  # Include all classification results
                "status": "success",
            }

            return result

        except Exception as e:
            return {"error": str(e), "status": "failed"}
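    # Shape of the dictionary returned by analyze_product_complete on success
    # (the values shown here are purely illustrative, not real output):
    #
    #     {
    #         "category": {"name": "shoes", "confidence": 0.97},
    #         "description": "a pair of red running shoes on a white background",
    #         "attributes": {"What color are these shoes?": "red", ...},
    #         "all_categories": [{"label": "shoes", "score": 0.97}, ...],
    #         "status": "success",
    #     }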


# Initialize the main product cataloger instance
# This creates a single instance used throughout the app
product_cataloger = SmartProductCataloger()
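
# Minimal programmatic usage sketch (illustrative only; assumes a local image
# file such as "product.jpg" exists). The Gradio UI below is the real entry point:
#
#     product_cataloger.load_models()
#     img = Image.open("product.jpg").convert("RGB")
#     print(product_cataloger.analyze_product_complete(img))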

# Define Gradio interface wrapper functions
# These functions adapt the class methods for use with Gradio components


def load_models_interface():
    """
    Gradio interface wrapper for loading AI models.

    Returns:
        str: Status message from model loading process
    """
    return product_cataloger.load_models()


def analyze_upload_interface(image):
    """
    Gradio interface wrapper for analyzing directly uploaded product images.

    Args:
        image (PIL.Image or None): Image uploaded through Gradio interface

    Returns:
        tuple: (image, analysis_text, category_text, attributes_text) for Gradio outputs
    """
    # Validate image input from Gradio component
    if image is None:
        error_msg = "❌ Please upload a product image."
        return None, error_msg, error_msg, error_msg

    # Run complete analysis pipeline on the uploaded image
    result = product_cataloger.analyze_product_complete(image)

    if result.get("status") == "failed":
        error_msg = f"❌ Analysis failed: {result.get('error', 'Unknown error')}"
        return image, error_msg, error_msg, error_msg

    # Format results for display in Gradio interface
    # Main analysis summary
    analysis_text = f"""🔍 PRODUCT ANALYSIS COMPLETE

📝 Description: {result['description']}

🏷️ Category: {result['category']['name']} (confidence: {result['category']['confidence']:.3f})

✅ Analysis Status: {result['status']}"""

    # Category classification results
    category_text = "🏷️ CATEGORY CLASSIFICATION\n\n"
    for cat in result["all_categories"]:
        category_text += f"• {cat['label']}: {cat['score']:.3f}\n"

    # Product attributes from VQA
    attributes_text = "🔍 PRODUCT ATTRIBUTES\n\n"
    for question, answer in result["attributes"].items():
        attributes_text += f"❓ {question}\n💡 {answer}\n\n"

    # Return the same image for display along with analysis results
    return image, analysis_text, category_text, attributes_text


def analyze_url_interface(url):
    """
    Gradio interface wrapper for analyzing a product image from a URL.

    Args:
        url (str): Image URL from Gradio textbox

    Returns:
        tuple: (image, analysis_text, category_text, attributes_text) for Gradio outputs
    """
    # Validate URL input
    if not url or not url.strip():
        error_msg = "❌ Please provide an image URL."
        return None, error_msg, error_msg, error_msg

    # Load image from URL
    image = product_cataloger.load_image_from_url(url.strip())
    if image is None:
        error_msg = "❌ Failed to load image from URL. Please check the URL."
        return None, error_msg, error_msg, error_msg

    # Run complete analysis pipeline on the loaded image
    result = product_cataloger.analyze_product_complete(image)

    if result.get("status") == "failed":
        error_msg = f"❌ Analysis failed: {result.get('error', 'Unknown error')}"
        return image, error_msg, error_msg, error_msg

    # Format results for display in Gradio interface
    # Main analysis summary
    analysis_text = f"""🔍 PRODUCT ANALYSIS COMPLETE

📝 Description: {result['description']}

🏷️ Category: {result['category']['name']} (confidence: {result['category']['confidence']:.3f})

✅ Analysis Status: {result['status']}"""

    # Category classification results
    category_text = "🏷️ CATEGORY CLASSIFICATION\n\n"
    for cat in result["all_categories"]:
        category_text += f"• {cat['label']}: {cat['score']:.3f}\n"

    # Product attributes from VQA
    attributes_text = "🔍 PRODUCT ATTRIBUTES\n\n"
    for question, answer in result["attributes"].items():
        attributes_text += f"❓ {question}\n💡 {answer}\n\n"

    return image, analysis_text, category_text, attributes_text


# Create Gradio interface using Blocks for custom layout
# gr.Blocks: Allows custom layout with rows, columns, and advanced components
# title: Sets the browser tab title for the web interface
with gr.Blocks(title="Smart Product Cataloger") as app:
    # gr.Markdown: Renders markdown text with formatting, emojis, and styling
    # Supports HTML-like formatting for headers, lists, bold text, etc.
    gr.Markdown(
        """
# 🛍️ Smart Product Cataloger

**Multimodal AI for E-commerce Product Analysis**

This app analyzes product images and generates metadata for e-commerce listings
using CLIP for classification and BLIP for captioning and visual question answering.

## 🚀 How to use:
1. **Load Models** - Click to load the AI models (required first step)
2. **Upload Image** - Upload a product image directly for analysis
3. **URL Analysis** - Analyze products from image URLs
"""
    )

    # Model loading section
    # gr.Row: Creates horizontal layout container for organizing components side by side
    with gr.Row():
        # gr.Column: Creates vertical layout container within the row
        with gr.Column():
            # Markdown for section header with emoji and formatting
            gr.Markdown("### 📦 Step 1: Load Models")

            # gr.Button: Interactive button component
            # variant="primary": Makes button blue/prominent (primary action)
            # size="lg": Large button size for better visibility
            load_btn = gr.Button("🔄 Load Models", variant="primary", size="lg")

            # gr.Textbox: Text input/output component
            # label: Display label above the textbox
            # interactive=False: Makes textbox read-only (output only)
            load_status = gr.Textbox(label="Status", interactive=False)

    # Event handler: Connects button click to function
    # fn: Function to call when button is clicked
    # outputs: Which component(s) receive the function's return value
    load_btn.click(
        fn=load_models_interface,  # Function to execute
        outputs=load_status,  # Component to update with result
    )

    # Markdown horizontal rule for visual separation between sections
    gr.Markdown("---")

    # Direct image upload section
    with gr.Row():
        # Left column for image upload and controls
        # scale=1: Equal width columns (both columns take same space)
        with gr.Column(scale=1):
            gr.Markdown("### 📸 Step 2: Upload Product Image")

            # gr.Image for file upload functionality
            # When no image is provided, shows upload interface
            # label: Text shown above upload area
            # height: Fixed pixel height for consistent layout
            uploaded_image = gr.Image(label="Upload Product Image", height=400)

            # Primary action button for direct image analysis
            # variant="primary": Blue/prominent styling for main action
            upload_analyze_btn = gr.Button(
                "🔍 Analyze Uploaded Image", variant="primary"
            )

        # Right column for displaying the uploaded image
        with gr.Column(scale=1):
            # gr.Image: Component for displaying the uploaded image
            # label: Caption shown above image
            # height: Consistent sizing with upload area
            upload_image_display = gr.Image(label="Uploaded Image", height=400)

    # Upload analysis results section with three columns for different result types
    with gr.Row():
        # Column for main analysis summary
        with gr.Column(scale=1):
            # Multi-line textbox for displaying main analysis results
            # lines=8: Adequate height for analysis summary
            # interactive=False: Read-only output field
            upload_analysis_output = gr.Textbox(
                label="📋 Analysis Summary", lines=8, interactive=False
            )

        # Column for category classification results
        with gr.Column(scale=1):
            # Output textbox for category classification scores
            upload_category_output = gr.Textbox(
                label="🏷️ Category Classification", lines=8, interactive=False
            )

        # Column for product attributes from VQA
        with gr.Column(scale=1):
            # Output textbox for detailed product attributes
            upload_attributes_output = gr.Textbox(
                label="🔍 Product Attributes", lines=8, interactive=False
            )

    # Event handler for upload analyze button
    # inputs: Component whose value is passed to function
    # outputs: Components that receive function return values (order matters)
    upload_analyze_btn.click(
        fn=analyze_upload_interface,  # Function to call
        inputs=uploaded_image,  # Input component
        outputs=[
            upload_image_display,
            upload_analysis_output,
            upload_category_output,
            upload_attributes_output,
        ],  # Output components
    )

    # Visual separator between sections
    gr.Markdown("---")

    # URL analysis section for analyzing products from web URLs
    with gr.Row():
        # Left column for URL input
        with gr.Column(scale=1):
            gr.Markdown("### 🌐 Step 3: Analyze from URL")

            # gr.Textbox for URL input
            # label: Text shown above input field
            # placeholder: Hint text shown when field is empty
            # lines=1: Single line input for URLs
            url_input = gr.Textbox(
                label="Product Image URL",
                placeholder="https://example.com/product-image.jpg",
                lines=1,
            )

            # Secondary action button for URL analysis
            # variant="secondary": Gray/muted styling (less prominent than primary)
            url_analyze_btn = gr.Button("🔗 Analyze from URL", variant="secondary")

        # Right column for URL-loaded image display
        with gr.Column(scale=1):
            # Image component to show the loaded image from URL
            url_image_display = gr.Image(label="Loaded Image", height=400)

    # URL analysis results section with three columns for different result types
    with gr.Row():
        # Three columns for different types of analysis results
        with gr.Column(scale=1):
            # Main analysis results for URL-loaded image
            url_analysis_output = gr.Textbox(
                label="📋 Analysis Summary", lines=8, interactive=False
            )

        with gr.Column(scale=1):
            # Category classification for URL-loaded image
            url_category_output = gr.Textbox(
                label="🏷️ Category Classification", lines=8, interactive=False
            )

        with gr.Column(scale=1):
            # Product attributes for URL-loaded image
            url_attributes_output = gr.Textbox(
                label="🔍 Product Attributes", lines=8, interactive=False
            )

    # Event handler for URL analysis button
    # inputs: URL textbox component
    # outputs: All four components (image + three analysis results)
    url_analyze_btn.click(
        fn=analyze_url_interface,  # Function to execute
        inputs=url_input,  # Input component
        outputs=[
            url_image_display,
            url_analysis_output,
            url_category_output,
            url_attributes_output,
        ],  # Output components
    )

    # Final section with examples and usage tips
    # Triple-quoted string allows multi-line markdown content
    gr.Markdown(
        """
---
### 📝 Example Product Categories:
- **Clothing**: shirts, dresses, pants, jackets
- **Shoes**: sneakers, boots, dress shoes, sandals
- **Electronics**: phones, laptops, headphones, tablets
- **Furniture**: chairs, tables, sofas, desks
- **Books**: novels, textbooks, magazines, comics
- **Toys**: games, dolls, educational toys, puzzles

### 🔗 Sample Product URLs:
- Shoes: https://images.unsplash.com/photo-1542291026-7eec264c27ff
- Electronics: https://images.unsplash.com/photo-1511707171634-5f897ff02aa9
- Clothing: https://images.unsplash.com/photo-1521572163474-6864f9cf17ab
"""
    )

if __name__ == "__main__":
    """
    Launch the Gradio app when script is run directly.

    Configuration:
        server_name="0.0.0.0": Allow access from any IP address
        server_port=7860: Use port 7860 (Gradio default)
        share=True: Create public Gradio link for sharing
        debug=True: Enable debug mode for development
    """
    app.launch(
        server_name="0.0.0.0",  # Listen on all network interfaces
        server_port=7860,  # Standard Gradio port
        share=True,  # Generate shareable public link
        debug=True,  # Enable debug logging
    )