Multimodal Content Enrichment Pipeline for Product Data

A scalable LLM-powered system that automates product content translation and marketplace listing generation for 100k+ SKUs across multiple platforms, achieving 96% auto-mapping accuracy and a 72% reduction in manual localization effort.

Overview & Goal

A fast-growing European e-commerce company needed to publish 100k+ SKUs across multiple marketplaces, each with unique listing templates, attribute requirements, and tone guidelines. Their product data existed in proprietary databases with inconsistent coverage—some SKUs lacked descriptions, others had missing attributes.

The goal: build a Python/Django system that could automatically ingest CSV data, normalize it to a canonical schema, and generate platform-compliant listings in multiple languages with image-aware content enrichment.

Results

The system achieved remarkable efficiency gains:

  • 96% accuracy auto-mapping unfamiliar CSV headers to canonical fields (improving to 98% with learned corrections)
  • 72% reduction in manual localization effort and 38% faster time-to-listing
  • 11–17% lift in product page conversion on A/B test markets
  • Less than 2% rejection rate by marketplaces (down from 12% baseline)

Challenges

Building a multilingual, multi-marketplace content pipeline involved complex technical and business challenges:

  • Unknown CSV Headers: Vendors used idiosyncratic headers like "Comp Width", "W (mm)", "shoe_upper" that needed intelligent mapping
  • Platform Heterogeneity: Each marketplace enforced unique field sets, lengths, enums, and SEO tone requirements
  • LLM Hallucinations: Preventing factual errors and maintaining grounding in source data
  • Multilingual Quality: Maintaining brand tone and technical accuracy across 10+ languages
  • Cost & Latency Control: Managing LLM API costs and processing times for large catalogs

Solution

We engineered a comprehensive Python/Django system with OpenAI integration that transforms messy product data into high-quality, localized marketplace listings through intelligent automation and validation.

Key System Components

1. Intelligent Data Ingestion

CSV uploads pass through dual-engine header mapping that combines embeddings-based similarity matching with symbolic rules for semantic understanding. An active learning loop lets human corrections improve accuracy over time, and Pydantic validation ensures data quality from ingestion onward.
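
As a minimal sketch of the dual-engine mapping (assuming the openai Python SDK and an embedding model; the canonical fields and symbolic rules shown are illustrative, not the production set), the ingestion step might look roughly like this:

```python
import re

import numpy as np
from openai import OpenAI  # assumes the openai>=1.0 Python SDK

client = OpenAI()

# Illustrative subset of the canonical schema fields
CANONICAL_FIELDS = ["width_mm", "upper_material", "color", "heel_height_mm"]

# Symbolic rules catch idiosyncratic vendor headers before the embedding fallback
SYMBOLIC_RULES = [
    (re.compile(r"comp\s*width|w\s*\(?mm\)?", re.I), "width_mm"),
    (re.compile(r"upper", re.I), "upper_material"),
]

def embed(texts):
    """Embed a list of strings with an OpenAI embedding model."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def map_header(header, field_vecs, threshold=0.6):
    """Return (canonical_field, confidence); below-threshold matches go to human review."""
    for pattern, canonical_field in SYMBOLIC_RULES:
        if pattern.search(header):
            return canonical_field, 1.0
    h = embed([header])[0]
    sims = field_vecs @ h / (np.linalg.norm(field_vecs, axis=1) * np.linalg.norm(h))
    best = int(sims.argmax())
    if sims[best] >= threshold:
        return CANONICAL_FIELDS[best], float(sims[best])
    return None, float(sims[best])  # unmapped headers are queued for review

field_vecs = embed(CANONICAL_FIELDS)
print(map_header("Comp Width", field_vecs))  # caught by a symbolic rule
print(map_header("Colour", field_vecs))      # falls back to embedding similarity
```

Human corrections could then be persisted as new symbolic rules or few-shot hints, which is one way to realize the learned-corrections loop behind the 96% → 98% improvement.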

2. Multi-Modal Content Generation

OpenAI vision models extract product attributes from images while text models generate localized content. Structured JSON outputs via function calling ensure reliable data contracts. Image-aware descriptions provide richer, more compelling copy.
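
A hedged sketch of that contract, using OpenAI function calling with a vision-capable chat model (the tool name `record_product_attributes` and the attribute fields are illustrative):

```python
import json

from openai import OpenAI

client = OpenAI()

# Illustrative JSON Schema contract for attributes extracted from a product image
extract_attributes_tool = {
    "type": "function",
    "function": {
        "name": "record_product_attributes",
        "description": "Record product attributes visible in the image.",
        "parameters": {
            "type": "object",
            "properties": {
                "color": {"type": "string"},
                "material": {"type": "string"},
                "pattern": {"type": "string"},
            },
            "required": ["color", "material", "pattern"],
        },
    },
}

def extract_from_image(image_url: str) -> dict:
    """Ask a vision-capable model to return image-derived attributes as structured JSON."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model works here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the product attributes from this image."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        tools=[extract_attributes_tool],
        tool_choice={"type": "function", "function": {"name": "record_product_attributes"}},
    )
    call = resp.choices[0].message.tool_calls[0]
    return json.loads(call.function.arguments)  # expected to follow the declared schema
```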

3. Platform Adaptation Engine

Canonical product schema transforms to platform-specific formats (Amazon, Shopify, Zalando, bol.com). Constraint-aware generation respects field lengths, mandatory attributes, and marketplace policies. Automated compliance linting prevents rejections.
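
The compliance-linting idea can be illustrated with a small validation pass; the per-marketplace rules below are invented stand-ins for the real configurations:

```python
from dataclasses import dataclass

# Illustrative per-marketplace constraints; real configs would live in the database
PLATFORM_RULES = {
    "amazon":  {"title_max": 200, "required": {"brand", "bullet_points"}, "condition_enum": {"New", "Used"}},
    "zalando": {"title_max": 70,  "required": {"brand", "material"},      "condition_enum": {"New"}},
}

@dataclass
class CanonicalProduct:
    title: str
    brand: str
    material: str
    condition: str
    bullet_points: list

def lint_listing(product: CanonicalProduct, platform: str) -> list:
    """Return the compliance violations that would get this listing rejected."""
    rules = PLATFORM_RULES[platform]
    issues = []
    if len(product.title) > rules["title_max"]:
        issues.append(f"title exceeds {rules['title_max']} chars")
    for name in rules["required"]:
        if not getattr(product, name, None):
            issues.append(f"missing required field: {name}")
    if product.condition not in rules["condition_enum"]:
        issues.append(f"condition '{product.condition}' not allowed")
    return issues
```

In the pipeline, a listing would only be pushed to a marketplace once this kind of lint pass comes back empty.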

4. Quality & Governance Framework

Pydantic models enforce strict validation contracts. Human-in-the-loop review for low-confidence cases. Audit trails track every generation with prompt versioning and content hashing for full observability.
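
A minimal sketch of such a contract together with the audit record, assuming Pydantic v2 (the field constraints, confidence score, and review threshold are illustrative):

```python
import hashlib

from pydantic import BaseModel, Field

class GeneratedListing(BaseModel):
    """Validation contract every generated listing must satisfy before it is persisted."""
    sku: str
    language: str = Field(pattern=r"^[a-z]{2}$")
    title: str = Field(min_length=10, max_length=200)
    description: str = Field(min_length=50)
    confidence: float = Field(ge=0.0, le=1.0)

REVIEW_THRESHOLD = 0.8  # illustrative cutoff for human-in-the-loop routing

def accept_or_route(raw: dict, prompt_version: str) -> dict:
    """Validate a raw LLM output and build the audit record stored alongside it."""
    listing = GeneratedListing.model_validate(raw)  # raises on any contract violation
    return {
        "sku": listing.sku,
        "prompt_version": prompt_version,
        "content_hash": hashlib.sha256(listing.description.encode()).hexdigest(),
        "needs_review": listing.confidence < REVIEW_THRESHOLD,
    }
```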

Technical Architecture

  • Core Stack: Python, Django + DRF, Pydantic for contracts, PostgreSQL, Redis
  • Orchestration: Celery with backpressure control and adaptive model routing (sketched after this list)
  • AI Integration: OpenAI text + vision APIs with structured outputs and cost optimization
  • Localization: Per-language glossaries, term locking, and style guides per marketplace
  • Monitoring: Real-time dashboards for cost, latency, and quality metrics per SKU and market
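
For the orchestration layer referenced above, here is a rough sketch of how backpressure and adaptive model routing can be expressed in a Celery task (broker URL, rate limit, and model names are placeholders):

```python
from celery import Celery

app = Celery("enrichment", broker="redis://localhost:6379/0")  # placeholder broker URL

@app.task(
    rate_limit="30/m",           # backpressure: cap LLM calls per worker
    autoretry_for=(Exception,),  # retry transient API failures
    retry_backoff=True,
    max_retries=3,
)
def generate_listing(sku: str, market: str, creative: bool = False):
    """Generate one listing; route creative copy to a premium model, the rest to a cheap one."""
    model = "gpt-4o" if creative else "gpt-4o-mini"  # adaptive model routing
    # ... invoke the generation pipeline with the selected model ...
    return {"sku": sku, "market": market, "model": model}
```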

Key Innovations

  • Grounded Generation: LLMs instructed to use canonical data as the single source of truth, preventing hallucinations
  • Fusion Prompting: Combining structured attributes with visual evidence for richer, more accurate descriptions (see the sketch after this list)
  • Adaptive Cost Control: Cheaper models for simple transformations, premium models for creative content
  • Active Learning: System improves accuracy through feedback loops and learned corrections
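
To make grounded generation and fusion prompting concrete, here is a rough prompt-builder sketch; the attribute names and values are invented for illustration:

```python
canonical = {"sku": "SKU-123", "width_mm": 95, "upper_material": "full-grain leather"}
vision_facts = {"color": "dark brown", "pattern": "brogue perforation"}  # from the vision step

def build_fusion_prompt(canonical: dict, vision_facts: dict, language: str) -> str:
    """Fuse structured attributes with visual evidence and pin the model to those facts."""
    facts = "\n".join(f"- {k}: {v}" for k, v in {**canonical, **vision_facts}.items())
    return (
        f"Write a product description in {language}.\n"
        "Use ONLY the facts listed below. Do not invent attributes that are not listed.\n"
        f"Facts:\n{facts}"
    )

print(build_fusion_prompt(canonical, vision_facts, "German"))
```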

If you're looking for scalable SaaS design, deep integration with complex marketplace APIs, or LLM-powered tooling for real-world operations, this project is a proven case study of robust, end-to-end execution.

Project Info

  • Role: Software Engineer
  • Type: Enterprise LLM System
  • Date: 2024
  • Scale: 100k+ SKUs

Tech Stack

  • Backend: Python, Django + DRF
  • AI/ML: OpenAI API, Vision Models
  • Validation: Pydantic, Custom Validators
  • Database: PostgreSQL, Redis
  • Orchestration: Celery, Task Queues
  • Monitoring: Grafana, Sentry