Fashion Vision Resources

From Recognition to Geometry

This page surveys five major fashion vision resources: DeepFashion, DeepFashion2, DeepFashion3D, MMFashion, and Fashionpedia. The synthesis integrates each resource's dataset scope, benchmark role, and annotation philosophy, and explains how each relates to the Fashion Emotion Lexicon (FEL).


Resource Profiles

DeepFashion

Dataset + model · 2016 / CVPR

Focus: large-scale 2D clothing recognition and retrieval
Scale: 800K+ images
Strength: rich categories, 1K attributes, landmarks, paired retrieval images
Best for: attribute learning, landmark-aware recognition, retrieval
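
Consumer-to-shop retrieval of this kind is typically scored by embedding similarity. A minimal sketch, assuming precomputed embeddings for the query photo and the shop gallery (the `cosine` and `retrieve_top_k` names are illustrative, not DeepFashion's or FashionNet's API):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_top_k(query, gallery, k=3):
    """Rank shop-side gallery embeddings against a consumer-photo query."""
    ranked = sorted(gallery.items(), key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [item_id for item_id, _ in ranked[:k]]
```

In practice the embeddings would come from a landmark-aware network trained on the paired consumer–shop images; the ranking step itself stays this simple.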

DeepFashion2

Dataset + benchmark · 2019 / CVPR

Focus: detection, pose, segmentation, re-identification
Scale: 491K images / 801K garments
Strength: dense landmarks, masks, deformation labels, same-clothing pairs
Best for: multi-task clothing understanding in the wild
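
DeepFashion2 stores its dense landmarks as COCO-style (x, y, v) triplets, where the visibility flag separates unlabeled, occluded, and visible points. A small sketch for counting usable landmarks in one garment annotation (the function name is illustrative):

```python
def visible_landmarks(keypoints):
    """Count landmarks whose visibility flag marks them as visible.

    `keypoints` follows the COCO keypoint convention the benchmark adopts:
    a flat list of (x, y, v) triplets, where v=0 means not labeled,
    v=1 labeled but occluded, and v=2 labeled and visible.
    """
    triplets = [keypoints[i:i + 3] for i in range(0, len(keypoints), 3)]
    return sum(1 for _x, _y, v in triplets if v == 2)
```

Filtering on the visibility flag like this is what makes landmark evaluation meaningful under the occlusion and overlap conditions the benchmark emphasizes.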

DeepFashion3D

Dataset + methodology · 2020 / CVPR

Focus: single-image 3D garment reconstruction
Scale: 200K+ images / 2,078 3D models
Strength: mesh, UV, texture, camera/pose, geometry-grounded evaluation
Best for: 3D reconstruction, registration, virtual try-on simulation
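
The Chamfer evaluation used for geometry-grounded scoring can be sketched in a few lines. This brute-force version assumes small point sets sampled from the predicted and ground-truth meshes (a real pipeline would use a KD-tree for the nearest-neighbor search):

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two 3D point sets.

    Each point set is a list of (x, y, z) tuples; for every point in one
    set we take the squared distance to its nearest neighbor in the other
    set, average per direction, and sum both directions.
    """
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)
```

A perfect reconstruction scores 0; the metric penalizes both missing geometry (ground truth far from prediction) and hallucinated geometry (prediction far from ground truth).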

MMFashion

Platform / toolkit · 2020 / arXiv

Focus: unified execution framework for fashion vision tasks
Scale: no proprietary dataset
Strength: modular engineering, reusable heads, fast experiment pipeline
Best for: implementation, benchmarking, reproducible experimentation
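
MMFashion's pipeline is config-driven in the style of the mm* toolkits: components are registered under a name and instantiated from config dicts, so swapping a head is a config edit rather than a code change. A minimal sketch of that pattern (the registry and `attribute_head` names here are illustrative, not MMFashion's actual API):

```python
# Minimal registry/builder pattern behind config-driven toolkits.
REGISTRY = {}

def register(name):
    """Class decorator that records a component under a string key."""
    def deco(cls):
        REGISTRY[name] = cls
        return cls
    return deco

@register("attribute_head")
class AttributeHead:
    def __init__(self, num_attributes):
        self.num_attributes = num_attributes

def build(cfg):
    """Instantiate a registered component from a config dict."""
    cfg = dict(cfg)  # copy so we can pop without mutating the caller's config
    return REGISTRY[cfg.pop("type")](**cfg)

head = build({"type": "attribute_head", "num_attributes": 1000})
```

The design choice is that every experiment is fully described by its config, which is what makes runs reproducible and comparisons fair across tasks.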

Fashionpedia

Ontology + dataset · 2020 / CVPR

Focus: ontology, segmentation, attribute localization
Scale: 48K+ images / 200K+ objects
Strength: explicit category–part–attribute structure
Best for: part-aware parsing, ontology-driven reasoning, localization
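
The category–part–attribute structure can be pictured as a small nested mapping: categories own parts, and parts carry the attributes that can be localized on them. A toy slice (entries are illustrative, not Fashionpedia's exact vocabulary):

```python
# Toy slice of a Fashionpedia-style ontology. The nesting mirrors the
# paper's category -> part -> attribute design; the specific entries
# below are made-up examples.
ONTOLOGY = {
    "jacket": {
        "parts": {
            "collar": {"attributes": ["notched", "peter pan"]},
            "pocket": {"attributes": ["flap", "patch"]},
        }
    }
}

def attributes_of(category, part):
    """Look up which localized attributes a part of a category can carry."""
    return ONTOLOGY[category]["parts"][part]["attributes"]
```

Representing the ontology explicitly like this is what enables part-aware queries ("which jackets have flap pockets?") instead of image-level label matching.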

Paper Detail Table

For each resource: the core paper contribution, what it adds beyond earlier resources, and the most defining evidence.

DeepFashion
Core contribution: Introduces a very large 2D fashion dataset and FashionNet, which jointly learns attributes and landmarks.
Beyond earlier resources: Moves the field from small, weakly annotated collections to rich, large-scale supervised clothing understanding.
Defining evidence: 800K+ images, 1K attributes, 300K+ consumer–shop pairs, landmark-aware recognition and retrieval gains.

DeepFashion2
Core contribution: Builds a clothing-centered multi-task benchmark for detection, pose, segmentation, and re-identification under realistic variation.
Beyond earlier resources: Adds dense landmarks, masks, deformation labels, and integrated evaluation under occlusion, zoom, and overlap.
Defining evidence: 491K images, 801K+ items, 39 landmarks, 873K same-clothing pairs, stronger real-scene benchmark design.

DeepFashion3D
Core contribution: Shifts from 2D appearance to true 3D garment geometry with reconstruction, registration, and texture recovery tasks.
Beyond earlier resources: Introduces mesh-level supervision, UV space, and physically meaningful garment geometry.
Defining evidence: 200K+ images, 2,078 3D garments, mesh/UV/texture annotations, Chamfer and 3D IoU evaluation.

MMFashion
Core contribution: Provides a unified PyTorch toolbox covering major fashion vision tasks through modular engineering design.
Beyond earlier resources: Contributes no new dataset of its own; instead it operationalizes task development and experimentation at scale.
Defining evidence: Shared backbone–head framework, config-driven pipeline, support for attribute, retrieval, parsing, and compatibility tasks.

Fashionpedia
Core contribution: Introduces the first large-scale benchmark integrating categories, parts, attributes, and ontology structure.
Beyond earlier resources: Adds explicit semantic structure and part-level attribute localization beyond image-level labels.
Defining evidence: 27 main categories, 19 parts, 294 fine-grained attributes, pixel-level masks, ontology-aware evaluation and localization tasks.

How These Resources Relate to FEL

Integrated View

Within FEL, these resources are not treated as redundant competitors; they act as complementary evidence layers, each contributing a different kind of knowledge about fashion items.

DeepFashion anchors the system in 2D appearance evidence: category, attribute, landmark, and paired retrieval supervision. DeepFashion2 strengthens this with structure-aware real-world evidence, especially detection, dense landmarks, segmentation, and re-identification under difficult conditions.

Fashionpedia provides the semantic grounding layer by explicitly organizing categories, parts, and attributes into a more ontology-like structure. DeepFashion3D extends the system into physical garment geometry, supplying pose-aware 3D evidence rather than only 2D appearance. MMFashion serves as the engineering layer, helping operationalize experiments and benchmark implementations across tasks.
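
One way to picture these complementary layers is as slots in a single item record that each resource's supervision can fill. A hypothetical sketch — the class and field names are assumptions for illustration, not FEL's actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class GarmentEvidence:
    """Hypothetical per-item record fusing the four evidence layers."""
    attributes: dict = field(default_factory=dict)      # appearance layer (DeepFashion)
    landmarks: list = field(default_factory=list)       # structure layer (DeepFashion2)
    part_semantics: dict = field(default_factory=dict)  # semantic layer (Fashionpedia)
    mesh_path: Optional[str] = None                     # geometry layer (DeepFashion3D)

    def layers_present(self):
        """Report which evidence layers this record currently carries."""
        return {
            "appearance": bool(self.attributes),
            "structure": bool(self.landmarks),
            "semantics": bool(self.part_semantics),
            "geometry": self.mesh_path is not None,
        }
```

The toolkit layer (MMFashion) does not appear as a field because it produces the evidence rather than being evidence itself.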

FEL Role Summary

DeepFashion: “What is visible?”

DeepFashion2: “How is it arranged under real-world conditions?”

Fashionpedia: “Which part has which meaning or property?”

DeepFashion3D: “What is the garment’s physical form?”

MMFashion: “How do we implement, benchmark, and iterate efficiently?”

From an FEL perspective, the progression runs appearance → structure → semantics → geometry, with the toolkit layer enabling reproducible experimentation across all of them.