Fashion Vision Resources

From Recognition to Geometry

This page surveys five major fashion vision resources: DeepFashion, DeepFashion2, DeepFashion3D, MMFashion, and Fashionpedia. The synthesis integrates each resource's dataset scope, benchmark role, and annotation philosophy, and explains how each relates to the Fashion Emotion Lexicon (FEL).


Resource Profiles

DeepFashion

Dataset + model · 2016 / CVPR

Focus: large-scale 2D clothing recognition and retrieval
Scale: 800K+ images
Strength: rich categories, 1K attributes, landmarks, paired retrieval images
Best for: attribute learning, landmark-aware recognition, retrieval
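
Consumer-to-shop retrieval of this kind is typically scored by embedding similarity. A minimal sketch, assuming precomputed embeddings for the query photo and the shop gallery (the `cosine` and `retrieve_top_k` names are illustrative, not DeepFashion's or FashionNet's API):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_top_k(query, gallery, k=3):
    """Rank shop-side gallery embeddings against a consumer-photo query."""
    ranked = sorted(gallery.items(), key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [item_id for item_id, _ in ranked[:k]]
```

In practice the embeddings would come from a landmark-aware network trained on the paired consumer–shop images; the ranking step itself stays this simple.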

DeepFashion2

Dataset + benchmark · 2019 / CVPR

Focus: detection, pose, segmentation, re-identification
Scale: 491K images / 801K garments
Strength: dense landmarks, masks, deformation labels, same-clothing pairs
Best for: multi-task clothing understanding in the wild
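
DeepFashion2 stores its dense landmarks as COCO-style (x, y, v) triplets, where the visibility flag separates unlabeled, occluded, and visible points. A small sketch for counting usable landmarks in one garment annotation (the function name is illustrative):

```python
def visible_landmarks(keypoints):
    """Count landmarks whose visibility flag marks them as visible.

    `keypoints` follows the COCO keypoint convention the benchmark adopts:
    a flat list of (x, y, v) triplets, where v=0 means not labeled,
    v=1 labeled but occluded, and v=2 labeled and visible.
    """
    triplets = [keypoints[i:i + 3] for i in range(0, len(keypoints), 3)]
    return sum(1 for _x, _y, v in triplets if v == 2)
```

Filtering on the visibility flag like this is what makes landmark evaluation meaningful under the occlusion and overlap conditions the benchmark emphasizes.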

DeepFashion3D

Dataset + methodology · 2020 / CVPR

Focus: single-image 3D garment reconstruction
Scale: 200K+ images / 2,078 3D models
Strength: mesh, UV, texture, camera/pose, geometry-grounded evaluation
Best for: 3D reconstruction, registration, virtual try-on simulation
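
The Chamfer evaluation used for geometry-grounded scoring can be sketched in a few lines. This brute-force version assumes small point sets sampled from the predicted and ground-truth meshes (a real pipeline would use a KD-tree for the nearest-neighbor search):

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two 3D point sets.

    Each point set is a list of (x, y, z) tuples; for every point in one
    set we take the squared distance to its nearest neighbor in the other
    set, average per direction, and sum both directions.
    """
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)
```

A perfect reconstruction scores 0; the metric penalizes both missing geometry (ground truth far from prediction) and hallucinated geometry (prediction far from ground truth).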

MMFashion

Platform / toolkit · 2020 / arXiv

Focus: unified execution framework for fashion vision tasks
Scale: no proprietary dataset
Strength: modular engineering, reusable heads, fast experiment pipeline
Best for: implementation, benchmarking, reproducible experimentation
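
MMFashion's pipeline is config-driven in the style of the mm* toolkits: components are registered under a name and instantiated from config dicts, so swapping a head is a config edit rather than a code change. A minimal sketch of that pattern (the registry and `attribute_head` names here are illustrative, not MMFashion's actual API):

```python
# Minimal registry/builder pattern behind config-driven toolkits.
REGISTRY = {}

def register(name):
    """Class decorator that records a component under a string key."""
    def deco(cls):
        REGISTRY[name] = cls
        return cls
    return deco

@register("attribute_head")
class AttributeHead:
    def __init__(self, num_attributes):
        self.num_attributes = num_attributes

def build(cfg):
    """Instantiate a registered component from a config dict."""
    cfg = dict(cfg)  # copy so we can pop without mutating the caller's config
    return REGISTRY[cfg.pop("type")](**cfg)

head = build({"type": "attribute_head", "num_attributes": 1000})
```

The design choice is that every experiment is fully described by its config, which is what makes runs reproducible and comparisons fair across tasks.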

Fashionpedia

Ontology + dataset · 2020 / CVPR

Focus: ontology, segmentation, attribute localization
Scale: 48K+ images / 200K+ objects
Strength: explicit category–part–attribute structure
Best for: part-aware parsing, ontology-driven reasoning, localization
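
The category–part–attribute structure can be pictured as a small nested mapping: categories own parts, and parts carry the attributes that can be localized on them. A toy slice (entries are illustrative, not Fashionpedia's exact vocabulary):

```python
# Toy slice of a Fashionpedia-style ontology. The nesting mirrors the
# paper's category -> part -> attribute design; the specific entries
# below are made-up examples.
ONTOLOGY = {
    "jacket": {
        "parts": {
            "collar": {"attributes": ["notched", "peter pan"]},
            "pocket": {"attributes": ["flap", "patch"]},
        }
    }
}

def attributes_of(category, part):
    """Look up which localized attributes a part of a category can carry."""
    return ONTOLOGY[category]["parts"][part]["attributes"]
```

Representing the ontology explicitly like this is what enables part-aware queries ("which jackets have flap pockets?") instead of image-level label matching.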

Paper Detail Table

For each resource: the core paper contribution, what it adds beyond earlier resources, and the most defining evidence.

DeepFashion
Core contribution: Introduces a very large 2D fashion dataset and FashionNet, which jointly learns attributes and landmarks.
Beyond earlier resources: Moves the field from small, weakly annotated collections to rich, large-scale supervised clothing understanding.
Defining evidence: 800K+ images, 1K attributes, 300K+ consumer–shop pairs, landmark-aware recognition and retrieval gains.

DeepFashion2
Core contribution: Builds a clothing-centered multi-task benchmark for detection, pose, segmentation, and re-identification under realistic variation.
Beyond earlier resources: Adds dense landmarks, masks, deformation labels, and integrated evaluation under occlusion, zoom, and overlap.
Defining evidence: 491K images, 801K+ items, 39 landmarks, 873K same-clothing pairs, stronger real-scene benchmark design.

DeepFashion3D
Core contribution: Shifts from 2D appearance to true 3D garment geometry with reconstruction, registration, and texture recovery tasks.
Beyond earlier resources: Introduces mesh-level supervision, UV space, and physically meaningful garment geometry.
Defining evidence: 200K+ images, 2,078 3D garments, mesh/UV/texture annotations, Chamfer and 3D IoU evaluation.

MMFashion
Core contribution: Provides a unified PyTorch toolbox covering major fashion vision tasks through modular engineering design.
Beyond earlier resources: Contributes no new dataset of its own; instead it operationalizes task development and experimentation at scale.
Defining evidence: Shared backbone–head framework, config-driven pipeline, support for attribute, retrieval, parsing, and compatibility tasks.

Fashionpedia
Core contribution: Introduces the first large-scale benchmark integrating categories, parts, attributes, and ontology structure.
Beyond earlier resources: Adds explicit semantic structure and part-level attribute localization beyond image-level labels.
Defining evidence: 27 main categories, 19 parts, 294 fine-grained attributes, pixel-level masks, ontology-aware evaluation and localization tasks.

How These Resources Relate to FEL

Integrated View

Within FEL, these resources are not treated as redundant competitors; they act as complementary evidence layers, each contributing a different kind of knowledge about fashion items.

DeepFashion anchors the system in 2D appearance evidence: category, attribute, landmark, and paired retrieval supervision. DeepFashion2 strengthens this with structure-aware real-world evidence, especially detection, dense landmarks, segmentation, and re-identification under difficult conditions.

Fashionpedia provides the semantic grounding layer by explicitly organizing categories, parts, and attributes into a more ontology-like structure. DeepFashion3D extends the system into physical garment geometry, supplying pose-aware 3D evidence rather than only 2D appearance. MMFashion serves as the engineering layer, helping operationalize experiments and benchmark implementations across tasks.
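
One way to picture these complementary layers is as slots in a single item record that each resource's supervision can fill. A hypothetical sketch — the class and field names are assumptions for illustration, not FEL's actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class GarmentEvidence:
    """Hypothetical per-item record fusing the four evidence layers."""
    attributes: dict = field(default_factory=dict)      # appearance layer (DeepFashion)
    landmarks: list = field(default_factory=list)       # structure layer (DeepFashion2)
    part_semantics: dict = field(default_factory=dict)  # semantic layer (Fashionpedia)
    mesh_path: Optional[str] = None                     # geometry layer (DeepFashion3D)

    def layers_present(self):
        """Report which evidence layers this record currently carries."""
        return {
            "appearance": bool(self.attributes),
            "structure": bool(self.landmarks),
            "semantics": bool(self.part_semantics),
            "geometry": self.mesh_path is not None,
        }
```

The toolkit layer (MMFashion) does not appear as a field because it produces the evidence rather than being evidence itself.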

FEL Role Summary

DeepFashion: “What is visible?”

DeepFashion2: “How is it arranged under real-world conditions?”

Fashionpedia: “Which part has which meaning or property?”

DeepFashion3D: “What is the garment’s physical form?”

MMFashion: “How do we implement, benchmark, and iterate efficiently?”

From an FEL perspective, the progression runs appearance → structure → semantics → geometry, with the toolkit layer enabling reproducible experimentation across all of them.