| Resource Type | Dataset + Model | Dataset + Benchmark | Dataset + Methodology | Platform / Toolkit | Ontology + Dataset |
| Year / Venue | 2016 / CVPR | 2019 / CVPR | 2020 / ECCV | 2020 / arXiv | 2020 / ECCV |
| Primary Goal | Large-scale 2D clothing recognition & retrieval | Real-world, multi-task fashion understanding | Single-image 3D garment reconstruction | Unified execution framework for fashion vision tasks | Structuring fashion knowledge via explicit ontology |
| Problem Motivation | Limited scale & labels in earlier datasets | Sparse landmarks & single-garment bias in DeepFashion | Severe lack of 3D garment data | Fragmented fashion AI codebases | Inconsistent definitions of fashion concepts |
| Data Dimensionality | 2D images | 2D multi-instance scenes | 3D meshes + images | 2D (uses external datasets) | 2D pixel-level annotations |
| Dataset Scale | 800K+ images | 491K images / 801K garments | 2,078 3D garment models | No proprietary dataset | 48K+ images |
| Garment Categories | 50 | 13 | 10 | Dataset-dependent | 27 |
| Core Annotations | Category, attributes, sparse landmarks | BBox, mask, dense landmarks (up to 39 per item), pose, pairing | Mesh, UV, camera, pose, feature lines | Models, configs, pipelines | Instance, part, attribute, mask |
| Landmark Concept | Sparse 2D keypoints | Dense pose-aware landmarks | 3D structural feature lines | Task-dependent module | Encoded via part boundaries |
| Pose Information | ❌ | Clothing pose | Body–garment pose (SMPL) | Task-dependent | ❌ |
| Segmentation | ❌ | Instance mask | Implicit via mesh surface | Supported | Part-level masks |
| Attributes | Image-level | Image-level | Texture (not semantic) | Supported | Part-localized |
| Ontology Explicitness | ❌ | ❌ | ❌ | ❌ | ✅ Core contribution |
| Main Tasks | Classification, retrieval, attribute prediction | Detection, pose estimation, segmentation, consumer-to-shop retrieval | 3D reconstruction, registration, texture recovery | Attribute prediction, retrieval, parsing, compatibility | Detection, segmentation, attribute localization |
| Proposed Model | FashionNet | Match R-CNN baseline | Hybrid mesh + implicit surface | Backbone–Head modular framework | Mask R-CNN baseline |
| Methodological Core | Landmark-aware pooling | Multi-task unified evaluation | Template adaptation + physics-aware features | Modular engineering design | Ontology-driven annotation |
| Evaluation Metrics | Top-k accuracy, recall | AP, OKS, PCK, top-k retrieval accuracy | Chamfer distance, Earth Mover's Distance (EMD), normal consistency, 3D IoU | Task-standard metrics | mIoU, AP (det/attr) |
| Major Strength | One of the first large-scale fashion datasets | Real-world complexity & multi-task scope | Provides true 3D structural evidence | Easy experimentation & extensibility | Explicit semantic structure of fashion |
| Main Limitation | Lacks structural & semantic depth | Limited semantic/ontological reasoning | Computationally heavy, frontal bias | Limited theoretical contribution | No aesthetic/emotional modeling |
| Emotion / Aesthetics Modeled | ❌ | ❌ | ❌ | ❌ | ❌ |
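
Two of the metric families named in the Evaluation Metrics row can be made concrete with a minimal sketch. The function names `chamfer_distance` and `mask_iou` are illustrative and not taken from any of the compared resources. The Chamfer convention assumed here is the mean nearest-neighbour Euclidean distance in each direction, summed; individual papers may use squared distances or a different normalisation, so reported numbers are not directly comparable to this variant.

```python
from math import dist


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Assumed convention: mean nearest-neighbour Euclidean distance from
    a to b, plus the same from b to a. Brute force, O(len(a) * len(b)).
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    a_to_b = sum(min(dist(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(dist(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a


def mask_iou(pred, target):
    """Intersection over union of two binary masks.

    Masks are equally sized nested lists of 0/1 values. Two empty masks
    are treated as a perfect match (an assumption, conventions differ).
    """
    inter = union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 1.0


# Toy data: a triangle and the same triangle shifted by 1 along z.
triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x, y, z + 1.0) for (x, y, z) in triangle]

cd_same = chamfer_distance(triangle, triangle)
cd_shifted = chamfer_distance(triangle, shifted)
iou = mask_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]])
```

In this toy case every point's nearest neighbour in the shifted set lies exactly one unit away, so each direction contributes 1 to the Chamfer distance. mIoU as listed in the table would average `mask_iou` over classes; that aggregation is left out here.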