DeepFashion2 (DF2) — Normalized Dataset for FEL

1. Dataset Overview

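DeepFashion2 (DF2) is the successor to DeepFashion (DF1). It is a multitask clothing dataset and benchmark designed to evaluate clothes detection, landmark (keypoint) estimation, segmentation, and user-to-shop retrieval within a single unified platform.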

The df2felout.zip artifact contains DF2’s original annotations (per-image JSON plus retrieval JSON) converted into a normalized schema that is directly compatible with the FEL graph and learning pipeline.

Key Characteristics


2. Folder and File Description

2.1 Original Input Tree

Input paths confirmed via artifact provenance (src_file fields):

Minimal required structure:

(df2_root)
├─ train/
│  └─ annos/
│     ├─ 000001.json
│     ├─ 000002.json
│     └─ ...
└─ json_for_validation/
   ├─ val_query.json
   └─ val_gallery.json
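
The minimal tree above can be checked before running any conversion. The following is only a sketch: the helper name missing_df2_inputs and the REQUIRED_PATHS constant are hypothetical and are not part of the artifact; only the paths themselves come from the tree.

```python
from pathlib import Path

# Relative paths copied from the minimal tree; the constant name is hypothetical.
REQUIRED_PATHS = (
    "train/annos",
    "json_for_validation/val_query.json",
    "json_for_validation/val_gallery.json",
)


def missing_df2_inputs(df2_root):
    """Return the required relative paths that are absent under df2_root."""
    root = Path(df2_root)
    missing = [rel for rel in REQUIRED_PATHS if not (root / rel).exists()]
    # The annos folder must also contain at least one per-image JSON file.
    annos = root / "train" / "annos"
    if annos.is_dir() and not any(annos.glob("*.json")):
        missing.append("train/annos/*.json")
    return missing
```

An empty return value means the minimal structure is present; it says nothing about whether the JSON content itself is valid.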

Structure expected (assumed) by the code:

Note: Some test retrieval files may be detected during QC but are not parsed into normalized tables.


2.2 Normalized Output Files

According to normalized/manifest.csv, there are 11 core artifacts:

Audit / validation:


3. Role in FEL

DF2 provides strong signals for same-item consistency and cross-domain linkage (user ↔ shop), both of which follow from its pair-centric design.

3.1 Node Mapping


3.2 Edge Mapping

pair_uid → has_image → image_uid
image_uid → has_item → item_uid
pair_uid → has_item → item_uid

item_uid → has_bbox → bbox
item_uid → has_segmentation → segmentation
item_uid → has_keypoints → keypoints

(pair_id, style) → retrieval_positive → (query_image_uid, gallery_image_uid)
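
As an illustration of the structural edges listed above, the sketch below turns item rows into (source, relation, target) triples. It rests on assumptions: the row keys pair_uid, image_uid, and item_uid mirror the names in the mapping, while the has_bbox / has_segmentation / has_keypoints flags and the "item_uid/modality" target IDs are hypothetical. The retrieval_positive edge is left out because it is built from the retrieval files rather than from item rows.

```python
MODALITIES = ("bbox", "segmentation", "keypoints")


def build_edges(rows):
    """Return sorted, de-duplicated (source, relation, target) triples.

    Each row is a dict with pair_uid, image_uid, item_uid and optional
    truthy flags has_bbox / has_segmentation / has_keypoints (hypothetical).
    """
    edges = set()
    for row in rows:
        pair, image, item = row["pair_uid"], row["image_uid"], row["item_uid"]
        edges.add((pair, "has_image", image))
        edges.add((image, "has_item", item))
        edges.add((pair, "has_item", item))
        for modality in MODALITIES:
            if row.get(f"has_{modality}"):
                # Hypothetical ID scheme for the modality evidence node.
                edges.add((item, f"has_{modality}", f"{item}/{modality}"))
    return sorted(edges)
```

Using a set means that an image holding several items contributes its pair → image edge only once.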

3.3 Unique Contribution of DF2

While DF1 strengthens category/attribute/text signals, DF2 emphasizes:

The pair_uid acts as the central hub for identity relationships.


4. Extracted Data

4.1 Core Entities

images.csv.gz — image-level entities

items.csv.gz — clothing instances inside images

pair_items.csv.gz — pair ↔ item bridge (critical for graph linkage)

pairs.csv.gz — pair-level aggregation (image lists preserved as JSON array strings)
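
Because the image lists are stored as JSON array strings, they need to be decoded after the CSV is read. A minimal sketch using only the standard library follows; the column name image_uids is an assumption made for illustration, so substitute the actual list columns of pairs.csv.gz.

```python
import csv
import gzip
import json


def read_pairs(path, list_columns=("image_uids",)):
    """Read a gzipped CSV and decode JSON-array string columns into lists.

    list_columns names the columns holding JSON arrays; the default
    "image_uids" is a hypothetical column name.
    """
    rows = []
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            for col in list_columns:
                if col in row:
                    # Empty cells become empty lists instead of raising.
                    row[col] = json.loads(row[col]) if row[col] else []
            rows.append(row)
    return rows
```

All other columns are returned as strings, exactly as csv.DictReader yields them.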


4.2 Modality Evidence Tables


4.3 Retrieval Layer

Important implementation detail:

Some DF2 retrieval files use non-padded numeric IDs.

Therefore:

Joins are stabilized using image_id_int.
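
The effect of joining on image_id_int can be sketched as follows. This is an assumed illustration, not the artifact's code: the function names and the image_id key are hypothetical, and only the idea of an integer join key comes from the text above.

```python
def to_image_id_int(image_id):
    """Map a padded or non-padded ID ('000123', '123', 123) to the int 123."""
    return int(str(image_id).strip())


def join_on_image_id(retrieval_rows, image_rows):
    """Inner-join retrieval rows to image rows on the integer image ID.

    Both inputs are lists of dicts with an "image_id" key (hypothetical name).
    """
    by_id = {to_image_id_int(r["image_id"]): r for r in image_rows}
    joined = []
    for row in retrieval_rows:
        key = to_image_id_int(row["image_id"])
        if key in by_id:
            # Retrieval fields override image fields on a name clash.
            joined.append({**by_id[key], **row, "image_id_int": key})
    return joined
```

A string join would treat "000012" and "12" as different images; the integer key makes them match.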


5. Graph Structure Description

DF2 uses a pair-centric architecture, unlike DF1’s image-centric structure.

5.1 Central Hub: Pairs

pairs.csv.gz acts as the hub node.

All images, items, and retrieval links connect through pair_id.


5.2 Structural Flow

Pairs → Images → Items → (BBox / Seg / Keypoints / QualityFlags)

Separates:

This enables both identity evaluation and fine-grained learning within a single schema.


5.3 Retrieval Layer


5.4 QC and Metadata Nodes

QC and metadata nodes describe extraction integrity rather than semantic data.

They are typically drawn as dashed conceptual links in graphs.


Final Summary

The normalized DF2 dataset: