How to Import Your Dataset

DDS Team

The DDS tool provides user-friendly Python APIs to import your datasets.

1. Start the DDS tool

To use the DDS Python APIs, you have to start the DDS tool first.

dds --quickstart

2. Import Local Image Directory

Say you have a directory containing images grouped by categories, like this:

$ tree -L 2
.
├── cat
│   ├── cat01.png
│   ├── cat02.png
│   ├── cat03.jpeg
│   └── cat04.webp
├── dog
│   ├── dog01.jpeg
│   ├── dog02.jpeg
│   ├── dog03.jpeg
│   └── dog04.jpeg
└── fish
    ├── fish01.webp
    ├── fish02.webp
    ├── fish03.jpg
    └── fish04.jpeg

You can import this directory with the following Python code:

import os

from deepdataspace import DataSet

# First API call: create your dataset
dataset = DataSet.create_dataset("my_dataset", id_="my_dataset_unique_id")

# import every category into the dataset
dataset_dir = "/path/to/my/dataset"  # replace this with your dataset directory
categories = sorted(os.listdir(dataset_dir))  # sort so the generated IDs are stable across runs
for cat_idx, category in enumerate(categories):
    cat_path = os.path.join(dataset_dir, category)
    if not os.path.isdir(cat_path):
        continue

    print(f"Importing category {category}...")
    items = sorted(os.listdir(cat_path))
    for idx, item in enumerate(items):
        image_path = os.path.join(cat_path, item)
        image_uri = f"file://{image_path}"
        # Second API call: import an image into the dataset
        image = dataset.batch_add_image(image_uri, id_=cat_idx * 100 + idx)
        # Third API call: add an annotation to the image
        image.batch_add_annotation(category=category)
        # Fourth API call: flush the annotations added to the image
        image.finish_batch_add_annotation()

# Final API call: finish the importing process
dataset.finish_batch_add_image()

This recipe creates a dataset, iterates over the dataset directory, and imports every image into the dataset with a classification annotation.
All of this is done with five API calls:

  • DataSet.create_dataset: create a dataset
  • DataSet.batch_add_image: import an image into the dataset
  • Image.batch_add_annotation: add an annotation to the image
  • Image.finish_batch_add_annotation: flush all annotations added to the image
  • DataSet.finish_batch_add_image: finish the importing process
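
The directory walk in the recipe imports every file it finds, including stray non-image files. As a minimal sketch, the walk could be separated from the DDS calls and restricted to image files. The helper name iter_labeled_images and the IMAGE_EXTENSIONS set are hypothetical and not part of the DDS API; the extension list is an assumption based on the example tree.

```python
import os
from typing import Iterator, Tuple

# Assumption: these extensions count as images; adjust to your data.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def iter_labeled_images(dataset_dir: str) -> Iterator[Tuple[str, str]]:
    """Yield (category, file URI) pairs for a category-per-subdirectory layout."""
    for category in sorted(os.listdir(dataset_dir)):
        cat_path = os.path.join(dataset_dir, category)
        if not os.path.isdir(cat_path):
            continue  # skip stray files at the top level
        for name in sorted(os.listdir(cat_path)):
            ext = os.path.splitext(name)[1].lower()
            if ext not in IMAGE_EXTENSIONS:
                continue  # skip non-image files
            image_path = os.path.abspath(os.path.join(cat_path, name))
            yield category, f"file://{image_path}"
```

Each yielded pair can then be passed to dataset.batch_add_image and image.batch_add_annotation as in the recipe above.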

Note

The id_ argument of DataSet.create_dataset and DataSet.batch_add_image is optional.
Provide it, however, if you want to avoid inserting the same dataset or image again when the script is run repeatedly.
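
An ID only prevents duplicates if the same image gets the same ID on every run, and if no two images share one. The scheme used in the recipe, cat_idx * 100 + idx, collides once a category holds more than 100 images: (cat_idx=0, idx=100) and (cat_idx=1, idx=0) both produce 100. The sketch below guards against that; make_image_id and its stride parameter are hypothetical names, not part of the DDS API. Note also that os.listdir returns entries in arbitrary order, so the listing should be sorted for the indices to be repeatable.

```python
def make_image_id(cat_idx: int, idx: int, stride: int = 100) -> int:
    """Combine a category index and a per-category index into one integer ID.

    Raises ValueError when idx does not fit in the stride, because the
    resulting ID would collide with an ID from the next category.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    if not 0 <= idx < stride:
        raise ValueError(
            f"idx {idx} does not fit in stride {stride}; IDs would collide"
        )
    return cat_idx * stride + idx
```

Choosing a stride larger than the biggest category (for example 1_000_000) keeps the IDs unique without changing the rest of the script.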

3. Import Complicated Datasets

The example above shows how to import a simple dataset from a local directory. DDS also supports more complicated datasets, such as:

  • Datasets whose images are not local files but remote URLs.
  • Datasets that contain not only classification labels but also object detection, instance segmentation, object alpha matting, keypoints, etc.
  • Datasets that contain not only ground truth but also prediction sets.

3.1 Import remote images

This is almost the same as the example above, except that image_uri is a remote HTTP(S) URL instead of a local file URI.

from deepdataspace import DataSet

dataset = DataSet.create_dataset("my_dataset", id_="my_dataset_unique_id")

categories = ["cat", "dog", "fish"]
for cat_idx, category in enumerate(categories):
    print(f"Importing category {category}...")
    for idx in range(10):
        image_idx = cat_idx * 100 + idx
        image_uri = f"https://mydataset.images.com/{category}/{image_idx}.png"
        image = dataset.batch_add_image(image_uri, id_=image_idx)
        image.batch_add_annotation(category=category)
        image.finish_batch_add_annotation()

dataset.finish_batch_add_image()
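
When image URLs are assembled from category names and file names, it can help to validate and escape them before passing them to the import call. The following is a minimal sketch using only the Python standard library; build_image_url is a hypothetical helper, not part of the DDS API, and the base URL is the placeholder host from the example above.

```python
from urllib.parse import quote, urlsplit


def build_image_url(base_url: str, category: str, filename: str) -> str:
    """Return base_url/category/filename with path parts percent-encoded.

    Raises ValueError if base_url is not an absolute http(s) URL.
    """
    parts = urlsplit(base_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {base_url!r}")
    # quote() escapes characters such as spaces that are unsafe in a URL path
    return f"{base_url.rstrip('/')}/{quote(category)}/{quote(filename)}"
```

For example, build_image_url("https://mydataset.images.com", "cat", "0.png") yields the same URL shape as the f-string used in the loop.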

3.2 Import complicated annotations

By providing extra arguments to Image.batch_add_annotation, you can import more complicated annotations.

from deepdataspace import DataSet

dataset = DataSet.create_dataset("my_dataset", id_="my_dataset_unique_id")

categories = ["cat", "dog", "fish"]
for cat_idx, category in enumerate(categories):
    print(f"Importing category {category}...")
    for idx in range(10):
        image_idx = cat_idx * 100 + idx
        image_uri = f"https://mydataset.images.com/{category}/{image_idx}.png"
        image = dataset.batch_add_image(image_uri,
                                        id_=image_idx,
                                        # width and height must be provided if you want to add object detection to image
                                        width=500, height=500,
                                        )
        image.batch_add_annotation(
                category=category,
                # object detection bbox, (x, y, w, h)
                bbox=(200, 200, 50, 50),
                # segmentation lines, [[x1, y1, x2, y2, x3, y3, x4, y4], ...]
                segmentation=[[200, 200, 250, 200, 250, 250, 200, 250]],
                # matting mask alpha image uri
                alpha_uri=f"https://mydataset.images.com/{category}/{image_idx}.png",
                # coco2017 person keypoint format, [x1, y1, v1, conf1, x2, y2, v2, conf2, ...]
                coco_keypoints=[10, 10, 1, 1, 20, 20, 1, 1],
        )
        image.finish_batch_add_annotation()
        
dataset.finish_batch_add_image()
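
The flat list formats in the comments above are easy to get wrong by hand. As a sketch, two small helpers can build them from more readable inputs: one turns an (x, y, w, h) box into the four-corner polygon used in the segmentation example, and one flattens per-point (x, y, v, conf) tuples into the keypoint list. Both helper names are hypothetical and not part of the DDS API; the layouts are taken from the comments in the example.

```python
from typing import List, Sequence, Tuple

Number = float  # ints are accepted as well


def bbox_to_polygon(bbox: Tuple[Number, Number, Number, Number]) -> List[Number]:
    """Convert an (x, y, w, h) box to [x1, y1, x2, y2, x3, y3, x4, y4].

    Corners are listed clockwise starting from the top-left corner.
    """
    x, y, w, h = bbox
    return [x, y, x + w, y, x + w, y + h, x, y + h]


def flatten_keypoints(
    points: Sequence[Tuple[Number, Number, int, Number]]
) -> List[Number]:
    """Flatten (x, y, v, conf) tuples into [x1, y1, v1, conf1, x2, ...]."""
    flat: List[Number] = []
    for x, y, v, conf in points:
        flat.extend([x, y, v, conf])
    return flat
```

With the values from the example, bbox_to_polygon((200, 200, 50, 50)) gives the polygon passed as segmentation, wrapped in one more list, and flatten_keypoints([(10, 10, 1, 1), (20, 20, 1, 1)]) gives the list passed as coco_keypoints.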

3.3 Import prediction sets

By default, Image.batch_add_annotation adds the annotation to the ground-truth set. You can also specify a different label set name and type.

from deepdataspace import DataSet
from deepdataspace.constants import LabelType

dataset = DataSet.create_dataset("my_dataset", id_="my_dataset_unique_id")

categories = ["cat", "dog", "fish"]
for cat_idx, category in enumerate(categories):
    print(f"Importing category {category}...")
        for idx in range(10):
        image_idx = cat_idx * 100 + idx
        image_uri = f"https://mydataset.images.com/{category}/{image_idx}.png"
        image = dataset.batch_add_image(image_uri,
                                        id_=image_idx,
                                        # width and height must be provided if you want to add object detection to image
                                        width=500, height=500,
                                        )

        # add the ground-truth annotation to the image
        image.batch_add_annotation(
                category=category,
                bbox=(200, 200, 50, 50),  # object detection bbox, (x, y, w, h)
        )

        # add a prediction to the image
        image.batch_add_annotation(
                category=category,
                label="my_prediction1",  # the label set name
                label_type=LabelType.Prediction,  # the label set is a prediction set
                conf=0.85,  # the confidence of the prediction
                bbox=(201, 201, 50, 51),
        )

        # add second prediction to the image
        image.batch_add_annotation(
                category=category,
                label="my_prediction2",  # the label set name
                label_type=LabelType.Prediction,  # the label set is a prediction set
                conf=0.9,  # the confidence of the prediction
                bbox=(201, 201, 55, 55),
        )
        
        image.finish_batch_add_annotation()
        
# finish the importing process
dataset.finish_batch_add_image()
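
Before importing prediction sets, it can be useful to check how closely each predicted box matches its ground-truth box. The sketch below computes the standard intersection-over-union (IoU) for boxes in the (x, y, w, h) format used above. It is an illustration only and is not claimed to be how DDS computes its own metrics; bbox_iou is a hypothetical helper name.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def bbox_iou(a: Box, b: Box) -> float:
    """Return the intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping region, clamped at zero
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


ground_truth = (200, 200, 50, 50)
predictions = {
    "my_prediction1": (201, 201, 50, 51),
    "my_prediction2": (201, 201, 55, 55),
}
for name, box in predictions.items():
    print(f"{name}: IoU = {bbox_iou(ground_truth, box):.3f}")
```

A low IoU for a prediction that should match its ground truth often points to a coordinate-format mix-up, such as passing (x1, y1, x2, y2) where (x, y, w, h) is expected.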

4. Explore the Imported Dataset

Once you have imported the dataset, go to the DDS index page, select the imported dataset, and begin your exploration:

  1. Filter images by category.
  2. Switch on/off display options:
     • Show/Hide annotation types
     • Show/Hide annotations
     • Show/Hide image descriptions
     • Adjust specific display configurations for each annotation type
  3. Zoom in to view a single image in detail with its metadata.
  4. Compare multiple annotation label sets, with the option to overlap or tile the comparisons.
  5. Enter analysis mode to perform FN / FP metric analysis on prediction result sets.

Contributors: huweiqiang, zhuyuanhao