How to Import Your Dataset
The DDS tool provides user-friendly Python APIs to import your datasets.
1. Start the DDS tool
To use the DDS Python APIs, you have to start the DDS tool first.
dds --quickstart
2. Import Local Image Directory
Say you have a directory containing images grouped by categories, like this:
$ tree -L 2
.
├── cat
│   ├── cat01.png
│   ├── cat02.png
│   ├── cat03.jpeg
│   └── cat04.webp
├── dog
│   ├── dog01.jpeg
│   ├── dog02.jpeg
│   ├── dog03.jpeg
│   └── dog04.jpeg
└── fish
    ├── fish01.webp
    ├── fish02.webp
    ├── fish03.jpg
    └── fish04.jpeg
You can import this directory with the following Python code:
import os

from deepdataspace import DataSet

# First API call: create your dataset
dataset = DataSet.create_dataset("my_dataset", id_="my_dataset_unique_id")

# Import every category into the dataset
dataset_dir = "/path/to/my/dataset"  # replace this with your dataset directory
categories = os.listdir(dataset_dir)
for cat_idx, category in enumerate(categories):
    cat_path = os.path.join(dataset_dir, category)
    if not os.path.isdir(cat_path):
        continue

    print(f"Importing category {category}...")
    items = os.listdir(cat_path)
    for idx, item in enumerate(items):
        image_path = os.path.join(cat_path, item)
        image_uri = f"file://{image_path}"

        # Second API call: import an image into the dataset
        image = dataset.batch_add_image(image_uri, id_=cat_idx * 100 + idx)

        # Third API call: add an annotation to the image
        image.batch_add_annotation(category=category)
        # Fourth API call: flush all annotations added to the image
        image.finish_batch_add_annotation()

# Final API call: finish the importing process
dataset.finish_batch_add_image()
This recipe creates a dataset, iterates over the dataset directory, and imports every image into the dataset with a classification annotation.
All of this is done with five API calls:
- DataSet.create_dataset: create a dataset
- DataSet.batch_add_image: import an image into the dataset
- Image.batch_add_annotation: add an annotation to the image
- Image.finish_batch_add_annotation: flush all annotations added to the image
- DataSet.finish_batch_add_image: finish the importing process
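The directory walk in the recipe can also be factored into a small generator, which makes it easy to skip non-image files and to get the same ordering on every run (os.listdir returns entries in arbitrary order). The sketch below is not part of the DDS API: iter_labeled_images and IMAGE_SUFFIXES are hypothetical names, and the suffix list is an assumption based on the example tree above.

```python
from pathlib import Path
from typing import Iterator, Tuple

# Assumed set of image extensions, taken from the example directory tree.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def iter_labeled_images(dataset_dir: str) -> Iterator[Tuple[str, str]]:
    """Yield (image_uri, category) pairs for a <category>/<image> layout.

    Hypothetical helper: it only walks the file system and does not call DDS.
    """
    root = Path(dataset_dir).resolve()
    # Sort so that repeated runs visit categories and images in the same order.
    for cat_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for image_path in sorted(cat_dir.iterdir()):
            if image_path.is_file() and image_path.suffix.lower() in IMAGE_SUFFIXES:
                # Path.as_uri() builds a properly escaped file:// URI.
                yield image_path.as_uri(), cat_dir.name
```

Each yielded pair can then be passed to dataset.batch_add_image and image.batch_add_annotation exactly as in the loop above.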
Note
The id_ argument of DataSet.create_dataset and DataSet.batch_add_image is optional, but it is necessary if you want to avoid inserting the same dataset or image again in repeated script runs.
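The cat_idx * 100 + idx scheme used in the recipe is only safe for small datasets: a category with more than 100 images produces colliding ids (category 0, image 100 and category 1, image 0 both map to 100), and the ids change if the listing order changes. One possible alternative is to derive the id from the image's relative path. The helper below is a hypothetical sketch, and it assumes that id_ accepts arbitrary non-negative integers of this size, which this guide does not confirm.

```python
import hashlib


def stable_image_id(relative_path: str, bits: int = 48) -> int:
    """Map a relative path such as "cat/cat01.png" to a repeatable integer.

    Hypothetical helper, not part of DDS. The same path always yields the
    same id, so re-running an import script reuses the ids it used before.
    """
    digest = hashlib.sha1(relative_path.encode("utf-8")).digest()
    # Keep only the leading bytes so the id stays below 2**bits.
    return int.from_bytes(digest[: bits // 8], "big")
```

A call such as stable_image_id("cat/cat01.png") could then replace cat_idx * 100 + idx as the id_ value, under the assumption stated above.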
3. Import complicated dataset
The example above shows how to import a simple dataset from a local directory. DDS also supports more complicated datasets, for example:
- Your dataset images are not local files, but remote URLs.
- Your dataset contains not only classification, but also object detection, instance segmentation, object alpha matting, keypoints, etc.
- Your dataset contains not only ground-truth, but also prediction sets.
3.1 Import remote images
This is almost the same as the example above, except that you have to replace the image_uri with a remote http(s) URL.
from deepdataspace import DataSet

dataset = DataSet.create_dataset("my_dataset", id_="my_dataset_unique_id")

categories = ["cat", "dog", "fish"]
for cat_idx, category in enumerate(categories):
    print(f"Importing category {category}...")
    for idx in range(10):
        image_idx = cat_idx * 100 + idx
        image_uri = f"https://mydataset.images.com/{category}/{image_idx}.png"

        image = dataset.batch_add_image(image_uri, id_=image_idx)
        image.batch_add_annotation(category=category)
        image.finish_batch_add_annotation()

dataset.finish_batch_add_image()
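When category or file names contain spaces or other special characters, formatting them directly into the URL produces an invalid address. The sketch below shows one way to build and sanity-check remote URIs with the standard library only. build_image_uri and is_remote_uri are hypothetical helper names, and the base URL is the placeholder host from the example above.

```python
from urllib.parse import quote, urlsplit


def build_image_uri(base_url: str, category: str, filename: str) -> str:
    """Join base URL, category and file name, percent-encoding each part."""
    path = "/".join(quote(part, safe="") for part in (category, filename))
    return f"{base_url.rstrip('/')}/{path}"


def is_remote_uri(uri: str) -> bool:
    """Return True only for http and https URIs."""
    return urlsplit(uri).scheme in ("http", "https")
```

For example, build_image_uri("https://mydataset.images.com", "hot dog", "0.png") encodes the space as %20, and is_remote_uri can be used to reject a file:// URI before it is passed to dataset.batch_add_image.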
3.2 Import complicated annotations
By providing extra arguments to Image.batch_add_annotation, you can import more complicated annotations.
from deepdataspace import DataSet

dataset = DataSet.create_dataset("my_dataset", id_="my_dataset_unique_id")

categories = ["cat", "dog", "fish"]
for cat_idx, category in enumerate(categories):
    print(f"Importing category {category}...")
    for idx in range(10):
        image_idx = cat_idx * 100 + idx
        image_uri = f"https://mydataset.images.com/{category}/{image_idx}.png"

        image = dataset.batch_add_image(
            image_uri,
            id_=image_idx,
            # width and height must be provided if you want to add object detection to the image
            width=500, height=500,
        )
        image.batch_add_annotation(
            category=category,
            # object detection bbox, (x, y, w, h)
            bbox=(200, 200, 50, 50),
            # segmentation lines, [[x1, y1, x2, y2, x3, y3, x4, y4], ...]
            segmentation=[[200, 200, 250, 200, 250, 250, 200, 250]],
            # matting mask alpha image uri
            alpha_uri=f"https://mydataset.images.com/{category}/{image_idx}.png",
            # coco2017 person keypoint format, [x1, y1, v1, conf1, x2, y2, v2, conf2, ...]
            coco_keypoints=[10, 10, 1, 1, 20, 20, 1, 1],
        )
        image.finish_batch_add_annotation()

dataset.finish_batch_add_image()
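Annotation sources often store boxes as corner coordinates and keypoints as per-point records, so a little conversion is usually needed before calling Image.batch_add_annotation. The helpers below are a hypothetical sketch (none of them are DDS functions) that produce the flat formats described in the comments of the example above.

```python
from typing import Iterable, List, Sequence, Tuple


def xyxy_to_xywh(box: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert (x1, y1, x2, y2) corners to the (x, y, w, h) bbox format."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


def bbox_to_polygon(bbox: Sequence[float]) -> List[float]:
    """Return the rectangle outline of an (x, y, w, h) bbox as a flat
    [x1, y1, x2, y2, x3, y3, x4, y4] list, clockwise from the top-left."""
    x, y, w, h = bbox
    return [x, y, x + w, y, x + w, y + h, x, y + h]


def flatten_keypoints(points: Iterable[Sequence[float]]) -> List[float]:
    """Flatten (x, y, v, conf) records into [x1, y1, v1, conf1, x2, ...]."""
    flat: List[float] = []
    for x, y, v, conf in points:
        flat.extend([x, y, v, conf])
    return flat
```

With these helpers, bbox_to_polygon((200, 200, 50, 50)) yields the same polygon that the example passes as its single segmentation entry.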
3.3 Import prediction sets
By default, Image.batch_add_annotation adds the annotation to the ground-truth set. You can also specify a different label set name and type.
from deepdataspace import DataSet
from deepdataspace.constants import LabelType

dataset = DataSet.create_dataset("my_dataset", id_="my_dataset_unique_id")

categories = ["cat", "dog", "fish"]
for cat_idx, category in enumerate(categories):
    print(f"Importing category {category}...")
    for idx in range(10):
        image_idx = cat_idx * 100 + idx
        image_uri = f"https://mydataset.images.com/{category}/{image_idx}.png"

        image = dataset.batch_add_image(
            image_uri,
            id_=image_idx,
            # width and height must be provided if you want to add object detection to the image
            width=500, height=500,
        )

        # add the ground-truth annotation to the image
        image.batch_add_annotation(
            category=category,
            bbox=(200, 200, 50, 50),  # object detection bbox, (x, y, w, h)
        )

        # add a prediction to the image
        image.batch_add_annotation(
            category=category,
            label="my_prediction1",  # the label set name
            label_type=LabelType.Prediction,  # the label set is a prediction set
            conf=0.85,  # the confidence of the prediction
            bbox=(201, 201, 50, 51),
        )

        # add a second prediction to the image
        image.batch_add_annotation(
            category=category,
            label="my_prediction2",  # the label set name
            label_type=LabelType.Prediction,  # the label set is a prediction set
            conf=0.9,  # the confidence of the prediction
            bbox=(201, 201, 55, 55),
        )

        image.finish_batch_add_annotation()

# finish the importing process
dataset.finish_batch_add_image()
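To get a feel for how close the two example predictions are to the ground-truth box, their intersection over union (IoU) can be computed by hand. This is only an illustrative sketch of a common overlap measure. How DDS itself matches predictions to ground truth is not described in this guide, and bbox_iou is a hypothetical helper name.

```python
from typing import Sequence


def bbox_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping region, clamped at zero.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


ground_truth = (200, 200, 50, 50)
prediction1 = (201, 201, 50, 51)  # bbox of "my_prediction1" in the example
prediction2 = (201, 201, 55, 55)  # bbox of "my_prediction2" in the example

iou1 = bbox_iou(ground_truth, prediction1)  # 2401 / 2649, about 0.906
iou2 = bbox_iou(ground_truth, prediction2)  # 2401 / 3124, about 0.769
```

Both predictions overlap the ground truth substantially, with the first one being the tighter match.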
4. Explore the Imported Dataset
Once you have imported the dataset, go to the DDS index page, select the imported dataset, and begin your exploration:
- Filter images by category.
- Switch on/off display options:
- Show/Hide annotation types
- Show/Hide annotations
- Show/Hide image descriptions
- Adjust specific display configurations for each annotation type
- Zoom in to view a single image in detail with its metadata.
- Compare multiple annotation label sets, with the option to overlap or tile the comparisons.
- Enter analysis mode to perform FN / FP metric analysis on prediction result sets.
Demonstrations are shown below:
What's Next
- How to Import COCO2017 Dataset.
- How to Export Dataset.
- API Reference for more detailed importing options.