How to Export Your Dataset

DDS Team, about 2 minutes

Exporting a dataset from DDS is as easy as importing one.

1. Start the DDS Tool

To use the DDS Python APIs, you have to start the DDS tool first.

dds --quickstart

2. Export Dataset the Simplest Way

To export a dataset, use the same dataset_id that you used when importing it.
If you did not set a dataset_id, or have forgotten it, the following code can help you look it up:

from deepdataspace.model import DataSet
from deepdataspace.model import Image

datasets = DataSet.find_many({"name": "your_dataset_name"})
# or, if you have forgotten the dataset name, find all datasets instead
# datasets = DataSet.find_many({})
for dataset in datasets:
    ImageModel = Image(dataset.id)
    num_images = ImageModel.count_num({})
    print(f"dataset_name={dataset.name}, dataset_id={dataset.id}, {num_images} images")

The output lists the name, ID, and image count of each dataset, so you can pick out the one you want to export.
You can then export that dataset with the following Python code:

import json

from deepdataspace.model import DataSet
from deepdataspace.model import Image

dataset_id = "your_dataset_id"
dataset = DataSet.find_one({"id": dataset_id})

# create the image model dynamically
ImageModel = Image(dataset_id)
num_images = ImageModel.count_num({})

print(f"Exporting dataset of {num_images} images: {dataset.name}")
output_images = []

# find all images in the dataset, sorted by idx and id, and export them
images = ImageModel.find_many({}, sort=[("idx", 1), ("id", 1)])
for idx, image in enumerate(images):
    # export all fields of the image
    output = image.to_dict()
    output_images.append(output)
    print(f"{idx + 1}/{num_images}: exported image {image.id} of {len(image.objects)} objects")

output = {
    "dataset_id": dataset_id,
    "dataset_name": dataset.name,
    "images": output_images
}

with open(f"{dataset.name}_{dataset_id}.json", "w", encoding="utf8") as fp:
    json.dump(output, fp, ensure_ascii=False)
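
The script above builds the output file name directly from dataset.name. If a dataset name contains characters such as "/" or ":", opening that path may fail or write to an unexpected directory. Below is a minimal sketch of one way to guard against this, using only the Python standard library. The helper name safe_filename and the choice to replace such characters with "_" are assumptions for illustration and are not part of DDS.

```python
import re


def safe_filename(dataset_name: str, dataset_id: str) -> str:
    """Build a JSON file name from a dataset name and id.

    Hypothetical helper, not part of DDS: every character other than
    word characters (letters, digits, underscore, including non-ASCII
    letters), '.' and '-' is replaced with '_'.
    """
    cleaned = re.sub(r"[^\w.\-]", "_", dataset_name).strip("._")
    # fall back to a fixed prefix if nothing usable is left
    if not cleaned:
        cleaned = "dataset"
    return f"{cleaned}_{dataset_id}.json"
```

In the export script, the call open(safe_filename(dataset.name, dataset_id), "w", encoding="utf8") could then stand in for the f-string path.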

3. Export Dataset with More Control

The code above exports every field of each image, including fields that DDS only uses internally and that are noise for most consumers. To export only the fields you need, you can do it like this:

import json

from deepdataspace.model import DataSet
from deepdataspace.model import Image

# the id of the dataset you want to export
dataset_id = "your_dataset_id"
dataset = DataSet.find_one({"id": dataset_id})
if dataset is None:
    print(f"Cannot find dataset with id: {dataset_id}")
    exit(1)  # a non-zero exit status signals the failure

# create the image model dynamically
ImageModel = Image(dataset_id)
num_images = ImageModel.count_num({})

print(f"Exporting dataset of {num_images} images: {dataset.name}")
output_images = []

# find all images in the dataset, sorted by idx and id, and export them
images = ImageModel.find_many({}, sort=[("idx", 1), ("id", 1)])
for idx, image in enumerate(images):
    # export only the fields of interest
    output = {
        "id": image.id,
        "idx": image.idx,
        "url": image.url_full_res,
        "thumbnail": image.url,
        "width": image.width,
        "height": image.height,
        "metadata": json.loads(image.metadata),
        "objects": []
    }
    for anno_obj in image.objects:
        # exclude some fields
        obj_data = anno_obj.to_dict(exclude=["compare_result", "matched_det_idx", "confirm_type"])
        output["objects"].append(obj_data)
    output_images.append(output)
    print(f"{idx + 1}/{num_images}: exported image {image.id} of {len(output['objects'])} objects")

output = {
    "dataset_id": dataset_id,
    "dataset_name": dataset.name,
    "images": output_images
}

with open(f"{dataset.name}_{dataset_id}.json", "w", encoding="utf8") as fp:
    json.dump(output, fp, ensure_ascii=False)
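
After exporting, it can be useful to read the file back and confirm that it holds what you expect. The sketch below relies only on the JSON layout written by the script above (the top-level keys dataset_id, dataset_name and images, with an objects list per image) and does not need DDS to be running. The function name summarize_export is hypothetical and chosen for illustration.

```python
import json


def summarize_export(path: str) -> dict:
    """Read an exported JSON file back and count its images and objects."""
    with open(path, "r", encoding="utf8") as fp:
        data = json.load(fp)

    # top-level keys written by the export script
    missing = {"dataset_id", "dataset_name", "images"} - data.keys()
    if missing:
        raise ValueError(f"export file lacks keys: {sorted(missing)}")

    images = data["images"]
    # an image without an "objects" entry counts as having no objects
    num_objects = sum(len(image.get("objects") or []) for image in images)
    return {
        "dataset_id": data["dataset_id"],
        "dataset_name": data["dataset_name"],
        "num_images": len(images),
        "num_objects": num_objects,
    }
```

Comparing num_images in the returned summary with the count printed at the start of the export is a quick way to spot a truncated or partial file.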

Contributor: huweiqiang