How to Export Your Dataset
Exporting a dataset from DDS is as easy as importing one.
1. Start the DDS tool
To use the DDS Python APIs, you have to start the DDS tool first.
dds --quickstart
2. Export Dataset the Simplest Way
To export a dataset, use the same dataset_id that you used to import it.
If you didn't set the dataset_id, or have forgotten it, the following code may help:
from deepdataspace.model import DataSet
from deepdataspace.model import Image
datasets = DataSet.find_many({"name": "your_dataset_name"})
# or, if you forgot the dataset name, list all datasets with an empty filter
# datasets = DataSet.find_many({})
for dataset in datasets:
    # create the image model for this dataset dynamically
    ImageModel = Image(dataset.id)
    num_images = ImageModel.count_num({})
    print(f"dataset_name={dataset.name}, dataset_id={dataset.id}, {num_images} images")
The output lists the name, ID, and image count of each matching dataset, so you can pick out the one you want to export.
Then you can export the dataset with the following Python code:
import json
from deepdataspace.model import DataSet
from deepdataspace.model import Image
dataset_id = "your_dataset_id"
dataset = DataSet.find_one({"id": dataset_id})
# create the image model dynamically
ImageModel = Image(dataset_id)
num_images = ImageModel.count_num({})
print(f"Exporting dataset of {num_images} images: {dataset.name}")
output_images = []
# find all images in the dataset, sorted by idx and id, and export them
images = ImageModel.find_many({}, sort=[("idx", 1), ("id", 1)])
for idx, image in enumerate(images):
    # export all fields of the image
    output = image.to_dict()
    output_images.append(output)
    print(f"{idx + 1}/{num_images}: exported image {image.id} of {len(image.objects)} objects")
output = {
    "dataset_id": dataset_id,
    "dataset_name": dataset.name,
    "images": output_images
}
# write everything to <dataset_name>_<dataset_id>.json in the current directory
with open(f"{dataset.name}_{dataset_id}.json", "w", encoding="utf8") as fp:
    json.dump(output, fp, ensure_ascii=False)
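Once the file is written, it can be worth reading it back to confirm it holds what you expect. The following is a minimal sketch that uses only the Python standard library, not a DDS API. The helper name summarize_export and the sample data are hypothetical, and the sketch assumes each exported image stores its annotations as a list under an "objects" key.

```python
import json
import os
import tempfile


def summarize_export(path):
    """Load an exported JSON file and return a few basic counts."""
    with open(path, "r", encoding="utf8") as fp:
        data = json.load(fp)

    images = data["images"]
    # Assumption: each image keeps its annotations in an "objects" list.
    # Images without that key (or with a null value) count as zero objects.
    num_objects = sum(len(image.get("objects") or []) for image in images)
    return {
        "dataset_id": data["dataset_id"],
        "dataset_name": data["dataset_name"],
        "num_images": len(images),
        "num_objects": num_objects,
    }


# Stand-in data in the same top-level shape as the export above,
# so the sketch runs without a DDS installation.
sample = {
    "dataset_id": "demo_id",
    "dataset_name": "demo",
    "images": [
        {"id": 1, "idx": 0, "objects": [{}, {}]},
        {"id": 2, "idx": 1, "objects": [{}]},
    ],
}
sample_path = os.path.join(tempfile.mkdtemp(), "demo_demo_id.json")
with open(sample_path, "w", encoding="utf8") as fp:
    json.dump(sample, fp, ensure_ascii=False)

summary = summarize_export(sample_path)
print(summary)
```

To check a real export, pass the path of the file produced by the export code to summarize_export and compare num_images with the count printed during the export.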
3. Export Dataset with More Control
The code above exports every field of each image, including fields that DDS only uses internally and that are usually noise in an export. To export only the fields you need, build the output for each image yourself:
import json
from deepdataspace.model import DataSet
from deepdataspace.model import Image
# the id of the dataset you want to export
dataset_id = "your_dataset_id"
dataset = DataSet.find_one({"id": dataset_id})
if dataset is None:
    print(f"Cannot find dataset with id: {dataset_id}")
    raise SystemExit(1)  # stop with a non-zero status to signal the failure
# create the image model dynamically
ImageModel = Image(dataset_id)
num_images = ImageModel.count_num({})
print(f"Exporting dataset of {num_images} images: {dataset.name}")
output_images = []
# find all images in the dataset, sorted by idx and id, and export them
images = ImageModel.find_many({}, sort=[("idx", 1), ("id", 1)])
for idx, image in enumerate(images):
    # export only the fields of interest
    output = {
        "id": image.id,
        "idx": image.idx,
        "url": image.url_full_res,
        "thumbnail": image.url,
        "width": image.width,
        "height": image.height,
        "metadata": json.loads(image.metadata),
        "objects": []
    }
    for anno_obj in image.objects:
        # exclude some fields from each annotation object
        obj_data = anno_obj.to_dict(exclude=["compare_result", "matched_det_idx", "confirm_type"])
        output["objects"].append(obj_data)
    output_images.append(output)
    print(f"{idx + 1}/{num_images}: exported image {image.id} of {len(output['objects'])} objects")
output = {
    "dataset_id": dataset_id,
    "dataset_name": dataset.name,
    "images": output_images
}
# write everything to <dataset_name>_<dataset_id>.json in the current directory
with open(f"{dataset.name}_{dataset_id}.json", "w", encoding="utf8") as fp:
    json.dump(output, fp, ensure_ascii=False)
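If you already have a full export on disk, you can apply the same kind of field selection afterwards, without querying DDS again. The sketch below is one possible approach in plain Python and is not part of the DDS API. The helpers keep_fields and slim_export are hypothetical, as are the sample keys "internal_flag" and "category_name"; substitute the field names that appear in your own export.

```python
def keep_fields(record, fields):
    """Return a copy of record containing only the listed keys that exist."""
    return {key: record[key] for key in fields if key in record}


def slim_export(export, image_fields, object_fields):
    """Trim every image, and every object inside it, to the chosen keys."""
    slim_images = []
    for image in export["images"]:
        slim = keep_fields(image, image_fields)
        # Assumption: annotations live in an "objects" list on each image.
        slim["objects"] = [
            keep_fields(obj, object_fields)
            for obj in image.get("objects") or []
        ]
        slim_images.append(slim)
    return {
        "dataset_id": export["dataset_id"],
        "dataset_name": export["dataset_name"],
        "images": slim_images,
    }


# Stand-in for a loaded export file; the field names are illustrative only.
full_export = {
    "dataset_id": "demo_id",
    "dataset_name": "demo",
    "images": [
        {
            "id": 1,
            "idx": 0,
            "internal_flag": True,
            "objects": [{"category_name": "cat", "compare_result": {}}],
        }
    ],
}

slim = slim_export(full_export, ["id", "idx"], ["category_name"])
print(slim["images"][0])
```

Selecting fields with an allow-list, as here, keeps the output stable if new fields are added to the models later, whereas an exclude list lets any new field through.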
What's Next
- Model Fields Reference: a more detailed explanation of the model fields.