Changes are written to disk only after a two-step confirmation. First, crawl the repo to
collect all images and their labels. Labels must be in YOLOv5 format.
datadump = "ipynb_tests/01_split_datadump"
g = Generation(repo="./Image Repo/labeled/Final Roboflow Export (841)",
               out_dir=datadump,
               verbose=True)
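For reference, a YOLOv5 label file is a plain-text file with one line per object, in the form `class x_center y_center width height`, with the four box values normalized to the range 0 to 1. The sketch below shows one way to check a single label line. `YoloBox` and `parse_yolo_line` are hypothetical names used for illustration and are not part of `Generation`.

```python
from typing import NamedTuple


class YoloBox(NamedTuple):
    """One object annotation from a YOLOv5 label file (hypothetical helper type)."""
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_yolo_line(line: str) -> YoloBox:
    """Parse one 'class x_center y_center width height' label line."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    coords = [float(p) for p in parts[1:]]
    # Box values are fractions of the image size, so they must lie in [0, 1].
    if not all(0.0 <= c <= 1.0 for c in coords):
        raise ValueError(f"coordinates must be normalized to [0, 1]: {line!r}")
    return YoloBox(class_id, *coords)


box = parse_yolo_line("0 0.5 0.5 0.25 0.4")
print(box)
```

A line with missing fields or pixel coordinates instead of normalized ones raises `ValueError`, which is a cheap way to catch labels exported in a different format.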
Apply a custom split, optionally capping the number of annotated images with MAX_SIZE:
custom_split = {"train": .5, "valid": .3, "test": .2}
g.set_split(split_ratio=custom_split, MAX_SIZE=20)
g.get_split()
Or split the entire repository using the default 70%/20%/10% train/valid/test ratio:
g.set_split()
g.get_split()
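To make the effect of a split ratio and a size cap concrete, here is a minimal sketch of one way such a split could be computed. It is an assumption for illustration, not the actual logic inside `set_split`: the helper `split_items`, its `seed` parameter, and the shuffle-then-cap order are all hypothetical.

```python
import random


def split_items(items, split_ratio, max_size=None, seed=0):
    """Hypothetical helper: shuffle, optionally cap, then partition by ratio."""
    if abs(sum(split_ratio.values()) - 1.0) > 1e-9:
        raise ValueError("split ratios must sum to 1")
    pool = list(items)
    random.Random(seed).shuffle(pool)  # seeded so the split is reproducible
    if max_size is not None:
        pool = pool[:max_size]  # keep at most max_size items
    result, start = {}, 0
    names = list(split_ratio)
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = len(pool)  # the last subset takes whatever remains
        else:
            end = start + int(round(split_ratio[name] * len(pool)))
        result[name] = pool[start:end]
        start = end
    return result


files = [f"img_{i:03d}.jpg" for i in range(100)]  # placeholder file names
split = split_items(files, {"train": 0.5, "valid": 0.3, "test": 0.2}, max_size=20)
print({name: len(subset) for name, subset in split.items()})
# prints {'train': 10, 'valid': 6, 'test': 4}
```

Giving the remainder to the last subset means rounding never drops or duplicates a file.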
Write the chosen split to a zip archive in out_dir:
zipped = g.write_split_to_disk(descriptor="<01_split_all>")
print("Complete: ", zipped)
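As a rough picture of what writing a split to a zip archive involves, the sketch below uses Python's standard `zipfile` module. The helper `write_split_to_zip` and the `<subset>/<filename>` layout inside the archive are assumptions for illustration; the archive produced by `write_split_to_disk` may be organized differently.

```python
import tempfile
import zipfile
from pathlib import Path


def write_split_to_zip(split, out_dir, descriptor):
    """Hypothetical helper: store each file under '<subset>/<filename>' in one zip."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    zip_path = out_dir / f"{descriptor}.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for subset, paths in split.items():
            for path in paths:
                zf.write(path, arcname=f"{subset}/{Path(path).name}")
    return zip_path


# Demonstrate with a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    image = src / "a.jpg"
    image.write_bytes(b"not a real image")
    zip_path = write_split_to_zip({"train": [image]}, Path(tmp) / "out", "demo_split")
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
print(names)
```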