feat: Export brush label masks with matching file base name #254
base: master
Changes from all commits
576f48b
53ac094
48693c4
792f4d9
d520496
bd7bbba
2a94d96
e841aa3
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -37,6 +37,11 @@ | |
from collections import defaultdict | ||
from itertools import groupby | ||
|
||
from label_studio_sdk.converter.utils import ( | ||
download, | ||
ensure_dir, | ||
) | ||
|
||
logger = logging.getLogger(__name__) | ||
|
||
|
||
|
@@ -99,7 +104,7 @@ def decode_rle(rle, print_params: bool = False): | |
return out | ||
|
||
|
||
def decode_from_annotation(from_name, results): | ||
def decode_from_annotation(results): | ||
"""from LS annotation to {"tag_name + label_name": [numpy uint8 image (width x height)]}""" | ||
layers = {} | ||
counters = defaultdict(int) | ||
|
@@ -116,7 +121,7 @@ def decode_from_annotation(from_name, results): | |
width = result["original_width"] | ||
height = result["original_height"] | ||
labels = result[key] if key in result else ["no_label"] | ||
name = from_name + "-" + "-".join(labels) | ||
name = "".join(labels) | ||
|
||
# result count | ||
i = str(counters[name]) | ||
|
@@ -129,37 +134,17 @@ def decode_from_annotation(from_name, results): | |
|
||
|
||
def save_brush_images_from_annotation( | ||
task_id, | ||
annotation_id, | ||
completed_by, | ||
from_name, | ||
image_name, | ||
results, | ||
out_dir, | ||
out_format="numpy", | ||
): | ||
layers = decode_from_annotation(from_name, results) | ||
if isinstance(completed_by, dict): | ||
email = completed_by.get("email", "") | ||
else: | ||
email = str(completed_by) | ||
email = "".join( | ||
x for x in email if x.isalnum() or x == "@" or x == "." | ||
) # sanitize filename | ||
layers = decode_from_annotation(results) | ||
image_base = ".".join(image_name.split('.')[0:-1]) | ||
Review comment: If you have multiple annotations per task, only the last one will be exported; the others will be overwritten.

Reply: I have used this PR of label-studio-sdk to label many images that contain multiple annotations per task, and it has worked across hundreds of images. By the way, the Segment Anything integration with label-studio is a game changer when it comes to efficiently labeling images in a production environment.

To verify that merges haven't broken anything, I pulled the latest label-studio and label-studio-sdk from the PR repo and confirmed that functionality is preserved with all updates. To avoid sharing any proprietary images, I created a simple project to demonstrate (see attached).

Inside the "decode_from_annotation" function there is a sequence counter, "counters[name]", that is incremented and appended to the filename at the end of the function. This creates a unique filename for each of multiple annotations in a single task. In the attached example, all images are stored in a separate folder. The filename of the example image starts with the base "c2825893-NathonFamous". Every mask of class "Hotdog" is then named "c2825893-NathonFamous-Hotdog-X.png", where X is the sequence number.

Example export image in the "images" folder of an export:
- Image "c2825893-NathonFamous.png"

Example export masks in the "masks" folder of an export (each mask filename has a unique sequence number per class):
- Mask image "c2825893-NathonFamous-Hotdog-0.png"
- Mask image "c2825893-NathonFamous-Hotdog-1.png"
- Mask image "c2825893-NathonFamous-Hotdog-2.png"
- Mask image "c2825893-NathonFamous-Not Hotdog-0.png"
- Mask image "c2825893-NathonFamous-Not Hotdog-1.png"
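The naming scheme discussed in this thread can be sketched in isolation. This is a minimal illustration, not the PR's code: `mask_filenames` is a hypothetical helper, the `"-" + counter` suffix and `.png` extension are assumed from the example filenames above, and the counter is assumed to be re-created on every call, as the `counters = defaultdict(int)` line inside `decode_from_annotation` suggests.

```python
from collections import defaultdict


def mask_filenames(image_name, results, key="brushlabels"):
    """Hypothetical helper mirroring the naming scheme discussed in this thread."""
    # Same base-name derivation as the diff: drop only the final extension.
    image_base = ".".join(image_name.split(".")[0:-1])
    counters = defaultdict(int)  # assumption: reset on every call
    names = []
    for result in results:
        labels = result[key] if key in result else ["no_label"]
        name = "".join(labels)
        # Append the per-label sequence number, then advance it.
        names.append(f"{image_base}-{name}-{counters[name]}.png")
        counters[name] += 1
    return names


# One call with three regions: names are unique within the call.
first = mask_filenames(
    "c2825893-NathonFamous.png",
    [
        {"brushlabels": ["Hotdog"]},
        {"brushlabels": ["Hotdog"]},
        {"brushlabels": ["Not Hotdog"]},
    ],
)
# A second call for the same image (for example, a second annotation).
second = mask_filenames("c2825893-NathonFamous.png", [{"brushlabels": ["Hotdog"]}])
collisions = set(first) & set(second)
```

Under these assumptions, names are unique within one call, but a second call for the same image starts again at 0, so `collisions` contains "c2825893-NathonFamous-Hotdog-0.png". Whether that case occurs in practice depends on how the converter groups results, which this sketch does not model.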
||
|
||
for name in layers: | ||
sanitized_name = name.replace("/", "-").replace("\\", "-") | ||
|
||
filename = os.path.join( | ||
out_dir, | ||
"task-" | ||
+ str(task_id) | ||
+ "-annotation-" | ||
+ str(annotation_id) | ||
+ "-by-" | ||
+ "-" | ||
+ sanitized_name, | ||
) | ||
filename = os.path.join(out_dir,image_base+"-"+sanitized_name) | ||
image = layers[name] | ||
logger.debug(f"Save image to {filename}") | ||
if out_format == "numpy": | ||
|
@@ -175,10 +160,7 @@ def convert_task(item, out_dir, out_format="numpy"): | |
"""Task with multiple annotations to brush images, out_format = numpy | png""" | ||
for from_name, results in item["output"].items(): | ||
save_brush_images_from_annotation( | ||
item["id"], | ||
item["annotation_id"], | ||
item["completed_by"], | ||
from_name, | ||
os.path.basename(item["input"]["image"]), | ||
Review comment: what if you have another

Reply: Sorry, I'm not following what you mean. When I loaded the value of the dictionary in item["input"], I got the following:

Is there a better place to pull the image name? I'm having trouble tracing back the definition of the input data.
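For reference, this hunk reads the image name from `item["input"]["image"]`, while `convert_to_brush` in the same diff resolves the key through `self._data_keys[0]` and accepts either a string or a list. A minimal sketch of deriving the base name with a configurable key follows; `image_base_name` and its `data_key` parameter are hypothetical and not part of the SDK.

```python
import os


def image_base_name(item, data_key="image"):
    """Hypothetical helper: base file name (no extension) of a task's image."""
    value = item["input"][data_key]
    # The data value may be a single path or a non-empty list of paths.
    path = value if isinstance(value, str) else value[0]
    # splitext removes only the final extension and keeps names without one.
    base, _ext = os.path.splitext(os.path.basename(path))
    return base
```

Unlike `".".join(image_name.split('.')[0:-1])`, which yields an empty string when the name contains no dot, `os.path.splitext` returns the name unchanged in that case.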
||
results, | ||
out_dir, | ||
out_format, | ||
|
@@ -193,6 +175,89 @@ def convert_task_dir(items, out_dir, out_format="numpy"): | |
|
||
# convert_task_dir('/ls/test/completions', '/ls/test/completions/output', 'numpy') | ||
|
||
def convert_to_brush( | ||
self, | ||
input_data, | ||
output_dir, | ||
output_image_dir=None, | ||
output_label_dir=None, | ||
is_dir=True, | ||
out_format="png", | ||
): | ||
"""Convert data in a specific format to either PNG or Numpy format. | ||
|
||
Parameters | ||
---------- | ||
input_data : str | ||
Path to the input data: a directory, or a JSON file if `is_dir` is False. | ||
output_dir : str | ||
The directory to store the output files in. | ||
output_image_dir : str, optional | ||
The directory to store the image files in. If not provided, it will default to a subdirectory called 'images' in output_dir. | ||
output_label_dir : str, optional | ||
The directory to store the label files in. If not provided, it will default to a subdirectory called 'masks' in output_dir. | ||
is_dir : bool, optional | ||
A boolean indicating whether `input_data` is a directory (True) or a JSON file (False). | ||
out_format : str, optional | ||
Either 'png' or 'numpy', indicating which mask format to use. | ||
""" | ||
ensure_dir(output_dir) | ||
if output_image_dir is not None: | ||
ensure_dir(output_image_dir) | ||
else: | ||
output_image_dir = os.path.join(output_dir, "images") | ||
os.makedirs(output_image_dir, exist_ok=True) | ||
if output_label_dir is not None: | ||
ensure_dir(output_label_dir) | ||
else: | ||
output_label_dir = os.path.join(output_dir, "masks") | ||
os.makedirs(output_label_dir, exist_ok=True) | ||
categories, category_name_to_id = self._get_labels() | ||
data_key = self._data_keys[0] | ||
|
||
# Write all segmentation PNGs or Numpy masks | ||
items = ( | ||
self.iter_from_dir(input_data) | ||
if is_dir | ||
else self.iter_from_json_file(input_data) | ||
) | ||
convert_task_dir(items, output_label_dir, out_format) | ||
|
||
# Write all raw images to the "images" folder | ||
item_iterator = ( | ||
self.iter_from_dir(input_data) | ||
if is_dir | ||
else self.iter_from_json_file(input_data) | ||
) | ||
for item_idx, item in enumerate(item_iterator): | ||
# get image path(s) and label file path | ||
image_paths = item["input"][data_key] | ||
image_paths = [image_paths] if isinstance(image_paths, str) else image_paths | ||
# download image(s) | ||
image_path = None | ||
# TODO: for multi-page annotation, this code won't produce correct relationships between page and annotated shapes | ||
# fixing the issue in RND-84 | ||
for image_path in reversed(image_paths): | ||
if not os.path.exists(image_path): | ||
try: | ||
image_path = download( | ||
image_path, | ||
output_image_dir, | ||
project_dir=self.project_dir, | ||
return_relative_path=True, | ||
upload_dir=self.upload_dir, | ||
download_resources=self.download_resources, | ||
) | ||
except: | ||
logger.info( | ||
"Unable to download {image_path}. The item {item} will be skipped".format( | ||
image_path=image_path, item=item | ||
), | ||
exc_info=True, | ||
) | ||
if not image_path: | ||
logger.error(f"No image path found for item #{item_idx}") | ||
continue | ||
|
||
### Brush Import ### | ||
|
||
|
Review comment: If you have multiple brushlabel control tags, your brushlabels will contain only one of them.

Reply: I included an example in the next comment, where I used the attached labeling interface (XML below). I defined two brushlabel classes, and the exported masks contained both classes. The first 3 exported mask images were of type "Hotdog" and the last 4 were of type "Not Hotdog". I've annotated with keypoints, rectangles, and brush labels, and all were exported as masks with the proper class name.
I'm not sure whether this is what you mean by multiple brushlabel control tags, though.
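One way to read the concern is to compare the two name expressions that appear in the diff. The sketch below is an illustration only: `old_name` and `new_name` are hypothetical wrappers around those expressions, and `tag_a` and `tag_b` are made-up control tag names that share a label value.

```python
def old_name(from_name, labels):
    # Expression removed by the diff: prefixes the control tag name.
    return from_name + "-" + "-".join(labels)


def new_name(labels):
    # Expression added by the diff: labels only, joined without a separator.
    return "".join(labels)


# Two hypothetical BrushLabels control tags that share a label value.
regions = [("tag_a", ["Hotdog"]), ("tag_b", ["Hotdog"])]
old_names = {old_name(tag, labels) for tag, labels in regions}
new_names = {new_name(labels) for _tag, labels in regions}
```

With the control tag prefix, the two regions get distinct names; without it, both map to "Hotdog". Two classes under a single control tag, as in the reply above, are not affected by this difference.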