feat: Export brush label masks with matching file base name #254

Open
wants to merge 8 commits into base: master
125 changes: 95 additions & 30 deletions src/label_studio_sdk/converter/brush.py
@@ -37,6 +37,11 @@
 from collections import defaultdict
 from itertools import groupby

+from label_studio_sdk.converter.utils import (
+    download,
+    ensure_dir,
+)
+
 logger = logging.getLogger(__name__)


@@ -99,7 +104,7 @@ def decode_rle(rle, print_params: bool = False):
     return out


-def decode_from_annotation(from_name, results):
+def decode_from_annotation(results):
     """from LS annotation to {"tag_name + label_name": [numpy uint8 image (width x height)]}"""
     layers = {}
     counters = defaultdict(int)
@@ -116,7 +121,7 @@ def decode_from_annotation(from_name, results):
         width = result["original_width"]
         height = result["original_height"]
         labels = result[key] if key in result else ["no_label"]
-        name = from_name + "-" + "-".join(labels)
Member

If you have multiple BrushLabels control tags, your brush labels will contain only one of them.

Author

@PhillipRDI PhillipRDI Aug 6, 2024

I included an example in the next comment, where I used the attached labeling interface (XML below). I defined two brush label classes, and the exported masks contained both: the first 3 exported masks were of class "Hotdog" and the last 4 were of class "Not Hotdog". I've annotated with keypoints, rectangles, and brush labels, and all were exported as masks with the proper class name.

I'm not sure whether this is what you mean by multiple BrushLabels control tags, though.

<View>
  <Image name="image" value="$image" zoom="true"/>
  <Header value="Keypoint Labels"/>
  <KeyPointLabels name="tag2" toName="image" smart="true">
    <Label value="Not Hotdog" smart="true" background="#00FF00" showInline="true"/>
    <Label value="Hotdog" smart="true" background="#FF0000" showInline="true"/>
  </KeyPointLabels>
  <Header value="Rectangle Labels"/>
  <RectangleLabels name="tag3" toName="image" smart="true">
    <Label value="Not Hotdog" smart="true" background="#00FF00" showInline="true"/>
    <Label value="Hotdog" smart="true" background="#FF0000" showInline="true"/>
  </RectangleLabels>
  <Header value="Brush Labels"/>
  <BrushLabels name="tag" toName="image">
    <Label value="Not Hotdog" smart="true" background="#00FF00" showInline="true"/>
    <Label value="Hotdog" smart="true" background="#FF0000" showInline="true"/>
  </BrushLabels>
</View>
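
The configuration above has a single BrushLabels tag, so it may not exercise the case the reviewer describes. A minimal sketch of that case follows; it is not the SDK's code. `layer_keys` and the tag name `tag_b` are hypothetical, and the sketch mirrors only the naming logic in the diff, where `convert_task` calls the decoder once per control tag so the sequence counter restarts for each tag.

```python
from collections import defaultdict


def layer_keys(output, include_tag):
    """Build mask layer keys for one annotation.

    `output` maps a control tag name to the label lists of its regions.
    Hypothetical helper: it mirrors only the naming logic shown in the diff.
    """
    keys = []
    for from_name, regions in output.items():
        counters = defaultdict(int)  # restarts for every control tag
        for labels in regions:
            if include_tag:
                name = from_name + "-" + "-".join(labels)  # scheme before the change
            else:
                name = "".join(labels)  # scheme after the change
            keys.append(name + "-" + str(counters[name]))
            counters[name] += 1
    return keys


# Assumed config: two BrushLabels tags that both define a "Hotdog" label.
output = {"tag": [["Hotdog"]], "tag_b": [["Hotdog"]]}

old_keys = layer_keys(output, include_tag=True)
new_keys = layer_keys(output, include_tag=False)
print(old_keys)  # ['tag-Hotdog-0', 'tag_b-Hotdog-0']
print(new_keys)  # ['Hotdog-0', 'Hotdog-0']
```

With the tag name dropped, both tags produce the same key, so under these assumptions the second mask would be written to the same file as the first.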

+        name = "".join(labels)

         # result count
         i = str(counters[name])
@@ -129,37 +134,17 @@ def decode_from_annotation(from_name, results):


 def save_brush_images_from_annotation(
-    task_id,
-    annotation_id,
-    completed_by,
-    from_name,
+    image_name,
     results,
     out_dir,
     out_format="numpy",
 ):
-    layers = decode_from_annotation(from_name, results)
-    if isinstance(completed_by, dict):
-        email = completed_by.get("email", "")
-    else:
-        email = str(completed_by)
-    email = "".join(
-        x for x in email if x.isalnum() or x == "@" or x == "."
-    )  # sanitize filename
+    layers = decode_from_annotation(results)
+    image_base = ".".join(image_name.split('.')[0:-1])
Member

If you have multiple annotations per task, you will export only the last one; the others will be overwritten.

Author

@PhillipRDI PhillipRDI Aug 6, 2024

I have used this PR of label-studio-sdk to label many images that contain multiple annotations per task, and it has worked across hundreds of images. By the way, the Segment Anything integration with Label Studio is a game changer for efficiently labeling images in a production environment.

To verify that merges haven't broken anything, I pulled the latest label-studio and label-studio-sdk from the PR repo and confirmed that functionality is preserved with all updates. I created a simple project to demonstrate this, to avoid sharing any proprietary images (see attached). The "decode_from_annotation" function keeps a sequence counter, "counters[name]", that is incremented and appended to the filename at the end of the function. This creates a unique filename for multiple annotations in a single task.

In the attached example, all images are stored in a separate folder. The filename of the example image has the base "c2825893-NathonFamous". Every mask of class "Hotdog" is then named "c2825893-NathonFamous-Hotdog-X.png", where X is the sequence number.

Example exported image in the "images" folder of an export:

Image "c2825893-NathonFamous.png"

Example exported masks in the "masks" folder of an export (each mask filename has a unique sequence number per class):

Mask image "c2825893-NathonFamous-Hotdog-0.png"
Mask image "c2825893-NathonFamous-Hotdog-1.png"
Mask image "c2825893-NathonFamous-Hotdog-2.png"
Mask image "c2825893-NathonFamous-Not Hotdog-0.png"
Mask image "c2825893-NathonFamous-Not Hotdog-1.png"
Mask image "c2825893-NathonFamous-Not Hotdog-2.png"
Mask image "c2825893-NathonFamous-Not Hotdog-3.png"
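
The two filename schemes in this diff can be compared with a small sketch. It assumes the converter calls `save_brush_images_from_annotation` once per annotation, so the per-class counter restarts for each annotation. `old_filename` and `new_filename` are hypothetical helpers, and the task id, annotation ids, and email addresses are made up.

```python
import os


def old_filename(out_dir, task_id, annotation_id, email, layer_name):
    # Scheme removed by the diff: unique per task, annotation, and annotator.
    return os.path.join(
        out_dir,
        "task-" + str(task_id)
        + "-annotation-" + str(annotation_id)
        + "-by-" + email
        + "-" + layer_name,
    )


def new_filename(out_dir, image_name, layer_name):
    # Scheme added by the diff: image base name plus layer name.
    image_base = ".".join(image_name.split(".")[0:-1])
    return os.path.join(out_dir, image_base + "-" + layer_name)


# Two annotations on the same task, each with one "Hotdog" mask (layer "Hotdog-0").
annotations = [(7, "a@example.com"), (8, "b@example.com")]
image_name = "c2825893-NathonFamous.png"

old_paths = {
    old_filename("masks", 1, ann_id, email, "Hotdog-0")
    for ann_id, email in annotations
}
new_paths = {new_filename("masks", image_name, "Hotdog-0") for _ in annotations}

print(len(old_paths))  # 2
print(len(new_paths))  # 1
```

Under that assumption, the counter separates masks within one annotation, but two annotations of the same image map to the same path.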


     for name in layers:
         sanitized_name = name.replace("/", "-").replace("\\", "-")

-        filename = os.path.join(
-            out_dir,
-            "task-"
-            + str(task_id)
-            + "-annotation-"
-            + str(annotation_id)
-            + "-by-"
-            + email
-            + "-"
-            + sanitized_name,
-        )
+        filename = os.path.join(out_dir,image_base+"-"+sanitized_name)
         image = layers[name]
         logger.debug(f"Save image to {filename}")
         if out_format == "numpy":
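
The `image_base` expression in this hunk drops the last dot-separated part of the file name. The sketch below compares it with `os.path.splitext` from the standard library, which the PR does not use; it is shown only as a possible alternative. `base_by_split` and `base_by_splitext` are hypothetical names, and the sample file names other than the first are made up.

```python
import os


def base_by_split(image_name):
    # Expression used in the diff.
    return ".".join(image_name.split(".")[0:-1])


def base_by_splitext(image_name):
    # Standard-library alternative (not part of the PR).
    return os.path.splitext(image_name)[0]


names = ["c2825893-NathonFamous.png", "scan.v2.png", "no_extension"]
results = {name: (base_by_split(name), base_by_splitext(name)) for name in names}

for name, (split_base, splitext_base) in results.items():
    print(name, "->", repr(split_base), repr(splitext_base))
```

The two agree when the name has an extension. For a name without a dot, the split-based expression yields an empty string, so every mask for that image would start with "-".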
@@ -175,10 +160,7 @@ def convert_task(item, out_dir, out_format="numpy"):
     """Task with multiple annotations to brush images, out_format = numpy | png"""
     for from_name, results in item["output"].items():
         save_brush_images_from_annotation(
-            item["id"],
-            item["annotation_id"],
-            item["completed_by"],
-            from_name,
+            os.path.basename(item["input"]["image"]),
Member

What if the image field has a different name in task.data?

Author

Sorry, I'm not following what you mean. When I load the dictionary in item["input"], I get the following:
item["input"] = {'image': '/data/upload/2/c2825893-NathonFamous.png'}

Is there a better place to pull the image name from? I'm having trouble tracing back the definition of the input data.
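
The key inside item["input"] appears to follow the variable in the labeling config (`value="$image"` gives `image`), so a project whose config uses another variable would not have an `image` key. A minimal sketch of looking the name up through a parameter instead of hard-coding it follows. `image_basename` and the `photo` field are hypothetical; elsewhere in this diff, `convert_to_brush` reads the key from `self._data_keys[0]`, which might be a source for it.

```python
import os


def image_basename(item, data_key="image"):
    """Return the base file name of the task's image.

    `data_key` is the name of the image field in task.data.
    """
    value = item["input"][data_key]
    # The field may hold one path or a list of paths; use the first one here.
    path = value if isinstance(value, str) else value[0]
    return os.path.basename(path)


item = {"input": {"image": "/data/upload/2/c2825893-NathonFamous.png"}}
renamed = {"input": {"photo": "/data/upload/2/c2825893-NathonFamous.png"}}

default_name = image_basename(item)
renamed_name = image_basename(renamed, data_key="photo")
print(default_name)
print(renamed_name)
```

With the hard-coded key, the second item would raise a `KeyError`.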

             results,
             out_dir,
             out_format,
@@ -193,6 +175,89 @@ def convert_task_dir(items, out_dir, out_format="numpy"):

 # convert_task_dir('/ls/test/completions', '/ls/test/completions/output', 'numpy')

+def convert_to_brush(
+    self,
+    input_data,
+    output_dir,
+    output_image_dir=None,
+    output_label_dir=None,
+    is_dir=True,
+    out_format="png",
+):
+    """Convert data in a specific format to either PNG or Numpy format.
+
+    Parameters
+    ----------
+    input_data : str
+        The input data a directory.
+    output_dir : str
+        The directory to store the output files in.
+    output_image_dir : str, optional
+        The directory to store the image files in. If not provided, it will default to a subdirectory called 'images' in output_dir.
+    output_label_dir : str, optional
+        The directory to store the label files in. If not provided, it will default to a subdirectory called 'masks' in output_dir.
+    is_dir : bool, optional
+        A boolean indicating whether `input_data` is a directory (True) or a JSON file (False).
+    output_format : str, optional
+        A string either 'png' or 'numpy' indicating which mask format to use.
+    """
+    ensure_dir(output_dir)
+    if output_image_dir is not None:
+        ensure_dir(output_image_dir)
+    else:
+        output_image_dir = os.path.join(output_dir, "images")
+        os.makedirs(output_image_dir, exist_ok=True)
+    if output_label_dir is not None:
+        ensure_dir(output_label_dir)
+    else:
+        output_label_dir = os.path.join(output_dir, "masks")
+        os.makedirs(output_label_dir, exist_ok=True)
+    categories, category_name_to_id = self._get_labels()
+    data_key = self._data_keys[0]
+
+    # Write all segmentation PNGs or Numpy masks
+    items = (
+        self.iter_from_dir(input_data)
+        if is_dir
+        else self.iter_from_json_file(input_data)
+    )
+    convert_task_dir(items, output_label_dir, out_format)
+
+    # Write all raw images to the "images" folder
+    item_iterator = (
+        self.iter_from_dir(input_data)
+        if is_dir
+        else self.iter_from_json_file(input_data)
+    )
+    for item_idx, item in enumerate(item_iterator):
+        # get image path(s) and label file path
+        image_paths = item["input"][data_key]
+        image_paths = [image_paths] if isinstance(image_paths, str) else image_paths
+        # download image(s)
+        image_path = None
+        # TODO: for multi-page annotation, this code won't produce correct relationships between page and annotated shapes
+        # fixing the issue in RND-84
+        for image_path in reversed(image_paths):
+            if not os.path.exists(image_path):
+                try:
+                    image_path = download(
+                        image_path,
+                        output_image_dir,
+                        project_dir=self.project_dir,
+                        return_relative_path=True,
+                        upload_dir=self.upload_dir,
+                        download_resources=self.download_resources,
+                    )
+                except:
+                    logger.info(
+                        "Unable to download {image_path}. The item {item} will be skipped".format(
+                            image_path=image_path, item=item
+                        ),
+                        exc_info=True,
+                    )
+        if not image_path:
+            logger.error(f"No image path found for item #{item_idx}")
+            continue
+
 ### Brush Import ###
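
The directory defaulting at the top of `convert_to_brush` can be sketched with the standard library alone. `resolve_output_dirs` is a hypothetical helper, and `os.makedirs(..., exist_ok=True)` stands in for the SDK's `ensure_dir`, whose exact behavior is assumed here rather than confirmed.

```python
import os
import tempfile


def resolve_output_dirs(output_dir, image_dir=None, label_dir=None):
    """Return (image_dir, label_dir), creating both.

    Missing directories fall back to the 'images' and 'masks'
    subdirectories of output_dir, as described in the docstring above.
    """
    if image_dir is None:
        image_dir = os.path.join(output_dir, "images")
    if label_dir is None:
        label_dir = os.path.join(output_dir, "masks")
    for path in (output_dir, image_dir, label_dir):
        os.makedirs(path, exist_ok=True)
    return image_dir, label_dir


with tempfile.TemporaryDirectory() as root:
    export_dir = os.path.join(root, "export")
    images, masks = resolve_output_dirs(export_dir)
    created = sorted(os.listdir(export_dir))
    print(created)  # ['images', 'masks']
```

Passing explicit `image_dir` or `label_dir` values skips the corresponding default, which matches how `image_dir` and `label_dir` are forwarded from `kwargs` in converter.py.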

30 changes: 20 additions & 10 deletions src/label_studio_sdk/converter/converter.py
@@ -239,19 +239,29 @@ def convert(self, input_data, output_data, format, is_dir=True, **kwargs):
                 input_data, output_data, output_image_dir=image_dir, is_dir=is_dir
             )
         elif format == Format.BRUSH_TO_NUMPY:
-            items = (
-                self.iter_from_dir(input_data)
-                if is_dir
-                else self.iter_from_json_file(input_data)
+            image_dir = kwargs.get("image_dir")
+            label_dir = kwargs.get("label_dir")
+            brush.convert_to_brush(
+                self,
+                input_data,
+                output_data,
+                output_image_dir=image_dir,
+                output_label_dir=label_dir,
+                is_dir=is_dir,
+                out_format="numpy",
             )
-            brush.convert_task_dir(items, output_data, out_format="numpy")
         elif format == Format.BRUSH_TO_PNG:
-            items = (
-                self.iter_from_dir(input_data)
-                if is_dir
-                else self.iter_from_json_file(input_data)
+            image_dir = kwargs.get("image_dir")
+            label_dir = kwargs.get("label_dir")
+            brush.convert_to_brush(
+                self,
+                input_data,
+                output_data,
+                output_image_dir=image_dir,
+                output_label_dir=label_dir,
+                is_dir=is_dir,
+                out_format="png",
             )
-            brush.convert_task_dir(items, output_data, out_format="png")
         elif format == Format.ASR_MANIFEST:
             items = (
                 self.iter_from_dir(input_data)