From LogicalDOC Community Wiki
Last edited by Giuseppe
Latest revision as of 07:46, 26 June 2026

Label Studio Guide

This guide explains how to create an annotated dataset for YOLO training using Label Studio.


This guide describes an example workflow for training a custom YOLO model and preparing it for use with LogicalDOC.

Please be aware that this procedure is not covered by the standard support contract. LogicalDOC cannot provide assistance with issues related to dataset preparation, training failures, model quality, GPU configuration, or third-party tools such as Label Studio, Ultralytics YOLO, or ONNX Runtime.

If you require professional assistance, please contact sales@logicaldoc.com to request a quotation for consulting services.


Install Label Studio

Label Studio requires Python 3.10 or later.

Install Label Studio using pip:

pip install label-studio

Verify the installation:

python -m label_studio.server --help
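Before going further, you may want to confirm two things: that the interpreter meets the Python 3.10 requirement, and that it can see the installed package. The following is a minimal sketch that uses only the standard library. It assumes the distribution is named label-studio, as in the pip command above. The function name is hypothetical.

```python
import importlib.metadata
import sys


def check_environment(min_python=(3, 10)):
    """Report whether this interpreter meets the minimum Python version and
    which label-studio version, if any, is installed for it."""
    python_ok = sys.version_info[:2] >= min_python
    try:
        ls_version = importlib.metadata.version("label-studio")
    except importlib.metadata.PackageNotFoundError:
        ls_version = None  # package is not installed for this interpreter
    return {"python_ok": python_ok, "label_studio": ls_version}
```

For example, print(check_environment()) prints something like {'python_ok': True, 'label_studio': '...'}. A value of None for label_studio usually means pip installed the package for a different interpreter.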

For additional installation options, refer to the official documentation:

https://labelstud.io/guide/install

Start Label Studio

Label Studio can be started in one of the following ways.

Default Startup

For small datasets or proof-of-concept projects, Label Studio can be started with the default configuration:

label-studio start

or

python -m label_studio.server start

By default, the application is available at:

http://localhost:8080

Start Label Studio with Local File Storage Enabled

For larger datasets, it is recommended to use Local Storage so that images are accessed directly from the filesystem instead of being uploaded through the Label Studio interface.

To enable support for Local Storage, configure the following environment variables before starting Label Studio:

LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/path/to/images

Example (Windows Command Prompt):

set LABEL_STUDIO_PORT=8081
set LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
set LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=C:\Users\username\Documents\label-studio

python -m label_studio.server start

After startup, Label Studio will be available at:

http://localhost:8081

The value of LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT defines the root directory from which Label Studio is allowed to access local files. The Local Storage itself is configured later, after the project has been created.

Port 8081 is used in this example to avoid conflicts with the default LogicalDOC installation, which typically runs on port 8080.
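The set commands in the Windows example apply to the Command Prompt. As a cross-platform alternative, you could prepare the same environment in Python and pass it to the start command. This is a sketch, not an official launcher. The helper names are hypothetical. The variables are exactly the ones used in the Windows example, including LABEL_STUDIO_PORT.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_label_studio_env(document_root, port=8081, base_env=None):
    """Return a copy of the environment with Local Storage serving enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED"] = "true"
    # Resolve to an absolute path so the value does not depend on the
    # current working directory.
    env["LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT"] = str(Path(document_root).resolve())
    env["LABEL_STUDIO_PORT"] = str(port)
    return env


def start_label_studio(document_root, port=8081):
    """Launch Label Studio with the start command used in this guide."""
    env = build_label_studio_env(document_root, port)
    return subprocess.Popen(
        [sys.executable, "-m", "label_studio.server", "start"], env=env
    )
```

For example, start_label_studio(r"C:\Users\username\Documents\label-studio") should start the server on port 8081. Call .wait() on the returned process to keep the script running while Label Studio is in use.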

Create a Project

  1. Log in to Label Studio
  2. Click Create Project
  3. Enter a project name
  4. Configure the labeling interface
  5. Save the project
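For bounding-box annotation, the labeling interface is typically defined with Label Studio's Image and RectangleLabels tags. The sketch below generates such a configuration from a list of class names. You can then paste the output into the code view of the labeling interface settings. The label names are only examples for invoice documents. The task field name image (referenced as $image) is an assumption and must match the field your tasks use.

```python
import xml.etree.ElementTree as ET

# Example labels for invoice documents; replace them with your own classes.
EXAMPLE_LABELS = ["Invoice Number", "Date", "Seller Name", "Buyer Name", "Total Amount"]


def build_labeling_config(labels):
    """Build a labeling config for rectangle (bounding box) annotation."""
    view = ET.Element("View")
    # The image is read from the task field named "image" ($image).
    ET.SubElement(view, "Image", name="image", value="$image")
    rectangles = ET.SubElement(view, "RectangleLabels", name="label", toName="image")
    for label in labels:
        ET.SubElement(rectangles, "Label", value=label)
    return ET.tostring(view, encoding="unicode")


print(build_labeling_config(EXAMPLE_LABELS))
```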


Configure and Synchronize Local Storage

  1. Open the project
  2. Navigate to Settings > Cloud Storage
  3. Click Add Source Storage
  4. Select Local Files
  5. Set the path to the directory specified by LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT, or to one of its subdirectories
  6. Click Sync Storage


Add Source Storage Selection
Local Storage Selection



When importing images, choose Files as the import method.


Label Studio File Type Selection

After synchronization, Label Studio automatically creates one task for each imported document image.
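Each image becomes one task, so counting the images in the storage directory before synchronizing gives a rough idea of how many tasks to expect. The same check also catches a storage path that lies outside LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT. This is a hypothetical helper. The list of extensions and the recursive scan are assumptions. The actual task count also depends on the storage settings, such as any file filter.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your documents.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def count_images_for_sync(document_root, storage_path, extensions=IMAGE_EXTENSIONS):
    """Check that storage_path lies inside document_root and count the image
    files below it (roughly one task per image after synchronization)."""
    root = Path(document_root).resolve()
    storage = Path(storage_path).resolve()
    if not storage.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"{storage} is not inside the document root {root}")
    return sum(
        1 for p in storage.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )
```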

Local Storage configuration

Annotate Documents

  1. Open a task
  2. Select a label
  3. Draw a bounding box around the target area
  4. Save the annotation
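Label Studio stores each rectangle as its top-left corner plus a width and height, all expressed as percentages (0 to 100) of the image size. The YOLO export performs the conversion for you. The sketch below only illustrates the arithmetic for a single annotation result. The class list passed in is hypothetical. Rotated boxes are rejected because plain YOLO labels cannot represent them.

```python
def rectangle_to_yolo(result, class_names):
    """Convert one rectanglelabels result to a YOLO label line."""
    value = result["value"]
    if value.get("rotation", 0):
        raise ValueError("rotated boxes need an OBB format, not plain YOLO")
    class_id = class_names.index(value["rectanglelabels"][0])
    # x, y, width and height are percentages of the image size (0-100).
    w = value["width"] / 100
    h = value["height"] / 100
    x_center = value["x"] / 100 + w / 2
    y_center = value["y"] / 100 + h / 2
    return f"{class_id} {x_center:.6f} {y_center:.6f} {w:.6f} {h:.6f}"
```

For example, a "Date" box at x=10, y=5 with width 30 and height 4, and class list ["Invoice Number", "Date"], gives "1 0.250000 0.070000 0.300000 0.040000".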

Example:

Label Studio Annotation Example

Export the Dataset

  1. Open the project
  2. Click Export
  3. Select the desired format
Export Button Selection
Export Data Selection

Supported formats include:

  • YOLO
  • COCO
  • Pascal VOC
  • CSV

For YOLO training, export the dataset in the YOLO or YOLO with Images format.


Known Issue: Even when selecting the YOLO with Images export format, Label Studio exports only the annotation (.txt) files. The corresponding images are not included in the exported archive.

Before starting the training process, copy the original images manually into the appropriate dataset directories.
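Until this issue is resolved, you can script the copy step. The sketch below makes several assumptions:

- The exported archive has been extracted, and its label files sit in a single folder.
- Each label file has the same name (without extension) as its image.

The function name and folder layout are hypothetical. If the exported file names carry a prefix that the original images lack, adjust the matching accordingly.

```python
import shutil
from pathlib import Path

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp")


def copy_images_for_labels(labels_dir, source_dir, target_dir, extensions=IMAGE_EXTENSIONS):
    """Copy every image whose file stem matches a YOLO label file.

    Returns the label stems for which no image was found.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Index the original images by stem (file name without extension).
    images = {
        p.stem: p for p in Path(source_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    }
    missing = []
    for label in sorted(Path(labels_dir).glob("*.txt")):
        image = images.get(label.stem)
        if image is None:
            missing.append(label.stem)
        else:
            shutil.copy2(image, target / image.name)
    return missing
```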



For information about the expected dataset structure, see YOLO Training Pipeline.

Dataset Formats

COCO

COCO is a JSON-based dataset format commonly used for object detection datasets. It stores image metadata, categories, and annotations (including bounding boxes) in a single JSON file.
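A COCO bbox is stored as [x_min, y_min, width, height] in pixels, whereas YOLO expects the normalized box center and size. The sketch below converts a loaded COCO file into YOLO label lines grouped by image file name. The function names are hypothetical. Mapping category ids to 0-based indices in ascending id order is also an assumption; match it to the class order your training configuration expects.

```python
import json
from collections import defaultdict


def load_coco(path):
    """Read a COCO JSON file (the path is up to you)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def coco_to_yolo_lines(coco):
    """Group COCO annotations by image file and convert each bbox to a YOLO line."""
    images = {img["id"]: img for img in coco["images"]}
    # COCO category ids need not start at 0; map them to 0-based class indices.
    category_ids = sorted(cat["id"] for cat in coco["categories"])
    class_index = {cid: i for i, cid in enumerate(category_ids)}
    lines = defaultdict(list)
    for ann in coco["annotations"]:
        img = images[ann["image_id"]]
        x, y, w, h = ann["bbox"]  # top-left corner and size, in pixels
        xc = (x + w / 2) / img["width"]
        yc = (y + h / 2) / img["height"]
        lines[img["file_name"]].append(
            f"{class_index[ann['category_id']]} {xc:.6f} {yc:.6f} "
            f"{w / img['width']:.6f} {h / img['height']:.6f}"
        )
    return dict(lines)
```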

More information: https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/md-coco-overview.html

Pascal VOC XML

Pascal VOC is an XML-based dataset format widely used in object detection tasks. Each image has a corresponding XML file containing metadata such as image dimensions, object classes, and bounding box coordinates.
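Python's standard xml.etree.ElementTree module can read the per-image XML layout. The sketch below assumes the usual VOC elements (size/width, size/height, object/name, object/bndbox) and returns pixel coordinates. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_voc(xml_text):
    """Return (width, height, objects) from a Pascal VOC annotation string,
    where each object is (class_name, xmin, ymin, xmax, ymax) in pixels."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = [float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        objects.append((obj.findtext("name"), *coords))
    return width, height, objects
```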

More information: https://roboflow.com/formats/pascal-voc-xml

YOLO

YOLO datasets consist of images and text annotation files organized according to a predefined directory structure. Each image has a corresponding text file containing the object class and normalized bounding box coordinates.
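Each line in a YOLO label file has the form class x_center y_center width height. The four coordinates are divided by the image width or height, so they fall between 0 and 1. The sketch below parses one line, rejects values outside that range, and converts the box back to pixel corners. This can be handy for spot-checking an export. The function name is hypothetical.

```python
def parse_yolo_line(line, image_width, image_height):
    """Parse 'class x_center y_center width height' (normalized) and return
    (class_id, xmin, ymin, xmax, ymax) in pixels."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:])
    if not all(0.0 <= v <= 1.0 for v in (xc, yc, w, h)):
        raise ValueError(f"coordinates must be normalized to [0, 1]: {line!r}")
    return (
        class_id,
        (xc - w / 2) * image_width,
        (yc - h / 2) * image_height,
        (xc + w / 2) * image_width,
        (yc + h / 2) * image_height,
    )
```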

More information: https://docs.cvat.ai/docs/dataset_management/formats/format-yolo/

YOLOv8 OBB

YOLOv8 OBB (Oriented Bounding Boxes) extends the standard YOLO format by supporting rotated bounding boxes using eight normalized coordinates instead of four. This format is useful when objects are not aligned horizontally.
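Each OBB line has the form class x1 y1 x2 y2 x3 y3 x4 y4, listing the four corner points normalized by the image size. The sketch below derives those corners from a box given by its center, size, and rotation angle in pixels. That input parametrization is an assumption chosen for illustration: rotation about the box center, angle in degrees, y axis pointing down as in image coordinates. The function name is hypothetical.

```python
import math


def obb_line(class_id, cx, cy, w, h, angle_deg, image_width, image_height):
    """Build an OBB label line from a rotated box given in pixels.

    (cx, cy) is the box center, w and h its size, angle_deg its rotation
    about the center. Corners are written as normalized x1 y1 ... x4 y4.
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    corners = []
    # Corners of the unrotated box, in order: top-left, top-right,
    # bottom-right, bottom-left; each is rotated about the center.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        x = cx + dx * cos_a - dy * sin_a
        y = cy + dx * sin_a + dy * cos_a
        corners += [x / image_width, y / image_height]
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in corners)
```

With an angle of 0 the result is simply the axis-aligned box. For example, obb_line(0, 50, 50, 20, 10, 0, 100, 100) gives "0 0.400000 0.450000 0.600000 0.450000 0.600000 0.550000 0.400000 0.550000".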