Label Studio Guide

From LogicalDOC Community Wiki
Revision as of 07:34, 26 June 2026 by Giuseppe

This guide explains how to create an annotated dataset for YOLO training using Label Studio.


Please be aware that this procedure is not covered by the standard support contract. LogicalDOC cannot provide assistance with issues related to dataset preparation, training failures, model quality, GPU configuration, or third-party tools such as Label Studio, Ultralytics YOLO, or ONNX Runtime.

If you require professional assistance, please contact sales@logicaldoc.com to request a quotation for consulting services.



Install Label Studio

Label Studio requires Python 3.10 or later.

Install Label Studio using `pip`:

pip install label-studio

Verify the installation:

python -m label_studio.server --help
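
The same prerequisites can also be checked from Python. The following is a minimal sketch, not part of Label Studio; the helper names `python_is_supported` and `label_studio_is_installed` are hypothetical, and the 3.10 minimum is simply the requirement stated above.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this guide


def python_is_supported(version=None):
    """Return True if the (major, minor, ...) version meets the minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


def label_studio_is_installed():
    """Return True if this interpreter can locate the label_studio package."""
    return importlib.util.find_spec("label_studio") is not None


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("label_studio importable:", label_studio_is_installed())
```

Run the script with the same interpreter that was used for `pip install`, otherwise the package check may report a different environment.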

For additional installation options, refer to the official documentation:

https://labelstud.io/guide/install

Start Label Studio

Label Studio can be started in one of the following ways.

Default Startup

For small datasets or proof-of-concept projects, Label Studio can be started with the default configuration:

label-studio start

or

python -m label_studio.server start

By default, the application is available at:

http://localhost:8080

Start Label Studio with Local File Storage Enabled

For larger datasets, it is recommended to use Local Storage so that images are accessed directly from the filesystem instead of being uploaded through the Label Studio interface.

To enable support for Local Storage, configure the following environment variables before starting Label Studio:

LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/path/to/images

Example (Windows):

set LABEL_STUDIO_PORT=8081
set LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
set LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=C:\Users\username\Documents\label-studio

python -m label_studio.server start

After startup, Label Studio will be available at:

http://localhost:8081

The value of LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT defines the root directory from which Label Studio is allowed to access local files. The Local Storage itself is configured later, in the settings of the project.

Port 8081 is used in this example to avoid conflicts with the default LogicalDOC installation, which typically runs on port 8080.
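
On systems where the Windows `set` syntax does not apply, the same startup can be sketched in Python. This is an illustrative wrapper under stated assumptions, not an official launcher: the function names `build_label_studio_env` and `start_label_studio` are hypothetical, while the environment variables and the start command are the ones shown above.

```python
import os
import subprocess
import sys


def build_label_studio_env(document_root, port=8081, base_env=None):
    """Return a copy of the environment with the Local Storage variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["LABEL_STUDIO_PORT"] = str(port)
    env["LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED"] = "true"
    env["LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT"] = str(document_root)
    return env


def start_label_studio(document_root, port=8081):
    """Start Label Studio with the command used in this guide.

    Blocks until the server process exits and returns its exit code.
    """
    env = build_label_studio_env(document_root, port)
    command = [sys.executable, "-m", "label_studio.server", "start"]
    return subprocess.run(command, env=env).returncode


# Example call (placeholder path, replace with your own document root):
# start_label_studio("/path/to/images")
```

Passing the variables through `env` keeps them scoped to the Label Studio process instead of changing the settings of the current shell.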

Create a Project

  1. Log in to Label Studio
  2. Click Create Project
  3. Enter a project name
  4. Configure the labeling interface
  5. Save the project


Configure and Synchronize Local Storage

  1. Open the project
  2. Navigate to Settings > Cloud Storage
  3. Click Add Source Storage
  4. Select Local Files
  5. Enter the absolute local path of the image folder, which must lie within the directory specified by LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT
  6. Click Sync Storage
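
Because Label Studio only serves files located under the document root, it can help to check the storage path before entering it in step 5. The following is a minimal sketch; the function name `is_within_document_root` is hypothetical, and whether the root directory itself is accepted as a storage path may depend on the Label Studio version.

```python
from pathlib import Path


def is_within_document_root(storage_path, document_root):
    """Return True if storage_path is document_root itself or lies beneath it."""
    root = Path(document_root).resolve()
    candidate = Path(storage_path).resolve()
    # Comparing whole path components avoids false matches such as
    # "/data/images-old" being treated as part of "/data/images".
    return candidate == root or root in candidate.parents
```

For example, with a document root of `/data/images`, the folder `/data/images/invoices` passes the check, while `/data/other` does not.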


[Screenshot: Add Source Storage selection]
[Screenshot: Local Storage selection]



When importing images, choose Files as the import method.


[Screenshot: Label Studio file type selection]

After synchronization, Label Studio automatically creates one task for each imported document image.
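
Since one task is created per image, counting the image files in the storage folder gives a number to compare against the task list after synchronization. This is a rough sketch: the function name `count_images` is hypothetical, the list of extensions is an assumption, and the `recursive` flag should mirror whether subfolder scanning was enabled in the storage settings.

```python
from pathlib import Path

# Assumed set of image extensions; adjust it to the files in your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def count_images(folder, recursive=False):
    """Count image files in folder, optionally including subfolders."""
    folder = Path(folder)
    entries = folder.rglob("*") if recursive else folder.glob("*")
    return sum(
        1 for p in entries
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

A mismatch between this count and the number of tasks usually points to unsupported file types or to a storage path that differs from the folder being counted.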

[Screenshot: Local Storage configuration]

Annotate Documents

  1. Open a task
  2. Select a label
  3. Draw a bounding box around the target area
  4. Save the annotation

Example:

[Screenshot: Label Studio annotation example]

Export the Dataset

  1. Open the project
  2. Click Export
  3. Select the desired format
[Screenshot: Export button]
[Screenshot: Export data selection]

Supported formats include:

  • YOLO
  • COCO
  • Pascal VOC
  • CSV

For YOLO training, export the dataset in either the YOLO or the YOLO with Images format.


KNOWN ISSUE

Even when selecting the YOLO with Images export format, Label Studio exports only the annotation (.txt) files. The corresponding images are not included in the exported archive.

Before starting the training process, copy the original images manually into the appropriate dataset directories.
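
The manual copy can be scripted. The sketch below assumes that each exported annotation file has the same base name as its image (for example `doc1.txt` for `doc1.png`) and that the annotation files sit in a single folder; both are assumptions to verify against your own export. The function name `copy_images_for_labels` and the extension list are hypothetical.

```python
import shutil
from pathlib import Path

# Assumed set of image extensions; adjust it to the files in your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def copy_images_for_labels(labels_dir, source_dir, images_dir):
    """Copy the image matching each label file; return label files with no image."""
    labels_dir = Path(labels_dir)
    source_dir = Path(source_dir)
    images_dir = Path(images_dir)
    images_dir.mkdir(parents=True, exist_ok=True)

    # Index the original images by file name without extension.
    by_stem = {
        p.stem: p
        for p in sorted(source_dir.rglob("*"))
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    }

    missing = []
    for label_file in sorted(labels_dir.glob("*.txt")):
        image = by_stem.get(label_file.stem)
        if image is None:
            missing.append(label_file.name)
            continue
        shutil.copy2(image, images_dir / image.name)
    return missing
```

The returned list names every annotation file for which no image was found, so an empty list indicates that the dataset is complete. If two source images share a base name, only one of them is copied, so keep base names unique.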

For information about the expected dataset structure, see YOLO Training Pipeline.

Dataset Formats

COCO

COCO is a JSON-based dataset format commonly used for object detection datasets. It stores images, categories, annotations, and bounding boxes in a single JSON file.

More information: https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/md-coco-overview.html
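
To make the single-file layout concrete, here is a small hand-written COCO structure and a helper that reads it. The file name, sizes, and category are invented for illustration, and the helper name `boxes_per_file` is hypothetical. COCO bounding boxes are stored as `[x_min, y_min, width, height]` in pixels.

```python
# Illustrative data only; a real export contains one entry per image and box.
coco = {
    "images": [
        {"id": 1, "file_name": "invoice_001.png", "width": 1240, "height": 1754}
    ],
    "categories": [{"id": 1, "name": "signature"}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [100.0, 200.0, 300.0, 80.0],  # [x_min, y_min, width, height]
            "area": 24000.0,
            "iscrowd": 0,
        }
    ],
}


def boxes_per_file(dataset):
    """Group (category name, bbox) pairs by image file name."""
    file_names = {img["id"]: img["file_name"] for img in dataset["images"]}
    categories = {cat["id"]: cat["name"] for cat in dataset["categories"]}
    result = {name: [] for name in file_names.values()}
    for ann in dataset["annotations"]:
        name = file_names[ann["image_id"]]
        result[name].append((categories[ann["category_id"]], ann["bbox"]))
    return result
```

A file exported from Label Studio can be loaded with `json.load` and passed to the same helper.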

Pascal VOC XML

Pascal VOC is an XML-based dataset format widely used in object detection tasks. Each image has a corresponding XML file containing metadata such as image dimensions, object classes, and bounding box coordinates.

More information: https://roboflow.com/formats/pascal-voc-xml
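
As an illustration of the per-image XML layout, the sketch below parses a small hand-written annotation with the Python standard library. The sample values and the function name `parse_voc` are invented for this example; Pascal VOC boxes are given as `xmin`, `ymin`, `xmax`, `ymax` in pixels.

```python
import xml.etree.ElementTree as ET

# Illustrative annotation; real files usually carry additional fields.
VOC_XML = """<annotation>
  <filename>invoice_001.png</filename>
  <size><width>1240</width><height>1754</height><depth>3</depth></size>
  <object>
    <name>signature</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>400</xmax><ymax>280</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return (file name, (width, height), [(class name, (xmin, ymin, xmax, ymax))])."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        corners = tuple(
            int(float(box.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((obj.findtext("name"), corners))
    return root.findtext("filename"), (width, height), objects
```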

YOLO

YOLO datasets consist of images and text annotation files organized according to a predefined directory structure. Each image has a corresponding text file containing the object class and normalized bounding box coordinates.

More information: https://docs.cvat.ai/docs/dataset_management/formats/format-yolo/
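
Each line of a YOLO annotation file has the form `class x_center y_center width height`, with all four box values normalized to the range 0 to 1. The sketch below converts one such line back to pixel corners, which is a convenient way to check an export against the original image. The function name `yolo_to_pixels` is hypothetical.

```python
def yolo_to_pixels(line, image_width, image_height):
    """Convert one normalized YOLO line to (class id, (x_min, y_min, x_max, y_max))."""
    parts = line.split()
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(v) for v in parts[1:5])
    x_min = (x_center - width / 2) * image_width
    y_min = (y_center - height / 2) * image_height
    x_max = (x_center + width / 2) * image_width
    y_max = (y_center + height / 2) * image_height
    return class_id, (x_min, y_min, x_max, y_max)
```

The class id is an index into the list of class names shipped with the dataset, so the order of that list must not change between export and training.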

YOLOv8 OBB

YOLOv8 OBB (Oriented Bounding Boxes) extends the standard YOLO format by supporting rotated bounding boxes. Each box is described by eight normalized coordinates (the x and y of its four corners) instead of the four values used by the standard format. This format is useful when objects are rotated relative to the image axes.
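
An OBB annotation line has the form `class x1 y1 x2 y2 x3 y3 x4 y4`, where each pair is one corner of the rotated box in normalized coordinates. The following sketch, with the hypothetical function name `obb_to_pixels`, converts such a line to pixel corner points.

```python
def obb_to_pixels(line, image_width, image_height):
    """Convert one normalized OBB line to (class id, [four (x, y) pixel corners])."""
    parts = line.split()
    if len(parts) != 9:
        raise ValueError("expected a class index followed by eight coordinates")
    class_id = int(parts[0])
    coords = [float(v) for v in parts[1:]]
    # Pair up consecutive values: even positions are x, odd positions are y.
    corners = [
        (coords[i] * image_width, coords[i + 1] * image_height)
        for i in range(0, 8, 2)
    ]
    return class_id, corners
```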