Java Options and Label Studio Guide: Difference between pages

From LogicalDOC Community Wiki
 
Revision by Giuseppe
 
= Java Options =

If you run the command java -? from a shell, it returns the list of available options for the command.

Running the command java -X instead shows a list of non-standard options.<br>Note: This list may vary from one version of Java to another and is subject to change without warning or notice.

Options that are specified with -XX are not stable and are not recommended for casual use.<br>These options are subject to change without notice.

{| cellspacing="1" cellpadding="1" border="1" style="width: 671px; height: 240px;"
|-
| -Xmx900m
| Maximum Java heap size (900 megabytes)
|-
| -XX:MaxPermSize=384m
| Maximum size of the Permanent Generation (384 megabytes)
|-
| -Xms512m
| Initial Java heap size (512 megabytes)
|-
| -XX:MaxNewSize=24m
| Maximum size of the new generation (24 megabytes)
|-
| -XX:NewSize=24m
| Initial (default) size of the new generation (24 megabytes)
|-
| -XX:+UseParNewGC
| Parallel Young Generation garbage collector (for multiprocessor machines)
|-
| -XX:+CMSParallelRemarkEnabled
| Attempt to decrease remark pauses when used with -XX:+UseParNewGC
|-
| -XX:+UseConcMarkSweepGC
| Enable the concurrent low pause collector
|-
| -Dorg.apache.el.parser.COERCE_TO_ZERO=false
| If true, when coercing expressions to numbers, "" and null will be coerced to zero as required by the specification. If not specified, the default value of true will be used.
|}

Note: Many of the options presented relate to JVM 1.4 in particular; refer to the following URLs:<br>[http://java.sun.com/docs/hotspot/gc1.4.2/faq.html http://java.sun.com/docs/hotspot/gc1.4.2/faq.html]<br>[http://www.petefreitag.com/articles/gctuning/ http://www.petefreitag.com/articles/gctuning/]

Be careful, therefore, when using these options on a JVM 1.6.<br>For the latest list of options available for J2SE 6.0, refer to the following documentation:

=== JVM 1.6 - J2SE 6 ===

Java HotSpot VM Options<br>[http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp]<br>Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning<br>[http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html]<br>[http://java.sun.com/performance/reference/whitepapers/6_performance.html http://java.sun.com/performance/reference/whitepapers/6_performance.html]

[http://collab.sakaiproject.org/pipermail/sakai-dev/2009-September/003682.html http://collab.sakaiproject.org/pipermail/sakai-dev/2009-September/003682.html]

= Preparing a Dataset with Label Studio =

This guide explains how to create an annotated dataset for YOLO training using Label Studio.

Install Label Studio using pip:

<pre>
pip install label-studio
</pre>

Verify the installation:

<pre>
python -m label_studio.server --help
</pre>

Or refer to the official installation guide:
https://labelstud.io/guide/install

=== Enable Local File Storage ===

For large projects, uploading images directly through the Label Studio interface is not recommended. Instead, configure a local directory that contains the images to annotate.

To enable local file access, set the following environment variables before starting Label Studio:

<pre>
LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/path/to/images
</pre>
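
Because a missing or mistyped variable is easy to overlook, it can help to check the two settings before starting the server. The following is a minimal sketch, not part of Label Studio: the function name <code>check_local_files_env</code> is hypothetical, and it only verifies that the two variables named above are present and that the document root is an existing directory.

```python
import os
from pathlib import Path


def check_local_files_env(env=None):
    """Return a list of problems found in the local-file settings (empty if none)."""
    # Use the real process environment unless a mapping is passed in explicitly.
    env = os.environ if env is None else env
    problems = []

    enabled = env.get("LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED", "")
    if enabled.lower() != "true":
        problems.append("LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED is not set to true")

    root = env.get("LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT")
    if not root:
        problems.append("LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT is not set")
    elif not Path(root).is_dir():
        problems.append(f"document root is not a directory: {root}")

    return problems
```

Calling <code>check_local_files_env()</code> with no argument inspects the current environment; an empty list means both settings look usable.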
 
 
 
 
=== Starting Label Studio ===
 
Label Studio can be started using one of the following methods.
 
 
==== Default Startup ====
 
If local file storage is not required, Label Studio can be started with the default configuration:
 
<pre>
label-studio start
</pre>
 
or
 
<pre>
python -m label_studio.server start
</pre>
 
By default, the application is available at:
 
<pre>
http://localhost:8080
</pre>
 
==== Startup with Local File Storage ====
 
When working with large datasets, it is recommended to configure Local Storage so that images are accessed directly from the filesystem.
 
Windows example:
 
<pre>
set LABEL_STUDIO_PORT=8081
set LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
set LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=C:\Users\username\Documents\label-studio
 
python -m label_studio.server start
</pre>
 
After startup, Label Studio will be available at:
 
<pre>
http://localhost:8081
</pre>
 
The directory specified by '''LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT''' can then be configured as Local Storage within a Label Studio project.
 
Port '''8081''' is used to avoid conflicts with the default LogicalDOC installation, which typically runs on port '''8080'''.
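
For a platform-independent alternative to the <code>set</code> commands, the same startup can be sketched in Python. This is only an illustration under stated assumptions: the helper names <code>build_launch</code> and <code>launch</code> are hypothetical, while the environment variables and the <code>python -m label_studio.server start</code> command are the ones used in this guide.

```python
import os
import subprocess
import sys


def build_launch(document_root, port=8081, base_env=None):
    """Return (command, environment) for starting Label Studio with local files enabled."""
    # Copy the environment so the calling process is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env["LABEL_STUDIO_PORT"] = str(port)
    env["LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED"] = "true"
    env["LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT"] = str(document_root)
    # Same command as in the guide, run with the current Python interpreter.
    command = [sys.executable, "-m", "label_studio.server", "start"]
    return command, env


def launch(document_root, port=8081):
    """Start Label Studio and block until the server process exits."""
    command, env = build_launch(document_root, port)
    return subprocess.run(command, env=env, check=False).returncode
```

For example, <code>launch(r"C:\Users\username\Documents\label-studio")</code> would start the server on port 8081 with that directory as the document root.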
 
 
=== Create a Project ===
 
# Log in to Label Studio
# Click '''Create Project'''
# Enter a project name
# Configure the labeling interface
# Save the project
 
=== Import Images ===
 
# Open the project
# Click '''Import'''
# Select '''Local Storage'''
 
=== Configure Local Storage ===
 
# Open the project
# Navigate to '''Settings > Cloud Storage'''
# Click '''Add Source Storage'''
# Select '''Local Files'''
# Configure the path specified by LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT
# Click '''Sync Storage'''
 
 
[[File:local-storage-button.png|thumb|800px|center|Local Storage button]]
[[File:Storage-Settings-Label-Studio.png|thumb|800px|center|Local Storage Selection]]
[[File:LabelStudio-local-storage.png.png|thumb|800px|center|Local Storage configuration showing a synchronized directory of document images]]
 
 
After synchronization, Label Studio automatically creates one task for each imported document image.
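
To illustrate what such a task looks like, the sketch below builds the task data for one image. It assumes that locally served files are addressed as <code>/data/local-files/?d=</code> followed by the path relative to the document root, and that the labeling interface reads the image from a field named <code>image</code>; both are assumptions to check against your Label Studio version. The function names are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def local_file_url(relative_path):
    """Build the URL of an image, given its path relative to the document root."""
    # Normalise Windows separators so the URL always uses forward slashes.
    rel = PurePosixPath(relative_path.replace("\\", "/"))
    return "/data/local-files/?d=" + quote(str(rel))


def make_task(relative_path):
    """Return one task dictionary that points at a single document image."""
    # The key "image" is an assumption; it must match the labeling interface.
    return {"data": {"image": local_file_url(relative_path)}}
```

A list of such dictionaries, one per image, corresponds to the one-task-per-image result described above.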
 
When importing images, choose '''Files''' as the import method.
 
=== Annotate Documents ===
 
# Open a task
# Select a label
# Draw a bounding box around the target area
# Save the annotation
 
Example labels:
 
* Invoice Number
* Date
* Seller Name
* Buyer Name
* Total Amount
 
[[File:label-studio-annotated-image-example.png|thumb|600px|center|Example annotation]]
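
The labels above are defined in the labeling interface of the project, which Label Studio stores as an XML configuration. The following sketch generates such a configuration for bounding-box annotation. The tag names (<code>View</code>, <code>Image</code>, <code>RectangleLabels</code>, <code>Label</code>) follow Label Studio's object detection template; the control names <code>image</code> and <code>label</code> and the helper <code>build_labeling_config</code> are illustrative choices, not requirements.

```python
import xml.etree.ElementTree as ET

# Example labels taken from the list in this section.
LABELS = ["Invoice Number", "Date", "Seller Name", "Buyer Name", "Total Amount"]


def build_labeling_config(labels):
    """Return a labeling configuration with one rectangle label per entry."""
    view = ET.Element("View")
    # The image is read from the task field named "image".
    ET.SubElement(view, "Image", name="image", value="$image")
    rect = ET.SubElement(view, "RectangleLabels", name="label", toName="image")
    for label in labels:
        ET.SubElement(rect, "Label", value=label)
    return ET.tostring(view, encoding="unicode")


if __name__ == "__main__":
    print(build_labeling_config(LABELS))
```

The printed XML can be pasted into the code view of the labeling interface settings.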
 
=== Export the Dataset ===
 
# Open the project
# Click '''Export'''
# Select the desired format
 
Supported formats include:
 
* YOLO
* COCO
* Pascal VOC
* CSV
 
For YOLO training, export the dataset in YOLO format.
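
Before training, it is worth confirming that every exported image has an annotation file. The sketch below assumes the extracted export contains an <code>images</code> folder and a <code>labels</code> folder with one <code>.txt</code> file per image, which is the usual YOLO layout; adjust the folder names if your export differs. The function name <code>find_unlabeled</code> is hypothetical.

```python
from pathlib import Path

# Image extensions considered part of the dataset (an assumption; extend as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def find_unlabeled(export_dir):
    """Return the names of images that have no matching label file."""
    export_dir = Path(export_dir)
    # A label file shares its base name with the image it describes.
    labeled = {p.stem for p in (export_dir / "labels").glob("*.txt")}
    images = [
        p for p in (export_dir / "images").iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    ]
    return sorted(p.name for p in images if p.stem not in labeled)
```

An empty result means every image has a label file; any names returned are images that were exported without annotations.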
 
=== Dataset Formats ===
 
==== COCO ====
 
COCO is a JSON-based dataset format commonly used for object detection datasets.
 
More information:
https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/md-coco-overview.html
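
As a rough illustration of the structure, a COCO file has three main lists: <code>images</code>, <code>categories</code> and <code>annotations</code>, where each annotation stores its box as <code>[x, y, width, height]</code> in pixels. The file name, sizes and coordinates below are invented sample values, and the helper <code>boxes_for_image</code> is hypothetical.

```python
import json

# A minimal COCO-style dataset with one image and one annotation (sample values).
coco = {
    "images": [
        {"id": 1, "file_name": "invoice_001.png", "width": 1240, "height": 1754}
    ],
    "categories": [{"id": 1, "name": "Invoice Number"}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [820.0, 96.0, 260.0, 48.0],  # x, y, width, height in pixels
            "area": 260.0 * 48.0,
            "iscrowd": 0,
        }
    ],
}


def boxes_for_image(dataset, image_id):
    """Return (category name, bbox) pairs for one image."""
    names = {c["id"]: c["name"] for c in dataset["categories"]}
    return [
        (names[a["category_id"]], a["bbox"])
        for a in dataset["annotations"]
        if a["image_id"] == image_id
    ]


if __name__ == "__main__":
    print(json.dumps(coco, indent=2))
```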
 
==== YOLO ====
 
YOLO datasets contain images and annotation files organized according to a predefined directory structure.
 
More information:
https://docs.cvat.ai/docs/dataset_management/formats/format-yolo/
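
In the standard YOLO format, each line of an annotation file describes one box as <code>class x_center y_center width height</code>, with all four values normalized to the range 0 to 1 relative to the image size. The following sketch converts one such line to pixel corner coordinates; the function name <code>yolo_to_pixels</code> is hypothetical.

```python
def yolo_to_pixels(line, image_width, image_height):
    """Convert one YOLO annotation line to (class_id, (x_min, y_min, x_max, y_max))."""
    class_id, xc, yc, w, h = line.split()
    # Scale the normalized centre and size back to pixels.
    xc, w = float(xc) * image_width, float(w) * image_width
    yc, h = float(yc) * image_height, float(h) * image_height
    # The box extends half its width and height on each side of the centre.
    return int(class_id), (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)
```

For instance, the line <code>0 0.5 0.5 0.5 0.25</code> on a 1000 x 800 image describes a box from (250, 300) to (750, 500).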
 
==== YOLOv8 OBB ====
 
YOLOv8 OBB (Oriented Bounding Boxes) extends the standard YOLO format by supporting rotated bounding boxes using eight normalized coordinates.
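
The eight coordinates are the four corner points of the rotated box, so an annotation line is assumed here to have the form <code>class x1 y1 x2 y2 x3 y3 x4 y4</code>. The sketch below converts one such line to pixel corner points; the function name <code>obb_to_pixels</code> is hypothetical.

```python
def obb_to_pixels(line, image_width, image_height):
    """Convert one OBB annotation line to (class_id, list of four (x, y) corners)."""
    parts = line.split()
    if len(parts) != 9:
        raise ValueError("expected a class id followed by eight coordinates")
    class_id = int(parts[0])
    coords = [float(value) for value in parts[1:]]
    # Pair up x and y values and scale them back to pixels.
    points = [
        (coords[i] * image_width, coords[i + 1] * image_height)
        for i in range(0, 8, 2)
    ]
    return class_id, points
```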

Revision as of 13:25, 23 June 2026
