Store filenames with emoticons: Difference between revisions

From LogicalDOC Community Wiki
Giuseppe (talk | contribs)
Undo revision 5353 by Giuseppe (talk)
Tag: Undo
Giuseppe (talk | contribs)
No edit summary
Tag: Manual revert
 
Line 1: Line 1:
= Preparing a Dataset with Label Studio =
You must have a MySQL 5.7-8 or MariaDB database configured with utf8mb4 settings.


This guide explains how to create an annotated dataset for YOLO training using Label Studio.
Since this is not a risk-free operation, back up the database before proceeding.


Install Label Studio using pip:
=== Update the configuration of LogicalDOC ===
First, add a couple of parameters to the JDBC connection URL of LogicalDOC.
# Shut down the LogicalDOC system service/daemon
# Locate the file '''context.properties''' in '''<LOGICALDOC_INSTALLATION_DIR>/conf'''
# Open the file context.properties and locate the key '''jdbc.url'''
# Append the parameters '''characterEncoding=utf8&allowPublicKeyRetrieval=true''' to the value of that key <br/><pre>e.g.: jdbc.url=jdbc:mysql://localhost:3306/logicaldoc?useSSL=false&characterEncoding=utf8&allowPublicKeyRetrieval=true</pre>
# Save the file and restart the LogicalDOC system service/daemon


=== Update the database schema ===
Connect to the database using the mysql client:
<pre>
<pre>
pip install label-studio
mysql -u root -p logicaldoc
</pre>
</pre>


Verify the installation:
Check that the charset settings are correct with the query below:
<syntaxhighlight lang="SQL">
SHOW VARIABLES LIKE 'char%';
</syntaxhighlight>


<pre>
[[File:Mysql8-charset-for-emoticons.png|360px|frame|center|MySQL queries to check charset and collation to store emoticons in LogicalDOC filenames]]
python -m label_studio.server --help
</pre>
 
Or refer to the official installation guide:
https://labelstud.io/guide/install
 
 
=== Enable Local File Storage ===
 
For large projects it is not recommended to upload images directly through the Label Studio interface. Instead, configure a local directory that contains the images to annotate.
 
To enable local file access, configure the following environment variables before starting Label Studio:
 
<pre>
LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/path/to/images
</pre>
 
 
 
 
=== Starting Label Studio ===
 
Label Studio can be started using one of the following methods.
 
 
==== Default Startup ====
 
If local file storage is not required, Label Studio can be started with the default configuration:
 
<pre>
label-studio start
</pre>
 
or
 
<pre>
python -m label_studio.server start
</pre>
 
By default, the application is available at:
 
<pre>
http://localhost:8080
</pre>
 
==== Startup with Local File Storage ====
 
When working with large datasets, it is recommended to configure Local Storage so that images are accessed directly from the filesystem.
 
Windows example:
 
<pre>
set LABEL_STUDIO_PORT=8081
set LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
set LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=C:\Users\username\Documents\label-studio
 
python -m label_studio.server start
</pre>
 
After startup, Label Studio will be available at:
 
<pre>
http://localhost:8081
</pre>
 
The directory specified by '''LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT''' can then be configured as Local Storage within a Label Studio project.
 
Port '''8081''' is used to avoid conflicts with the default LogicalDOC installation, which typically runs on port '''8080'''.
 
 
=== Create a Project ===
 
# Log in to Label Studio
# Click '''Create Project'''
# Enter a project name
# Configure the labeling interface
# Save the project
 
=== Import Images ===
 
# Open the project
# Click '''Import'''
# Select '''Local Storage'''
 
=== Configure Local Storage ===
 
# Open the project
# Navigate to '''Settings > Cloud Storage'''
# Click '''Add Source Storage'''
# Select '''Local Files'''
# Configure the path specified by LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT
# Click '''Sync Storage'''
 
After synchronization, Label Studio automatically creates one task for each imported document image.
 
When importing images, choose '''Files''' as the import method.
 
Unlike CVAT, Label Studio creates one task for each imported document image.
 
[[File:LabelStudio-import-method.png|thumb|600px|center|Selecting the Files import method]]
 
=== Annotate Documents ===
 
# Open a task
# Select a label
# Draw a bounding box around the target area
# Save the annotation
 
Example labels:
 
* Invoice Number
* Date
* Seller Name
* Buyer Name
* Total Amount
 
[[File:LabelStudio-annotation-example.png|thumb|600px|center|Example annotation]]
 
=== Export the Dataset ===
 
# Open the project
# Click '''Export'''
# Select the desired format
 
Supported formats include:
 
* YOLO
* COCO
* Pascal VOC
* CSV
 
For YOLO training, export the dataset in YOLO format.
 
=== Dataset Formats ===


==== COCO ====


COCO is a JSON-based dataset format commonly used for object detection datasets.
Here, '''check that character_set_server is set to utf8mb4'''.<br>
If it is not, back up the logicaldoc database schema and then change the setting.<br>
For more information, see [https://stackoverflow.com/questions/44591895/utf8mb4-in-mysql-workbench-and-jdbc utf8mb4 in MySQL Workbench and JDBC].


More information:
Execute the following SQL statements on the ld_filename column of the tables ld_document, ld_version and ld_history:
https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/md-coco-overview.html
<syntaxhighlight lang="SQL">
ALTER TABLE
    ld_document
    CHANGE ld_filename ld_filename
    VARCHAR(255)
    CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;


==== YOLO ====
ALTER TABLE
    ld_version
    CHANGE ld_filename ld_filename
    VARCHAR(255)
    CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;


YOLO datasets contain images and annotation files organized according to a predefined directory structure.
ALTER TABLE
    ld_history
    CHANGE ld_filename ld_filename
    VARCHAR(255)
    CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;
</syntaxhighlight>


More information:
=== Add some files with emoticons in LogicalDOC ===
https://docs.cvat.ai/docs/dataset_management/formats/format-yolo/
Now you can upload files with emoticons in their names from your desktop, or rename a stored document by adding an emoticon at the start of its file name.
[[File:Document-browser-files-with-emoticons.png|thumb|600px|center|The document explorer in LogicalDOC DMS 8.3.3 showing some files with emoticons]]


==== YOLOv8 OBB ====
=== Additional information ===


YOLOv8 OBB (Oriented Bounding Boxes) extends the standard YOLO format by supporting rotated bounding boxes using eight normalized coordinates.
* [https://makandracards.com/makandra/2529-show-and-change-mysql-default-character-set Show and change MySQL default character set]
* [https://makandracards.com/makandra/2531-show-the-character-set-and-the-collation-of-your-mysql-tables Show the character set and the collation of your MySQL tables]
* [https://stackoverflow.com/questions/1049728/how-do-i-see-what-character-set-a-mysql-database-table-column-is How do I see what character set a MySQL database / table / column is?]
* [https://stackoverflow.com/questions/39463134/how-to-store-emoji-character-in-mysql-database How to store Emoji Character in MySQL Database]
* [https://stackoverflow.com/questions/44591895/utf8mb4-in-mysql-workbench-and-jdbc utf8mb4 in MySQL Workbench and JDBC]

Latest revision as of 12:55, 23 June 2026

You must have a MySQL 5.7-8 or MariaDB database configured with utf8mb4 settings.
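
The utf8mb4 requirement follows from how emoticons are encoded: most emoji lie above U+FFFF and take four bytes in UTF-8, which MySQL's legacy three-byte utf8 charset cannot store. A minimal Python sketch of that check is shown here; the helper name needs_utf8mb4 is hypothetical and not part of LogicalDOC.

```python
def needs_utf8mb4(filename: str) -> bool:
    """Return True if any character in the name needs four bytes in UTF-8."""
    # Code points above U+FFFF (outside the Basic Multilingual Plane)
    # are encoded as four bytes in UTF-8.
    return any(ord(ch) > 0xFFFF for ch in filename)


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    for name in ("report.pdf", "\U0001F600 report.pdf"):
        print(repr(name), needs_utf8mb4(name))
```

A name for which this returns True is the kind of file name that the steps below are meant to make storable.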

Since this is not a risk-free operation, back up the database before proceeding.

Update the configuration of LogicalDOC

First, add a couple of parameters to the JDBC connection URL of LogicalDOC.

  1. Shut down the LogicalDOC system service/daemon
  2. Locate the file context.properties in <LOGICALDOC_INSTALLATION_DIR>/conf
  3. Open the file context.properties and locate the key jdbc.url
  4. Append the parameters characterEncoding=utf8&allowPublicKeyRetrieval=true to the value of that key
    e.g.: jdbc.url=jdbc:mysql://localhost:3306/logicaldoc?useSSL=false&characterEncoding=utf8&allowPublicKeyRetrieval=true
  5. Save the file and restart the LogicalDOC system service/daemon
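
The edit to jdbc.url can be sketched in Python as follows. This is only an illustration of the string change, under the assumption that the URL uses the usual ?key=value&key=value query syntax; the function add_jdbc_params is hypothetical and does not read or write context.properties.

```python
def add_jdbc_params(url: str, params: dict) -> str:
    """Append query parameters to a JDBC URL unless they are already set."""
    base, _, query = url.partition("?")
    pairs = [p for p in query.split("&") if p]
    present = {p.split("=", 1)[0] for p in pairs}
    for key, value in params.items():
        if key not in present:
            pairs.append(f"{key}={value}")
    return base + "?" + "&".join(pairs) if pairs else base


if __name__ == "__main__":
    url = "jdbc:mysql://localhost:3306/logicaldoc?useSSL=false"
    extra = {"characterEncoding": "utf8", "allowPublicKeyRetrieval": "true"}
    print(add_jdbc_params(url, extra))
```

Parameters that are already present are left untouched, so running the function twice on the same value gives the same result.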

Update the database schema

Connect to the database using the mysql client:

mysql -u root -p logicaldoc

Check that the charset settings are correct with the query below:

 SHOW VARIABLES LIKE 'char%';
MySQL queries to check charset and collation to store emoticons in LogicalDOC filenames


Here, check that character_set_server is set to utf8mb4.
If it is not, back up the logicaldoc database schema and then change the setting.
For more information, see "utf8mb4 in MySQL Workbench and JDBC".
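
The check described above can be sketched in Python, assuming the output of SHOW VARIABLES LIKE 'char%' has already been fetched as (name, value) pairs. The function name and the sample rows are hypothetical and only illustrate the comparison; they are not real output from a LogicalDOC database.

```python
def non_utf8mb4_variables(rows, names=("character_set_server",)):
    """Return the checked variables whose value is not utf8mb4."""
    values = dict(rows)
    # A variable missing from the rows is reported with the value None.
    return {n: values.get(n) for n in names if values.get(n) != "utf8mb4"}


if __name__ == "__main__":
    # Illustrative rows only.
    sample = [
        ("character_set_client", "utf8mb4"),
        ("character_set_server", "latin1"),
    ]
    print(non_utf8mb4_variables(sample))
```

An empty result means every checked variable is already set to utf8mb4.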

Execute the following SQL statements on the ld_filename column of the tables ld_document, ld_version and ld_history:

ALTER TABLE
    ld_document
    CHANGE ld_filename ld_filename
    VARCHAR(255)
    CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;

ALTER TABLE
    ld_version
    CHANGE ld_filename ld_filename
    VARCHAR(255)
    CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;

ALTER TABLE
    ld_history
    CHANGE ld_filename ld_filename
    VARCHAR(255)
    CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;
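
Since the three statements differ only in the table name, they can also be generated. The Python sketch below only builds the SQL text for the tables named above and does not connect to the database; the function name is hypothetical.

```python
TABLES = ("ld_document", "ld_version", "ld_history")


def alter_filename_statement(table: str) -> str:
    """Build the ALTER TABLE statement that converts ld_filename to utf8mb4."""
    return (
        f"ALTER TABLE {table} "
        "CHANGE ld_filename ld_filename VARCHAR(255) "
        "CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
    )


if __name__ == "__main__":
    for table in TABLES:
        print(alter_filename_statement(table))
```

The printed statements can be pasted into the mysql client session opened earlier.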

Add some files with emoticons in LogicalDOC

Now you can upload files with emoticons in their names from your desktop, or rename a stored document by adding an emoticon at the start of its file name.

The document explorer in LogicalDOC DMS 8.3.3 showing some files with emoticons

Additional information