THIS IS AN ADD-ON to the [BOORU CHARS 2021](https://nyaa.si/view/1384820) and [BOORU CHARS 2015](https://nyaa.si/view/1468367) torrents.
It covers ~98% of the newly added images from the following composite rips:
[01.2022 - 05.2022](https://nyaa.si/view/1539363) volume V2022B "double size"
[11.2021 - 01.2022](https://nyaa.si/view/1486179) volume V2022A
[08.2021 - 11.2021](https://nyaa.si/view/1462329) volume V2021D
[06.2021 - 08.2021](https://nyaa.si/view/1452049) volume V2021C
[03.2021 - 06.2021](https://nyaa.si/view/1409571) volume V2021B
**No substantial changes were made to the image processing workflow; the features are still the same:**
1) files are uniquely identified by the (booru + fid) key: imageboard name and file ID
verbose file naming: **%booru% - %fid% - %up-to-3-copyrights% ~ %up-to-5-characters% (%up-to-2-artists%)**
2) aspect ratio clustering (freeware Dimensions2Folders), priorities from high to low: 7x10 +/-4% >> 3x4 +/-10% >> 1x1 +/-20% >> 3x2 +/-40% >> 2x3 +/-40%
3) file format unified (PNG >> JPG with 94% quality); animated images, bad content/format, JPEG compression outliers cleaned up
4) resampling to 1280px on the longest side (1024x1024^ for the 1x1 +/-20% aspect ratio); files with 98-100% JPEG quality re-encoded with MOGRIFY to 94%
5) imageboard tags arranged and partially embedded in the image EXIF info, so that images stay usable on their own
6) some general image statistics computed with ImageMagick
DEEP CONTENT ANALYSIS (produces bounding boxes):
7) the KERAS-CRAFT text detector is used to estimate the total size and number of text pieces
8) [YOLOv5 torso detector v11](https://www.kaggle.com/printcraft/anime-and-cg-characters-detection-using-yolov5) applied
the number of detected heads is used for folder/archive distribution; detected torso components are assembled where possible
A simple numerical rank among all images was built for each numerical criterion,
so both outlier processing and ranking deal only with relative ranks 1..maxN or simple functions of them.
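Items 1, 2 and 4 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the pipeline actually used: "7x10 +/-4%" is read as width:height = 7:10 with a 4% relative tolerance, the first bucket matching in priority order wins, and `1024x1024^` follows ImageMagick's fill geometry (the shorter side becomes 1024). The tag separator and the file-name sanitizing rule are hypothetical, and Dimensions2Folders' own matching rule may differ.

```python
import re
from typing import List, Optional, Tuple

# Buckets in priority order (high to low): (label, width/height target, relative tolerance).
# ASSUMPTION: tolerance is relative to the target ratio; Dimensions2Folders may differ.
BUCKETS = [
    ("7x10", 7 / 10, 0.04),
    ("3x4", 3 / 4, 0.10),
    ("1x1", 1 / 1, 0.20),
    ("3x2", 3 / 2, 0.40),
    ("2x3", 2 / 3, 0.40),
]


def aspect_bucket(width: int, height: int) -> Optional[str]:
    """Return the highest-priority bucket whose tolerance covers width/height, else None."""
    ratio = width / height
    for label, target, tol in BUCKETS:
        if abs(ratio / target - 1) <= tol:
            return label
    return None  # matches no bucket


def target_size(width: int, height: int, bucket: Optional[str]) -> Tuple[int, int]:
    """Resampled size: 1280px on the longest side, or 1024x1024^ (fill) for the 1x1 bucket."""
    if bucket == "1x1":
        scale = 1024 / min(width, height)  # "^": the shorter side becomes 1024
    else:
        scale = 1280 / max(width, height)  # fit inside a 1280x1280 box
    return round(width * scale), round(height * scale)


# HYPOTHETICAL sanitizing rule: replace characters that Windows forbids in file names.
_BAD = re.compile(r'[<>:"/\\|?*]')


def verbose_name(booru: str, fid: int, copyrights: List[str], characters: List[str],
                 artists: List[str], ext: str = "jpg") -> str:
    """%booru% - %fid% - %up-to-3-copyrights% ~ %up-to-5-characters% (%up-to-2-artists%)"""
    def join(tags: List[str], limit: int) -> str:
        # ASSUMPTION: tags are comma-joined; the real separator is not documented here.
        return ", ".join(tags[:limit])

    name = f"{booru} - {fid} - {join(copyrights, 3)} ~ {join(characters, 5)} ({join(artists, 2)})"
    return f"{_BAD.sub('_', name)}.{ext}"
```

For example, a 2000x1800 image (ratio ~1.11) misses the 7x10 and 3x4 buckets, lands in 1x1, and is resampled to 1138x1024. Checking the buckets in priority order is what resolves the overlaps between the wide +/-40% ranges and the narrow ones.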
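The per-image text and head statistics can be derived from detector boxes roughly as follows. This sketch assumes boxes arrive as pixel corner coordinates, that the YOLO model labels heads with the class name "head", and that a 0.5 confidence threshold applies. All three are assumptions for illustration, not the settings used for this release.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Box:
    """Axis-aligned box in pixel corner coordinates, with detector label and confidence."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: str = "text"  # HYPOTHETICAL class name
    conf: float = 1.0

    @property
    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def text_stats(boxes: Iterable[Box], width: int, height: int) -> Tuple[int, float]:
    """Number of text pieces and their summed area as a fraction of the image area.

    Overlapping boxes are counted twice; a union-area computation would be more exact.
    """
    boxes = list(boxes)
    return len(boxes), sum(b.area for b in boxes) / (width * height)


def head_bucket(detections: Iterable[Box], min_conf: float = 0.5) -> str:
    """Map the count of confident head detections to the folder buckets 0, 1, 2, 3+."""
    # ASSUMPTION: the head class is named "head"; 0.5 is an illustrative threshold.
    n = sum(1 for d in detections if d.label == "head" and d.conf >= min_conf)
    return str(n) if n < 3 else "3+"
```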
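The rank transform can be sketched in Python like this. Breaking ties by original position (ordinal ranking) is an assumption; average or dense ranks would serve equally well. Working on ranks puts criteria with different units and scales (text area, entropy, head count) on a common 1..maxN footing and makes them insensitive to extreme raw values.

```python
from typing import Dict, List, Sequence


def ordinal_ranks(values: Sequence[float]) -> List[int]:
    """Rank values 1..N, 1 = smallest; ties keep their original order (ASSUMPTION)."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # stable sort
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def rank_table(criteria: Dict[str, Sequence[float]]) -> Dict[str, List[int]]:
    """Replace every numerical criterion by its rank among all images."""
    return {name: ordinal_ranks(vals) for name, vals in criteria.items()}


def relative(ranks: Sequence[int]) -> List[float]:
    """A simple function of the rank: position scaled to (0, 1], i.e. rank / maxN."""
    n = len(ranks)
    return [r / n for r in ranks]
```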
**Identical to BC2015:**
- the "attractiveness score function" is defined as "textless and colorful" (see details below)
- outliers to delete were defined by three independent ranks:
 * least attractive (as above): faint / B&W or text-heavy
 * most segmented & crowded: several non-overlapping segments & many tiny heads detected
 * poorly presented: partially filled (by min bounding box) & least detailed (by min entropy) & too bright / dark (by max absolute skewness)
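The "poorly presented" statistics and the independent-rank outlier rule can be sketched in plain Python as below. Entropy and skewness are computed here from 8-bit grayscale pixels; ImageMagick reports comparable values through its `%[entropy]` and `%[skewness]` format escapes, though its scaling may differ. The convention that a lower composite score means a worse image and the cut-off `fraction` are hypothetical.

```python
import math
from typing import List, Sequence, Set


def entropy_bits(pixels: Sequence[int]) -> float:
    """Shannon entropy (bits) of an 8-bit grayscale histogram; low values = little detail."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    n = len(pixels)
    return -sum(c / n * math.log2(c / n) for c in hist if c)


def skewness(pixels: Sequence[int]) -> float:
    """Third standardized moment; a large |skewness| = lopsided (too bright or too dark) histogram."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    if var == 0:
        return 0.0  # flat image: skewness undefined, treated as 0 here
    return sum((p - mean) ** 3 for p in pixels) / n / var ** 1.5


def worst_tail(scores: Sequence[float], fraction: float) -> List[int]:
    """Indices of the `fraction` of images with the lowest composite score (ASSUMPTION: low = bad)."""
    k = int(len(scores) * fraction)
    return sorted(range(len(scores)), key=lambda i: scores[i])[:k]


def outliers(composites: Sequence[Sequence[float]], fraction: float) -> Set[int]:
    """Union of the worst tails of each composite rank, each judged independently."""
    out: Set[int] = set()
    for scores in composites:
        out.update(worst_tail(scores, fraction))
    return out
```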
**This release contains:**
- 705,467 sampled images
 * clustered by aspect ratio and by the number of detected heads (0, 1, 2, 3+)
 * ordered by the "attractiveness score function" and grouped into zips/folders of 1000 images each
- rich image-related metadata
- detailed results of the Keras & YOLO detection algorithms
- full tags list with Danbooru enrichment
- sample code (command line, Python, PL/SQL) for key algorithms: not "ready to use", but building blocks
NOTE: an extended and evolving description & additional artefacts are placed on [KAGGLE](https://www.kaggle.com/datasets/printcraft/booru-chars-2022)
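One way to reproduce the zip/folder layout: within each (aspect ratio, heads) cluster, sort by attractiveness score and cut into consecutive groups of 1000. The folder naming scheme and the most-attractive-first order are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

GROUP_SIZE = 1000  # images per zip/folder


def plan_folders(items: Sequence[Tuple[str, str, str, float]]) -> Dict[str, List[str]]:
    """items: (file_name, aspect_bucket, head_bucket, attractiveness).

    Within each (aspect, heads) cluster, files are sorted by descending
    attractiveness (ASSUMPTION) and cut into consecutive groups of GROUP_SIZE.
    The "aspect/heads/NNNN" folder naming is HYPOTHETICAL.
    """
    clusters: Dict[Tuple[str, str], List[Tuple[float, str]]] = defaultdict(list)
    for name, aspect, heads, score in items:
        clusters[(aspect, heads)].append((score, name))
    plan: Dict[str, List[str]] = {}
    for (aspect, heads), members in clusters.items():
        members.sort(key=lambda m: -m[0])  # most attractive first; stable for ties
        for start in range(0, len(members), GROUP_SIZE):
            folder = f"{aspect}/{heads}/{start // GROUP_SIZE + 1:04d}"
            plan[folder] = [name for _, name in members[start:start + GROUP_SIZE]]
    return plan
```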