
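From Points and Lines to Planes: Exploring the Composition and Style of Comics from an Image Analysis Perspective (由點、線至面:從影像分析角度探討漫畫的組成與風格), Wei-Ta Chu (朱威達)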


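  1. From Points and Lines to Planes: Exploring the Composition and Style of Comics from an Image Analysis Perspective. Wei-Ta Chu, Department of Computer Science and Information Engineering, National Chung Cheng University, wtchu@ccu.edu.tw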
  2. Fair Use Declaration • This statement sets out the legitimate status of all the "Screen Printings" and "Comics" illustrated in "From Points and Lines to Planes: Exploring the Composition and Style of Comics from an Image Analysis Perspective", which are cited under the doctrine of "Fair Use" for research purposes insofar as copyright protection applies to them. • Legal doctrine worldwide establishes that a work must show originality to pursue copyright protection; originality is the very essence of creation in the intellectual domain. Accordingly, an automatically recorded screen motion of an interactive computer game can be deemed to carry no copyright protection, and it can therefore be lawfully used in this presentation as research material without written permission from the copyright owner of the game. However, some may still treat such material as copyright protected because of the drawings or similar creations in the backgrounds of the animations or comics. If that applies, then under the international intellectual property agreements and the copyright laws of the respective jurisdictions, such as the Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS) Article 13, the Berne Convention for the Protection of Literary and Artistic Works Article 9(2), the EU Copyright Directive Article 5(5), the Copyright Law of the United States of America section 107, and the Taiwan Copyright Act Article 65, the fair use and fair dealing of a copyrighted work for teaching, scholarship, or research applies. Under these circumstances, citing all the "Screen Printings" and "Comics" in this presentation is a lawful action that does not conflict with a normal exploitation of the works and does not unreasonably prejudice the legitimate interests of the right holders.
  3. Introduction • Comics-based presentation of movies, animations, and photos has emerged recently. • Comics are considered an ideal medium for visual storytelling because of their rich expressivity, high interactivity, and high portability. Sample comic pages generated from the animation "Neon Genesis Evangelion" (top) and from the animation "Summer Wars" (bottom).
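  4. Introduction • Three key constituents of manga [1]: 1. Drawing (絵), 2. Language (言葉), 3. Panel (コマ). Figure labels: point (點), line (線), plane (面); Drawing; Panel. [1] 夏目 房之介 (Natsume Fusanosuke) (1997). マンガはなぜ面白いのか―その表現と文法 (Why Are Manga Interesting? Their Expression and Grammar). NHKライブラリー (NHK Library).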
  5. Outline • Part 1: Manga Style Analysis • Part 2: Comics-based Storytelling
  6. Motivation • As the internet and mobile devices have become popular, digital manga is widely accessible. • Different manga may have different styles. We focus on which features can be used to distinguish manga styles.
  7. Panel Feature Extraction • From the bounding box of each panel, we extract features that describe the characteristics of the layout: 1) average panel height (derived from the bounding boxes); 2) average panel width; 3) standard deviation of panel height; 4) standard deviation of panel width.
  8. Panel Feature Extraction • 5) ratio of the total panel area to the whole page; 6) average panel area; 7) standard deviation of panel area; 8) average slope of vertical panel boundaries; 9) average slope of horizontal panel boundaries; 10) standard deviation of the vertical-boundary slopes; 11) standard deviation of the horizontal-boundary slopes.
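The eleven panel features above can be sketched in NumPy as follows. This is a minimal sketch under assumptions not stated on the slides: each panel is given as four corner points (top-left, top-right, bottom-right, bottom-left) so that boundary slopes are defined, panel area is the polygon area, and slopes are measured as drift per unit length along each boundary. The function names `panel_features` and `_polygon_area` are hypothetical.

```python
import numpy as np

def _polygon_area(q):
    # Shoelace formula for a simple polygon given as an (n, 2) array of corners.
    x, y = q[:, 0], q[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(np.roll(x, -1), y))

def panel_features(panels, page_w, page_h):
    """panels: list of 4x2 corner arrays (TL, TR, BR, BL) in pixel coordinates."""
    quads = [np.asarray(p, dtype=float) for p in panels]
    mins = np.array([q.min(axis=0) for q in quads])
    maxs = np.array([q.max(axis=0) for q in quads])
    w = maxs[:, 0] - mins[:, 0]                     # bounding-box widths
    h = maxs[:, 1] - mins[:, 1]                     # bounding-box heights
    area = np.array([_polygon_area(q) for q in quads])
    # Vertical boundaries (left TL->BL, right TR->BR): horizontal drift per unit height.
    v_slope = [(q[b, 0] - q[a, 0]) / (q[b, 1] - q[a, 1])
               for q in quads for a, b in ((0, 3), (1, 2))]
    # Horizontal boundaries (top TL->TR, bottom BL->BR): vertical drift per unit width.
    h_slope = [(q[b, 1] - q[a, 1]) / (q[b, 0] - q[a, 0])
               for q in quads for a, b in ((0, 1), (3, 2))]
    return np.array([
        h.mean(), w.mean(), h.std(), w.std(),        # features 1-4
        area.sum() / (page_w * page_h),              # feature 5
        area.mean(), area.std(),                     # features 6-7
        np.mean(v_slope), np.mean(h_slope),          # features 8-9
        np.std(v_slope), np.std(h_slope),            # features 10-11
    ])
```

For a page with two axis-aligned panels, all four slope features are zero and only the size statistics vary, which is what lets the slope features separate artists who draw tilted panel borders.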
  9. Panel Feature Extraction • Top row: sample manga pages from three different artists. Bottom row: panel feature distributions corresponding to these pages.
  10. Screentone Detection • Screentone is a technique for applying textures and shades to drawings, used as an alternative to hatching. • Different authors have different habits of using screentone.
  11. Screentone Detection • 1. Image binarization. 2. Dilation. 3. Deletion of small areas. 4. Extraction of screentone areas. 5. Extraction of patches from the screentone areas.
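The five detection steps can be sketched with NumPy and `scipy.ndimage`. This is a minimal sketch, assuming a grayscale page with dark ink on white paper; the threshold, number of dilation iterations, minimum area, and patch size are hypothetical values, not the authors' settings. The idea is that the dense dots of a screentone merge into one large region after dilation, while isolated specks stay small and are removed. A real detector would also need to reject solid black fills and dense line art, which this sketch does not do.

```python
import numpy as np
from scipy import ndimage

def detect_screentone(gray, thresh=128, dilate_iter=2, min_area=50):
    """Return a boolean mask of candidate screentone areas in a grayscale page."""
    ink = gray < thresh                                            # 1. binarize: dark pixels
    grown = ndimage.binary_dilation(ink, iterations=dilate_iter)   # 2. dilate so dense dots merge
    labels, _ = ndimage.label(grown)                               # connected regions
    sizes = np.bincount(labels.ravel())                            # sizes[0] counts background
    keep = sizes >= min_area                                       # 3. delete small areas
    keep[0] = False
    return keep[labels]                                            # 4. remaining screentone areas

def extract_patches(gray, mask, size=8, stride=8):
    """5. Cut size x size patches that lie entirely inside the screentone mask."""
    patches = []
    rows, cols = gray.shape
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            if mask[r:r + size, c:c + size].all():
                patches.append(gray[r:r + size, c:c + size])
    return patches
```

On a synthetic page with a dot grid in one corner and a single stray pixel elsewhere, the grid survives as one region and the stray pixel is discarded.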
  12. Screentone Feature Extraction • Two screentone features are proposed: – the ratio of screentone areas to the whole panel area; – the bag of screentone. • Gabor wavelet texture features are extracted; affinity propagation clusters the features, and a bag-of-words model describes the screentone. B. J. Frey and D. Dueck. Clustering by passing messages between data points. Science, 2007.
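Both screentone features can be sketched in NumPy: the area ratio, a small Gabor descriptor per patch, affinity propagation (following the Frey and Dueck message-passing updates) to build the codebook, and a bag-of-words histogram. This is a sketch under stated assumptions: the Gabor frequencies, orientations, and sigma, the median preference, the damping factor, and all function names are hypothetical choices, not the authors' settings.

```python
import numpy as np

def screentone_ratio(mask, panel_area):
    # Feature 1: screentone pixels relative to the panel area.
    return float(mask.sum()) / panel_area

def gabor_descriptor(patch, freqs=(0.15, 0.3), n_orient=4, sigma=2.0):
    """Magnitude of the inner product of the patch with complex Gabor kernels centred on it."""
    size = patch.shape[0]
    coords = np.arange(size) - (size - 1) / 2
    y, x = np.meshgrid(coords, coords, indexing="ij")
    p = patch.astype(float) - patch.mean()
    feats = []
    for f in freqs:
        for k in range(n_orient):
            theta = np.pi * k / n_orient
            xr = x * np.cos(theta) + y * np.sin(theta)
            kernel = np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.exp(2j * np.pi * f * xr)
            feats.append(abs(np.sum(p * kernel)))
    return np.array(feats)

def affinity_propagation(X, damping=0.5, iters=200):
    """Responsibility/availability message passing; returns exemplar indices and labels."""
    n = len(X)
    S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)       # negative squared distance
    S[np.diag_indices(n)] = np.median(S[~np.eye(n, dtype=bool)])   # preference = median
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    rows = np.arange(n)
    for _ in range(iters):
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        dA = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = dA
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
    if len(exemplars) == 0:                        # fallback if messages did not settle
        exemplars = np.array([np.argmax(np.diag(A) + np.diag(R))])
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(len(exemplars))
    return exemplars, labels

def bag_of_screentone(descs, codebook):
    """Normalized histogram of nearest-codeword assignments (bag-of-words model)."""
    d = ((descs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

In use, descriptors from all training patches would be clustered once, and `X[exemplars]` would serve as the codebook for every page.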
  13. Screentone Feature Extraction • Top row: sample manga pages from three different artists. Bottom row: the BoP distributions corresponding to these artists.
  14. Character Detection • Apply an eye detection model in a sliding-window manner to detect eyes. • Expand the regions around the detected eyes. The large region formed by all expanded eye regions is then covered by a minimum bounding box, which is taken as the character's head region.
  15. Line Feature Extraction • Canny edge detection • Edge linking (P. Kovesi, School of Computer Science & Software Engineering, The University of Western Australia, http://www.csse.uwa.edu.au, 2001). Figure: (1) face image, (2) Canny edge image, (3) edge linking, (4) straight-line segmentation.
  16. Line Feature Extraction • Included angle between lines: for two spatially adjacent line segments, we calculate the included angle between them. The feature is represented as a 12-dimensional histogram. Figure: shonen vs. shojo.
  17. Line Feature Extraction • Line orientation: the orientation of a line segment is defined as the included angle between it and the horizontal axis. The feature is represented as a 12-dimensional orientation histogram. Figure: Mitsuru Adachi vs. Terajima Yuji.
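The two angle-based line features can be sketched as normalized histograms over line segments `(x1, y1, x2, y2)`. This is a minimal sketch under assumptions not given on the slides: orientation is folded into [0, 180) degrees with 12 bins of 15 degrees, two segments count as "adjacent" when any pair of their endpoints lies within a hypothetical tolerance `tol`, and the included angle is taken as the acute angle in [0, 90] degrees. All function names are hypothetical.

```python
import numpy as np

def orientations(segs):
    """Angle (degrees, in [0, 180)) between each segment (x1, y1, x2, y2) and the horizontal axis."""
    segs = np.asarray(segs, dtype=float)
    dx = segs[:, 2] - segs[:, 0]
    dy = segs[:, 3] - segs[:, 1]
    return np.degrees(np.arctan2(dy, dx)) % 180.0

def _normalized_hist(values, bins, value_range):
    h, _ = np.histogram(values, bins=bins, range=value_range)
    return h / max(h.sum(), 1)

def orientation_hist(segs, bins=12):
    # Line orientation feature: 12 bins of 15 degrees over [0, 180).
    return _normalized_hist(orientations(segs), bins, (0, 180))

def included_angle_hist(segs, bins=12, tol=2.0):
    # Included-angle feature; adjacency = some endpoint pair within `tol` pixels (assumed rule).
    segs = np.asarray(segs, dtype=float)
    theta = orientations(segs)
    ends = segs.reshape(-1, 2, 2)                     # (segment, endpoint, xy)
    angles = []
    for i in range(len(segs)):
        for j in range(i + 1, len(segs)):
            gap = np.linalg.norm(ends[i][:, None, :] - ends[j][None, :, :], axis=2)
            if gap.min() <= tol:
                diff = abs(theta[i] - theta[j]) % 180.0
                angles.append(min(diff, 180.0 - diff))   # acute angle in [0, 90]
    return _normalized_hist(angles, bins, (0, 90))
```

The segments would come from the Canny, edge-linking, and straight-line segmentation pipeline on the previous slide; here they are simply passed in as a list.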
  18. Line Feature Extraction • Density of line segments: for each line segment, we count the lines in its neighborhood, and these counts over all line segments are gathered to form the feature, represented as a 20-dimensional histogram. Figure: Mitsuru Adachi vs. Terajima Yuji.
  19. Line Feature Extraction • Orientation of nearby lines: the orientations of each line segment's nearby lines are calculated and represented as a 12-dimensional orientation histogram. Figure: Mitsuru Adachi vs. Terajima Yuji.
  20. Line Feature Extraction • Number of nearby lines with similar orientation: for a line segment L, we count its nearby lines whose orientation is similar to that of L. These counts over all line segments form a 20-dimensional histogram. Figure: shonen vs. shojo.
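The three neighborhood-based line features (density, orientation of nearby lines, and number of nearby lines with similar orientation) can be sketched together, since they share the same neighborhood. This is a minimal sketch, assuming "nearby" means segment midpoints within a hypothetical `radius` and "similar orientation" means an undirected angle difference of at most a hypothetical `similar_deg`; counts of 19 or more share the last of the 20 bins. The function names are hypothetical.

```python
import numpy as np

def _orientations(segs):
    dx = segs[:, 2] - segs[:, 0]
    dy = segs[:, 3] - segs[:, 1]
    return np.degrees(np.arctan2(dy, dx)) % 180.0

def _hist(values, bins, value_range):
    h, _ = np.histogram(values, bins=bins, range=value_range)
    return h / max(h.sum(), 1)

def neighborhood_features(segs, radius=20.0, similar_deg=15.0):
    """Density, nearby-orientation, and similar-orientation histograms for segments (x1, y1, x2, y2)."""
    segs = np.asarray(segs, dtype=float)
    mid = (segs[:, :2] + segs[:, 2:]) / 2
    theta = _orientations(segs)
    dist = np.linalg.norm(mid[:, None, :] - mid[None, :, :], axis=2)
    near = (dist <= radius) & ~np.eye(len(segs), dtype=bool)   # neighbors, excluding itself
    diff = np.abs(theta[:, None] - theta[None, :]) % 180.0
    diff = np.minimum(diff, 180.0 - diff)                        # undirected angle difference
    counts = near.sum(axis=1)
    similar = (near & (diff <= similar_deg)).sum(axis=1)
    density = _hist(np.minimum(counts, 19), 20, (0, 20))         # 20-dim density histogram
    nearby_orient = _hist(np.broadcast_to(theta, near.shape)[near], 12, (0, 180))
    similar_count = _hist(np.minimum(similar, 19), 20, (0, 20))
    return density, nearby_orient, similar_count
```

Three closely spaced parallel strokes (typical of hatching) push mass into the higher bins of both the density and the similar-orientation histograms, while an isolated stroke lands in bin 0.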
  21. Line Feature Extraction • Line strength variation: we apply Canny edge detection with twenty different threshold settings. The feature is the ratio of each detection result to the result under the standard setting, giving a 20-dimensional vector. Figure: shonen vs. shojo.
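The line-strength feature can be sketched as follows. To keep the example self-contained, it swaps Canny for a plain gradient-magnitude threshold (`np.gradient`), so it illustrates only the "ratio over twenty thresholds" idea, not the authors' detector. The threshold range and the choice of the first threshold as the "standard" setting are hypothetical assumptions; the slides specify neither.

```python
import numpy as np

def line_strength_feature(gray, thresholds=None, standard_idx=0):
    """Edge-pixel count at each threshold divided by the count at the standard threshold.
    Stand-in detector: gradient magnitude >= threshold (not Canny)."""
    g = gray.astype(float)
    gy, gx = np.gradient(g)
    mag = np.hypot(gx, gy)
    if thresholds is None:
        thresholds = np.linspace(10, 200, 20)        # 20 settings (hypothetical range)
    counts = np.array([(mag >= t).sum() for t in thresholds], dtype=float)
    return counts / max(counts[standard_idx], 1.0)
```

Pages drawn with strong, high-contrast strokes keep their edges at high thresholds, so the ratios decay slowly; faint or thin linework makes the vector drop off quickly.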
  22. Feature Analysis • Comparison between manga from different types of magazines. Shonen manga: 3 different titles, 300 pages in total: (1) "Nisekoi", Komi Naoshi; (2) "Yamada-kun and the seven witches", Miki Yoshikawa; (3) "Agatsuma's my daughter", Nishikida Keikokorozashi. Shojo manga: 3 different titles, 300 pages in total: (4) "I love flowers and Mr.", Kumaoka Fuyuyu; (5) "The first love honey", Minase Ai; (6) "From me to you", Shiina Karuho.
  23. Feature Analysis – Comparison between manga on the same topic drawn by different artists. – Statistical comparison is used to analyze the proposed features. Baseball manga: 3 different titles, 300 pages in total: (1) "Ace of Diamond", Terajima Yuji; (2) "Mix", Mitsuru Adachi; (3) "Big Windup", Mizushima Tsutomu.
  24. Feature Analysis – Comparison between manga from different types of magazines (shonen vs. shojo). P-values of the eight features: 0.039, 0.414, 0.151, 0.429, 0.017, 0.003, 0.044, 0.000.
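The slides report p-values without naming the statistical test, so the following is only one possible way to obtain such numbers: a two-sided permutation test on the difference in means of a scalar per-page feature value (for example, one summary statistic of a feature per page). The test choice, the summary statistic, and the function name `permutation_test` are assumptions, not the authors' procedure.

```python
import numpy as np

def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference of means; returns a p-value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)                  # random relabeling of the pages
        if abs(perm[:len(a)].mean() - perm[len(a):].mean()) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)                    # add-one correction, never exactly 0
```

With the add-one correction, a reported "0.000" corresponds to a p-value below the rounding resolution rather than exactly zero.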
  25. Feature Analysis • Distance maps (shonen vs. shojo manga) for four features, with p-values 0.017, 0.003, 0.414, and 0.429 (both axes labeled shonen and shojo).
  26. Feature Analysis – Comparison between manga on the same topic drawn by different artists: Terajima Yuji (TY), Mitsuru Adachi (MA), and Mizushima Tsutomu (MT). P-values of the eight features:
        TY vs. MA:  0.037  0.183  0.000  0.277  0.000  0.000  0.6    0.000
        TY vs. MT:  0.105  0.006  0.075  0.007  0.199  0.074  0.47   0.000
        MA vs. MT:  0.325  0.091  0.161  0.061  0.011  0.000  0.14   0.000
  27. Feature Analysis • Spider chart based on the skewness of the features.
  28. Feature Analysis • SVM test with 5-fold cross-validation, for manga from different types of magazines and for manga on the same topic drawn by different artists. Accuracy (%), ten feature settings per comparison in slide order:
        Shonen vs. shojo:  71.6  61.5  60.5  56.8  70.3  74.5  82    75    79.3  80
        TY vs. MA:         74.2  64.2  70    62.1  77.8  90.7  90    63.3  71.6  72
        TY vs. MT:         65    72.8  62.8  72.8  50    69.2  76.1  56.6  88    88
        MA vs. MT:         71.4  67.1  64.2  68.5  74.2  86.4  86.1  66.6  82    81
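A 5-fold cross-validation accuracy like the one in the table can be sketched with a linear SVM trained by the Pegasos stochastic sub-gradient method on the hinge loss. This solver is a swapped-in choice to keep the example dependency-free; the slides do not say which SVM implementation, kernel, or regularization was used, and `lam`, `epochs`, and the function names are hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos: stochastic sub-gradient descent on the regularized hinge loss; y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])          # append a bias feature
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            if y[i] * (Xb[i] @ w) < 1:                 # margin violated: hinge sub-gradient
                w = (1 - eta * lam) * w + eta * y[i] * Xb[i]
            else:
                w = (1 - eta * lam) * w
    return w

def predict(w, X):
    return np.sign(np.hstack([X, np.ones((len(X), 1))]) @ w)

def cross_val_accuracy(X, y, k=5, seed=0):
    """Mean accuracy over k folds; each fold is held out once for testing."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    accs = []
    for f in range(k):
        test = folds[f]
        train = np.concatenate([folds[g] for g in range(k) if g != f])
        w = train_linear_svm(X[train], y[train], seed=seed)
        accs.append(np.mean(predict(w, X[test]) == y[test]))
    return float(np.mean(accs))
```

In this setting `X` would hold one feature vector per page (for example, one line-feature histogram) and `y` the two classes being compared, such as shonen (-1) and shojo (+1).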
  29. Latent Style Model • We develop a style model based on Latent Dirichlet Allocation (LDA) to discover style elements. • In LDA, documents are represented as mixtures of latent topics, where each topic is a distribution over words. Figure: graphical model linking documents, topics, and words. Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
  30. Latent Style Model • Correspondence between LDA and the latent style model: a text document corresponds to the manga pages of the same artist; latent topics correspond to latent style elements; a word corresponds to a visual word (manga page). • Given a set of documents with observed visual words, the model can be learned efficiently by Gibbs sampling. The style probabilities of each document can then be estimated, which lets us represent a document as a distribution over style elements.
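Learning the latent style model by Gibbs sampling can be sketched with a standard collapsed Gibbs sampler for LDA, where each "document" is a list of visual-word ids and each topic plays the role of a style element. This is a minimal sketch: the hyperparameters `alpha` and `beta`, the iteration count, and the function name `lda_gibbs` are hypothetical, and it does not reproduce the authors' visual-word construction.

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """docs: list of lists of visual-word ids. Returns (theta, phi):
    per-document style distributions and per-style visual-word distributions."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))      # document-style counts
    nkw = np.zeros((n_topics, vocab_size))     # style-word counts
    nk = np.zeros(n_topics)                    # tokens assigned to each style
    z = []
    for d, doc in enumerate(docs):             # random initial assignments
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                    # remove the current token from the counts
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Full conditional p(z = k | rest) of collapsed LDA.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k                    # add it back under the sampled style
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
    phi = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)
    return theta, phi
```

Each row of `theta` is the style element distribution of a document, which is the representation compared in the retrieval experiments that follow.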
  31. Style Element Distributions • Top: sample manga pages from three different documents. Bottom: style element distributions corresponding to these documents.
  32. Artists in Dataset 1 • 100 manga pages from each of eight artists, 800 manga pages in total: (A) "Fairy Tail", 真島 ヒロ; (B) "ヤンキー君とメガネちゃん", 吉河 美希; (C) "うしおととら", 藤田 和日郎; (D) "金色のガッシュ!!", 雷句 誠; (E) "呪法解禁!!", 麻生 羽呂; (F) "天地を喰らう", 本宮 ひろ志; (G) "北斗の拳", 原 哲夫; (H) "魁!!男塾", 宮下 あきら.
  33. Art Movement in Dataset 1
  34. Artist Style Element Distributions • Top: sample manga pages from three different artists: (B) "ヤンキー君とメガネちゃん", 吉河 美希; (E) "呪法解禁!!", 麻生 羽呂; (G) "北斗の拳", 原 哲夫. Bottom: style element distributions corresponding to these artists.
  35. Style-Based Art Movement Retrieval • Given a query, we retrieve manga documents produced by artists of the same art movement. MAP@10 for each feature set, distance measure, and number of style elements:
        Features, distance measure             10 styles  20 styles  30 styles  40 styles
        Line features, histogram intersection  0.7093     0.7152     0.7329     0.7158
        Line features, chi-square              0.7024     0.719      0.7443     0.7383
        All features, histogram intersection   0.8413     0.8472     0.8483     0.8125
        All features, chi-square               0.8358     0.8518     0.8544     0.8196
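The retrieval experiments compare style element distributions with histogram intersection (a similarity) or the chi-square distance, and report MAP@10. A minimal sketch of both measures and of leave-one-out MAP@10 follows. It assumes AP@k is normalized by min(k, number of relevant documents); MAP@k definitions vary, and the slides do not say which one was used. All function names are hypothetical.

```python
import numpy as np

def hist_intersection(h1, h2):
    # Similarity: higher means closer; equals 1 for identical normalized histograms.
    return float(np.minimum(h1, h2).sum())

def chi_square(h1, h2, eps=1e-10):
    # Distance: lower means closer; eps avoids division by zero on empty bins.
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return float(0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def average_precision_at_k(ranked_relevance, n_relevant, k=10):
    """ranked_relevance: 0/1 relevance of results in ranked order."""
    if n_relevant == 0:
        return 0.0
    rel = np.asarray(ranked_relevance[:k], dtype=float)
    precision = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float((precision * rel).sum() / min(k, n_relevant))

def map_at_k(features, labels, sim=hist_intersection, k=10):
    """Leave-one-out retrieval: every document queries all the others."""
    n = len(features)
    aps = []
    for q in range(n):
        others = [i for i in range(n) if i != q]
        scores = [sim(features[q], features[i]) for i in others]
        order = np.argsort(scores)[::-1]                       # most similar first
        ranked = [labels[others[j]] == labels[q] for j in order]
        n_rel = sum(labels[i] == labels[q] for i in others)
        aps.append(average_precision_at_k(ranked, n_rel, k))
    return float(np.mean(aps))
```

For chi-square, pass `sim=lambda a, b: -chi_square(a, b)` so that smaller distances rank first. The labels are the art movement, artist, or period, depending on the experiment.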
  36. Style-Based Artist Retrieval • Given an artist's manga document, we retrieve other documents produced by the same artist. MAP@10 for each feature set, distance measure, and number of style elements:
        Features, distance measure             10 styles  20 styles  30 styles  40 styles
        Line features, histogram intersection  0.6401     0.6404     0.6460     0.6323
        Line features, chi-square              0.6385     0.6457     0.6541     0.6537
        All features, histogram intersection   0.7627     0.7663     0.7854     0.7654
        All features, chi-square               0.7470     0.7553     0.7939     0.7824
  37. Artwork Period Retrieval • We analyze the manga JoJo's Bizarre Adventure, created by Hirohiko Araki since 1987; 300 pages in total. Sample pages: ジョジョの奇妙な冒険 Part 1 (1987), Part 3 (1989-1992), and Part 8 (2011-ongoing).
  38. Artwork Period Retrieval • Sample results: a query and the top returned documents.
  39. Artwork Period Retrieval • Given an artist's manga document, we retrieve other documents produced in the same period. MAP@10 for each feature set, distance measure, and number of style elements:
        Features, distance measure             10 styles  20 styles  30 styles  40 styles
        Line features, histogram intersection  0.5703     0.6247     0.6377     0.6428
        Line features, chi-square              0.5779     0.6446     0.6581     0.6622
        All features, histogram intersection   0.6321     0.6521     0.6698     0.6751
        All features, chi-square               0.6376     0.6641     0.6781     0.6899
  40. Summary • Manga style analysis – Manga-specific features – Based on LDA, implicit style elements are discovered in a probabilistic framework. – Analysis is performed at the style level rather than the feature level. • Applications – Style-based browsing – Influence discovery – Relationship between style and other properties
  41. Part 2: Comics-Based Storytelling. Wei-Ta Chu (朱威達), Department of Computer Science and Information Engineering, National Chung Cheng University, wtchu@ccu.edu.tw
  42. Comics-Based Storytelling • Goal: develop a systematic framework that enables comics-based storytelling of temporal image sequences – Comic design theory – Formulate the core components as optimization problems and solve them systematically – Interactivity
  43. Challenges • Q1. How should the given temporal image sequence be segmented so that images in the same subsequence present similar semantics, events, or scenes and are appropriate to put on the same comic page?
  44. Challenges • Q2. What is the best layout for arranging panels on the same page?
  45. Challenges • Q3. How should speech balloons be placed so that important content in the images is not occluded and the balloons' positions direct the viewer's gaze along a pleasing reading trajectory?
  46. Optimized Page Allocation (Q1: how should the given temporal image sequence be segmented?) • Allocate an appropriate number of comic pages, each of which may contain a different number of panels. – Visual coherence: consecutive or visually similar content tends to be put on the same comic page. – Browsing pace: keyframes conveying high motion tend to be put on pages containing more panels, to build a tense browsing experience. • This is a labeling problem with a temporal continuity constraint, solved with a genetic algorithm (GA). Example labeling (page index of each keyframe): 1 1 1 2 2 2 2 3 3 4 4 4.
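The page-allocation labeling can be sketched as a GA in which the temporal continuity constraint is built into the encoding: a chromosome is a bit string of page breaks between consecutive keyframes, so page labels can never decrease. The slides do not give the fitness function, so `fitness` below is a hypothetical stand-in that rewards low visual spread within a page (coherence) and high-motion keyframes on pages with many panels (pace). The weights, GA settings, and `MAX_PANELS = 8` (from the 1- to 8-panel layouts) are assumptions.

```python
import numpy as np

MAX_PANELS = 8   # layouts on the slides hold 1 to 8 panels

def labels_from_cuts(cuts):
    """cuts[i] == 1 starts a new page at frame i + 1, so labels never decrease."""
    return np.concatenate([[1], 1 + np.cumsum(cuts)]).astype(int)

def fitness(cuts, feats, motion, alpha=0.5):
    # Hypothetical fitness: visual coherence within pages + pace (motion vs. page size).
    labels = labels_from_cuts(cuts)
    coherence, pace = 0.0, 0.0
    for p in np.unique(labels):
        idx = np.flatnonzero(labels == p)
        if len(idx) > MAX_PANELS:
            return -1e9                                             # infeasible page
        f = feats[idx]
        coherence -= np.linalg.norm(f - f.mean(axis=0), axis=1).sum()  # visual spread
        pace += motion[idx].sum() * len(idx) / MAX_PANELS          # motion on busy pages
    n = len(labels)
    return (1 - alpha) * coherence / n + alpha * pace / n

def ga_page_allocation(feats, motion, pop=40, gens=100, p_mut=0.05, seed=0):
    """Needs at least 3 keyframes (one-point crossover over n - 1 break bits)."""
    rng = np.random.default_rng(seed)
    n = len(feats)
    population = rng.integers(0, 2, size=(pop, n - 1))
    for _ in range(gens):
        scores = np.array([fitness(c, feats, motion) for c in population])
        children = [population[np.argmax(scores)].copy()]          # elitism
        while len(children) < pop:
            i, j = rng.integers(pop, size=2)                       # tournament selection
            a = population[i] if scores[i] >= scores[j] else population[j]
            i, j = rng.integers(pop, size=2)
            b = population[i] if scores[i] >= scores[j] else population[j]
            cut = rng.integers(1, n - 1)                           # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n - 1) < p_mut                       # bit-flip mutation
            child[flip] = 1 - child[flip]
            children.append(child)
        population = np.array(children)
    scores = np.array([fitness(c, feats, motion) for c in population])
    return labels_from_cuts(population[np.argmax(scores)])
```

For example, `labels_from_cuts([0, 0, 1, 0])` gives the labeling 1 1 1 2 2, in the same form as the example labeling on the slide.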
  47. Optimized Page Allocation • Figure: keyframes labeled with page indices 1 1 1 2 2 2 2 3 3 4 4 4.
  48. Objective Function (Fitness) • Figure: best, average, and worst fitness (0.7 to 1) versus iteration (1 to 101), with the resulting page allocations at the 5th iteration (3 pages), the 20th iteration (2 pages), and the 90th iteration (2 pages).
  49. Optimized Layout Selection (Q2: what is the best layout for arranging panels on the same page?) • Desired properties – More important images should be allocated larger panels. – Keyframes extracted from the same shot, or photos taken consecutively at the same place, are better placed in the same row of panels. – Keyframes with more subtitle words, or photos with more annotations, should be allocated larger panels. • Idea – Determine the image-layout pair whose "importance" distributions are most similar.
  50. Image Importance • From each keyframe, the region of interest (ROI) is extracted based on color contrast [Cheng'11]. • Suppose a set of keyframes has been assigned to the same page. The importance value of a keyframe is defined from three terms: the ratio of the ROI area, the ratio of the number of subtitle words, and the minimum color histogram distance from this frame to the other frames. [Cheng'11] M.-M. Cheng, G.-X. Zhang, N. J. Mitra, X. Huang, and S.-M. Hu. "Global contrast based salient region detection." Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 409-416, 2011.
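The three importance terms can be computed as below for the keyframes of one page. This is a sketch under assumptions: ROI areas and subtitle word counts are normalized across the page's keyframes, the color histogram uses 4 bins per RGB channel, and the distance is L1. Because the slide does not give the formula that combines the terms, `importance` uses a hypothetical equal-weight average, labeled as such. ROI extraction itself ([Cheng'11]) is not reproduced; ROI areas are passed in. All function names are hypothetical.

```python
import numpy as np

def color_hist(frame, bins=4):
    """Normalized joint RGB histogram with `bins` levels per channel."""
    pixels = frame.reshape(-1, 3).astype(float)
    h, _ = np.histogramdd(pixels, bins=bins, range=[(0, 256)] * 3)
    return h.ravel() / h.sum()

def _normalize(v):
    s = v.sum()
    return v / s if s > 0 else np.full(len(v), 1.0 / len(v))

def importance_terms(frames, roi_areas, word_counts):
    """Per-keyframe ROI-area ratio, subtitle-word ratio, and minimum color-histogram distance."""
    roi_ratio = _normalize(np.asarray(roi_areas, dtype=float))
    word_ratio = _normalize(np.asarray(word_counts, dtype=float))
    hists = [color_hist(f) for f in frames]
    min_dist = np.array([
        min(np.abs(hists[i] - hists[j]).sum() for j in range(len(hists)) if j != i)
        for i in range(len(hists))
    ])
    return roi_ratio, word_ratio, min_dist

def importance(frames, roi_areas, word_counts):
    # Hypothetical equal-weight combination; not the formula from the slides.
    roi_ratio, word_ratio, min_dist = importance_terms(frames, roi_areas, word_counts)
    return (roi_ratio + word_ratio + _normalize(min_dist)) / 3
```

A keyframe that looks unlike every other frame on the page gets a large minimum histogram distance, so it is treated as more important and tends to receive a larger panel.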
  51. Layout Design • Figure: layout templates with 1 to 8 panels.
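  52. Layout Importance • Layout importance measures how appropriately a layout matches the given image sequence. – Each layout is described by a vector whose jth entry $a_j$ is the ratio of the area of the jth panel to the area of the whole page; for example, three-panel layouts give (1/3, 1/3, 1/3), (0.5, 0.25, 0.25), and (0.25, 0.25, 0.5). – The match is measured by the inner product $\sum_j w_j a_j$, where $w_j$ is the importance of the jth image.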
  53. Layout Importance • Binary vectors describe how panels are arranged into rows; they are used to measure how well different panel arrangements fit the shots. • Importance distribution in terms of the number of spoken words, also matched by an inner product. Figure: a three-row layout and a two-row layout, with row vectors r1 = (01100), r2 = (00110), r3 = (01000).
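  54. Layout Selection • Example: candidate three-panel layouts with area vectors (1/3, 1/3, 1/3), (0.5, 0.25, 0.25), and (0.25, 0.25, 0.5) and row vectors r1 = (011), r2 = (010), r1 = (001); three images with shot numbers 1, 2, 2 and q = (0, 1, 0). • The best layout is the one that maximizes the matching score.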
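Layout selection can be sketched by scoring each candidate layout with the area inner product plus a shot-fit term and keeping the best. The shot-fit term here is a hypothetical reading of the slides: q marks images that start a new shot (shots 1, 2, 2 give q = (0, 1, 0)), each layout gets a binary "row start" vector marking panels that open a new row, and their inner product rewards layouts whose row breaks coincide with shot changes. The weighting between the two terms and all function names are assumptions; the spoken-word term could be added as a further inner product in the same way.

```python
import numpy as np

def row_start_vector(rows, n_panels):
    """rows: lists of panel indices in reading order. Entry j is 1 if panel j
    opens a new row (the first row is not counted)."""
    v = np.zeros(n_panels)
    for row in rows[1:]:
        v[row[0]] = 1
    return v

def select_layout(importance, shot_ids, layouts, weight=1.0):
    """layouts: list of (area_ratios, rows). Returns the best index and all scores."""
    q = np.r_[0, (np.diff(shot_ids) != 0).astype(float)]    # 1 where a new shot starts
    scores = []
    for areas, rows in layouts:
        s_area = np.dot(importance, areas)                   # larger panels for important images
        s_shot = np.dot(q, row_start_vector(rows, len(areas)))  # row breaks at shot changes
        scores.append(s_area + weight * s_shot)
    return int(np.argmax(scores)), scores
```

With image importance (0.5, 0.3, 0.2) and shots 1, 2, 2, the layout (0.5, 0.25, 0.25) with rows [[0], [1, 2]] wins: its larger first panel matches the most important image, and its second row begins exactly where the second shot begins.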
  55. Composition • Steps: find ROI, find center, extend, crop, paste and resize.
  56. Layout Selection Comparison • Example 1: layout selected by the proposed method (a) and by equal allocation (b). Example 2: layout selected by the proposed method (c) and two different equally allocated layouts (d), (e).
  57. Balloon Placement (Q3: how should speech balloons be placed?) • Optimal positions are determined by jointly considering the following factors: – Balloons should not overlap the regions of interest (ROIs) in images. – Balloons should be placed as close to the ROIs as possible. – When a panel contains multiple balloons, sentences spoken earlier should be placed closer to the top-left corner of the panel, to maintain the correct reading order. – Balloons should not overlap each other. – A reading trajectory should be built so that the reading order is not only correct but also vivid.
  58. Optimized Speech Balloon Placement • Finally, the five factors are linearly combined into a single objective. • This problem maps naturally onto particle swarm optimization (PSO), which solves it efficiently. Figure: local region and global region.
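The balloon placement objective and its PSO solver can be sketched as follows. The cost is a weighted sum of four of the five factors (ROI overlap, distance to the ROI, reading order, balloon-balloon overlap). The reading-trajectory factor is omitted because the slides do not define it. The weights, the single rectangular ROI, the top-left "x + y" proxy for reading order, and the PSO constants (inertia 0.7, acceleration 1.5) are hypothetical choices, and the function names are hypothetical too.

```python
import numpy as np

def rect_overlap(a, b):
    """Overlap area of two axis-aligned rectangles given as (x, y, w, h)."""
    dx = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    dy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(dx, 0.0) * max(dy, 0.0)

def placement_cost(pos, sizes, roi, w_roi=1.0, w_dist=0.01, w_order=0.5, w_bb=1.0):
    """pos = [x0, y0, x1, y1, ...] balloon top-left corners, balloons in speaking order."""
    boxes = [(pos[2 * i], pos[2 * i + 1], w, h) for i, (w, h) in enumerate(sizes)]
    rcx, rcy = roi[0] + roi[2] / 2, roi[1] + roi[3] / 2
    cost = 0.0
    for i, b in enumerate(boxes):
        cost += w_roi * rect_overlap(b, roi)                                     # keep the ROI visible
        cost += w_dist * np.hypot(b[0] + b[2] / 2 - rcx, b[1] + b[3] / 2 - rcy)  # stay near the ROI
        for later in boxes[i + 1:]:
            cost += w_bb * rect_overlap(b, later)                                # no balloon overlap
            cost += w_order * max(0.0, (b[0] + b[1]) - (later[0] + later[1]))    # earlier nearer top-left
    return cost

def pso_place(sizes, roi, panel_w, panel_h, n_particles=30, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    dim = 2 * len(sizes)
    upper = np.array([v for w, h in sizes for v in (panel_w - w, panel_h - h)], dtype=float)
    x = rng.random((n_particles, dim)) * upper          # random start inside the panel
    v = np.zeros_like(x)
    pbest = x.copy()
    pcost = np.array([placement_cost(p, sizes, roi) for p in x])
    gbest = pbest[pcost.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)  # inertia + personal + global pull
        x = np.clip(x + v, 0.0, upper)                   # keep balloons inside the panel
        cost = np.array([placement_cost(p, sizes, roi) for p in x])
        improved = cost < pcost
        pbest[improved], pcost[improved] = x[improved], cost[improved]
        gbest = pbest[pcost.argmin()].copy()
    return gbest, float(pcost.min())
```

Because overlap is penalized per pixel of area while distance is penalized per pixel of length, the swarm settles on positions just outside the ROI, which matches the intent of the first two factors.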
  59. Optimized Speech Balloon Placement • Left: demonstration of PSO over 200 iterations. Right: ROI of a comic page. • Comparison of balloon placement considering different factors: (a), (c) placement when all factors are jointly considered; (b) placement when overlap between balloons is not taken into account; (d) placement when overlap between balloons and ROIs is not taken into account.
  60. Demo
  61. Summary • We have presented a system that automatically transforms temporal image sequences into comics-based storytelling: – Optimized page allocation – Optimized layout selection – Optimized speech balloon placement • Future work – ROI analysis techniques specifically designed for animation – Investigation of semantics in automatic comics generation
  62. Questions? Wei-Ta Chu (朱威達), National Chung Cheng University, wtchu@ccu.edu.tw
