How to train YOLOv3 to detect Peppa Pig

Prepare

If you do not already have a dataset, collect images of Peppa Pig from the web, for example with a simple Python scraper.

PASCAL VOC

```
.
└── VOCdevkit
    └── VOC2018
        ├── Annotations   // *.xml
        ├── ImageSets
        │   └── Main      // names of the *.jpg files, without ".jpg"
        └── JPEGImages    // *.jpg
```
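The skeleton above can be created with a few lines of Python (paths follow this tutorial's layout):

```python
import os

# Create the PASCAL VOC-style directory skeleton used in this tutorial
for sub in ("Annotations", "ImageSets/Main", "JPEGImages"):
    os.makedirs(os.path.join("VOCdevkit", "VOC2018", sub), exist_ok=True)
```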

Annotation

LabelImg is a graphical image annotation tool; it saves annotations as XML files in PASCAL VOC format.

Example XML annotation:

```xml
<annotation>
	<folder>000</folder>
	<filename>000000.jpg</filename>
	<path>/home/water/Machine_Learning/Peppa_Pig/train/000/000000.jpg</path>
	<source>
		<database>Unknown</database>
	</source>
	<size>
		<width>410</width>
		<height>256</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>Peppa</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>64</xmin>
			<ymin>87</ymin>
			<xmax>166</xmax>
			<ymax>226</ymax>
		</bndbox>
	</object>
	<object>
		<name>Peppa</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>290</xmin>
			<ymin>77</ymin>
			<xmax>333</xmax>
			<ymax>131</ymax>
		</bndbox>
	</object>
</annotation>
```

Generate Labels for VOC

Darknet expects a `.txt` label file for each image, with one line per ground-truth object in the form:

```
<object-class> <x> <y> <width> <height>
```
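Here `x` and `y` are the box center and `width`/`height` the box size, all normalized by the image dimensions. Using the first bounding box from the XML example above (410×256 image, box 64,87 to 166,226), a quick check:

```python
# First bounding box from the example XML: image 410x256, box (64, 87)-(166, 226)
img_w, img_h = 410, 256
xmin, ymin, xmax, ymax = 64, 87, 166, 226

x = (xmin + xmax) / 2 / img_w   # normalized box-center x
y = (ymin + ymax) / 2 / img_h   # normalized box-center y
w = (xmax - xmin) / img_w       # normalized box width
h = (ymax - ymin) / img_h       # normalized box height

print(f"0 {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
# → 0 0.280488 0.611328 0.248780 0.542969
```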

voc_label.py

```python
import xml.etree.ElementTree as ET
import os
from os import getcwd

sets = [('2018', 'train'), ('2018', 'val')]

classes = ["Peppa"]

def convert(size, box):
    # Convert a (xmin, xmax, ymin, ymax) box in pixels to the
    # normalized (x_center, y_center, width, height) format Darknet expects.
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)

def convert_annotation(year, image_id):
    in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml' % (year, image_id))
    out_file = open('VOCdevkit/VOC%s/labels/%s.txt' % (year, image_id), 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult) == 1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
             float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
    in_file.close()
    out_file.close()

wd = getcwd()

for year, image_set in sets:
    if not os.path.exists('VOCdevkit/VOC%s/labels/' % year):
        os.makedirs('VOCdevkit/VOC%s/labels/' % year)
    image_ids = open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt' % (year, image_set)).read().strip().split()
    list_file = open('%s_%s.txt' % (year, image_set), 'w')
    for image_id in image_ids:
        list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg\n' % (wd, year, image_id))
        convert_annotation(year, image_id)
    list_file.close()
```

peppa.data

```
classes = 1
train   = /home/d/Downloads/Peppa_VOC/2018_train.txt
valid   = /home/d/Downloads/Peppa_VOC/2018_val.txt
names   = data/peppa.names
backup  = backup
```
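The `names` file referenced above holds one class name per line. With a single class it can be generated in one line of Python (the `data/` path is taken from `peppa.data` above):

```python
import os

# data/peppa.names: one class name per line, matching `classes` in voc_label.py
os.makedirs("data", exist_ok=True)
with open("data/peppa.names", "w") as f:
    f.write("Peppa\n")
```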

yolov3-voc-peppa.cfg

```
[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=2
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
```
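Note that the convolutional layer immediately before each `[yolo]` layer has `filters=18`. This is not arbitrary: Darknet requires `filters = (classes + 5) × masks per scale`, so the value must be recomputed whenever the number of classes changes:

```python
classes = 1          # one class: Peppa
masks_per_scale = 3  # each [yolo] layer here uses 3 of the 6 anchors (mask = 3,4,5 or 0,1,2)
# 5 = 4 box coordinates + 1 objectness score
filters = (classes + 5) * masks_per_scale
print(filters)  # → 18
```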

Train

```
./darknet detector train cfg/peppa.data cfg/yolov3-voc-peppa.cfg darknet53.conv.74
```

Test
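Once training has written weights into `backup/`, a single image can be tested with Darknet's `detector test` mode. The weights and image filenames below are assumptions (Darknet names snapshots after the cfg file); adjust them to match what you actually have:

```shell
./darknet detector test cfg/peppa.data cfg/yolov3-voc-peppa.cfg \
    backup/yolov3-voc-peppa_final.weights data/peppa_test.jpg
```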

Reference

```
@article{yolov3,
  title   = {YOLOv3: An Incremental Improvement},
  author  = {Redmon, Joseph and Farhadi, Ali},
  journal = {arXiv},
  year    = {2018}
}
```