Knjižnica Fakulteta elektrotehnike i računarstva u Zagrebu catalog

Krešo, Ivan

Convolutional architecture for efficient semantic segmentation of large images : doctoral thesis / mentor Siniša Šegvić - Zagreb : I. Krešo; Faculty of Electrical Engineering and Computing, 2021. - x, 87 pp. : color illustrations ; 30 cm + CD

This doctoral thesis studies the application of convolutional neural networks to semantic segmentation of large images. The focus is on images acquired from a moving vehicle. This poses an additional challenge because objects appear across a wide range of sizes, and it is very important that the method works well on both large and small objects in the image. The thesis begins with an introduction to the problem of semantic segmentation, its challenges, and its relation to the problem of object detection. The next chapter presents convolutional neural networks and related machine learning concepts, along with a comparison of the DenseNet and ResNet architectures. The following three chapters present the proposed methods and the contributions of the work. First, a convolutional model is presented that achieves a scale-invariant representation using stereoscopic depth and an image pyramid. This reduces the problem of learning the same object across the whole range of scales. Next, an efficient asymmetric architecture for dense prediction on large images is presented, based on densely connected convolutional features and simple modules for pyramid pooling and ladder-style upsampling. The architecture is efficient in both time and memory and is suitable for real-time execution. Finally, a methodology is presented for learning semantic segmentation on multi-domain data, developed as part of participation in the Robust Vision Challenge 2018. The focus of the thesis is the application of deep convolutional models, that is, convolutional neural networks, to the problem of semantic segmentation. The main contributions of the thesis are as follows:
1. a module for extracting invariant convolutional representations, based on dense scale selection with respect to local stereoscopic depth;
2. an efficient asymmetric architecture for semantic segmentation, consisting of a complex densely connected recognition module and simple modules for pyramid pooling and ladder-style upsampling;
3. a methodology for learning semantic segmentation on multi-domain data.
This thesis investigates the efficient application of convolutional neural networks to semantic segmentation of large images. We focus on images recorded with a camera mounted on a vehicle. For semantic segmentation, this means that objects appear at a wide range of scales. Hence, it is important that the method works well both for small objects far from the camera and for large objects near it. The thesis starts with an introduction to the problem of semantic segmentation and explains its relation to the problem of object localization. The introduction concludes by discussing the challenges of the problem. In the next chapter, we review convolutional neural networks and the underlying machine learning concepts, and compare the DenseNet and ResNet architectures. The following three chapters describe the proposed methods and the contributions of the thesis. First, we develop a scale-invariant convolutional model for semantic segmentation, which alleviates the problem of learning the same object at a wide range of scales. Second, we develop an efficient asymmetric architecture for dense prediction on large images, based on a densely connected feature extractor and lightweight ladder-style upsampling. The proposed architecture is computationally and memory efficient and suitable for real-time applications. Third, we present our results from the Robust Vision Challenge 2018, in which we addressed the problem of generalizing across multi-domain datasets. The main contributions of the thesis are as follows:
1. efficient asymmetric architecture for semantic segmentation consisting of a custom densely connected recognition module, lightweight spatial pyramid pooling and lightweight upsampling;
2. module for extracting scale-invariant convolutional representations based on dense scale selection with respect to local stereoscopic depth;
3. methodology for learning semantic segmentation models on multi-domain datasets.