This paper analyzes datasets of images with labeled traffic signs, as well as modern approaches to detecting and classifying such signs in images of urban scenes. Particular attention is paid to the recognition of Russian traffic sign types. Several modern deep neural network architectures for simultaneous object detection and classification were studied, including Faster R-CNN, Mask R-CNN, Cascade R-CNN, and RetinaNet. The Seq-BBox Matching algorithm is used to improve neural network recognition of objects in video sequences. Training and testing of the proposed approach were carried out on the Russian Traffic Sign Dataset and the IceVision Dataset, which contain over 150 types of road signs and more than 65,000 labeled images. For all the approaches considered, the following metrics were measured: mean average precision (mAP), mean average recall (mAR), and the processing time of one frame. The highest recognition quality was demonstrated by Faster R-CNN with Seq-BBox Matching, while the highest processing speed was provided by RetinaNet. The implementation uses the Python 3.7 programming language and the PyTorch deep learning library with NVIDIA CUDA technology. Performance figures were obtained on a workstation with an NVIDIA Tesla V100 32GB graphics card. The results demonstrate that the proposed approach can be applied both to the resource-intensive task of automated labeling of road scene images when preparing new datasets and to traffic sign recognition in on-board computer vision systems of unmanned vehicles.
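To make the mAP metric mentioned above concrete, the following is a minimal sketch of how average precision can be computed for one class and then averaged over classes. It is an illustration under stated assumptions, not the evaluation code used in the paper: the box format `(x1, y1, x2, y2)`, the IoU threshold of 0.5, the greedy matching rule, the all-point interpolation, and all function names are hypothetical choices made here.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """AP for one class (assumed protocol, not necessarily the paper's).

    detections: list of (score, box); ground_truth: list of boxes.
    """
    if not ground_truth:
        return 0.0
    matched = set()   # indices of ground-truth boxes already claimed
    tp_flags = []     # True for each detection counted as a true positive
    for _score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_iou >= iou_threshold:
            matched.add(best_idx)
            tp_flags.append(True)
        else:
            tp_flags.append(False)

    # Precision and recall after each detection, in descending score order.
    precisions, recalls, tp = [], [], 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth))

    # Make precision non-increasing, then integrate over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class: dict mapping class name -> (detections, ground_truth)."""
    aps = [average_precision(d, g) for d, g in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

In this sketch each ground-truth box can be matched by at most one detection, so duplicate detections of the same sign count as false positives. Benchmark protocols differ in the IoU thresholds and interpolation they use, so values from this sketch are not directly comparable with the figures reported in the paper.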