The paper considers fine-tuning an ensemble of deep neural networks to recognize 60 event types in a set of 60,000 images from the WIDER database. The ensemble consists of two deep convolutional neural networks (CNNs) with the GoogLeNet architecture, pre-trained on other image collections: ImageNet and Places. The recognition accuracy for a subset of 10 events was analyzed separately: “Car Racing”, “Ceremony”, “Concert”, “Demonstration”, “Football”, “Meeting”, “Picnic”, “Swimming”, “Tennis” and “Traffic”. During ensemble training, the output layer of each deep CNN is replaced with a layer of 10 or 60 neurons, respectively, and only the weights connecting the output layer to the preceding layer are tuned. The classification accuracy on the WIDER image database averages 83.22% for 10 event classes and 50.4% for 60 event classes. In addition, automatic feature formation with deep CNNs provided much better recognition quality for social events than manually chosen features (LBP, LDP or HOG) classified by a support vector machine. The testing time of the developed ensemble makes the classifier usable in practical event recognition applications, with a processing speed of up to 20 frames per second.
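The training scheme summarized here (frozen pre-trained networks, a new output layer per network, and a combined ensemble decision) can be illustrated with a minimal sketch. It is not the authors' implementation. The feature vectors are hypothetical toy stand-ins for the penultimate-layer activations of the two GoogLeNet models, the three classes stand in for the 10 or 60 event classes, and averaging the class probabilities of the two networks is an assumption, since the abstract does not state how the ensemble outputs are combined.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, bias, features):
    """Class probabilities of a single linear output layer."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

def train_output_layer(features, labels, num_classes, epochs=200, lr=0.5):
    """Fit only a new softmax output layer on top of frozen features."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    bias = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = predict_proba(weights, bias, x)
            for k in range(num_classes):
                # Gradient of cross-entropy with respect to logit k.
                grad = probs[k] - (1.0 if k == y else 0.0)
                bias[k] -= lr * grad
                for j in range(dim):
                    weights[k][j] -= lr * grad * x[j]
    return weights, bias

def ensemble_predict(heads, feature_sets):
    """Average the class probabilities of all heads (assumed combination rule)."""
    num_classes = len(heads[0][1])
    avg = [0.0] * num_classes
    for (w, b), x in zip(heads, feature_sets):
        for k, p in enumerate(predict_proba(w, b, x)):
            avg[k] += p / len(heads)
    return max(range(num_classes), key=avg.__getitem__), avg

# Hypothetical frozen features for three images, one per class.
labels = [0, 1, 2]
feats_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # "ImageNet" network
feats_b = [[1.0, 0.0, 0.2], [0.1, 1.0, 0.0], [0.0, 0.2, 1.0]]  # "Places" network

head_a = train_output_layer(feats_a, labels, num_classes=3)
head_b = train_output_layer(feats_b, labels, num_classes=3)

predictions = []
for xa, xb in zip(feats_a, feats_b):
    pred, avg_probs = ensemble_predict([head_a, head_b], [xa, xb])
    predictions.append(pred)
print(predictions)
```

Because the feature extractors stay fixed, each network contributes only one small linear layer to the optimization, which is why this kind of fine-tuning is cheap compared with retraining the full CNN.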