UK Road Safety Data

The UK road safety data was analysed with an emphasis on casualties, answering questions such as how casualties are distributed across age bands, how they vary under different light and road conditions, and how the casualty trend across the days of the week differs between vehicle types.

Visualising the data is an integral part of data analysis, both for inferring results and for communicating the findings effectively. Although most of the time is spent on cleaning and structuring the data, the visualisations should be considered equally important. A plot or figure should show as much information as it can while remaining readable. The visualisations in this project have been designed with that aim, i.e. to communicate our findings as effectively as possible.

The R libraries used for data processing and plotting are ‘tidyverse’, ‘reshape2’, ‘dplyr’ and ‘data.table’. Functions from these libraries have been used to clean the data and plot it effectively. Casualty data for the years 2016 to 2020 is fetched from the stats19 package. Accident data is fetched from the same package and loaded for the years 2018, 2019 and 2020. The outputs are stored in separate data frames and merged using join functions as needed. The transpose function is used to swap rows and columns so that the same data can be plotted from a different point of view and yield further insights. The columns to be plotted are converted to factors using the ‘factor’ function so that the axis labels are sorted in a sensible order, which makes the plots readable. The ‘mutate’ function has also been used to update and modify data frames based on various conditions, which helps in organising and structuring the data. String functions are used to match patterns and group similar values together.
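The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the two small data frames stand in for the tables fetched from stats19, and their column names (accident_index, day_of_week, light_conditions, age_band), their values and the use of ‘stringr’ for pattern matching are all assumptions made for the example.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for the stats19 tables;
# column names and values are illustrative assumptions.
accidents <- tibble(
  accident_index   = c("A1", "A2", "A3", "A4"),
  day_of_week      = c("Monday", "Sunday", "Friday", "Monday"),
  light_conditions = c("Daylight", "Darkness - lights lit",
                       "Darkness - no lighting", "Daylight")
)

casualties <- tibble(
  accident_index = c("A1", "A1", "A2", "A3", "A4"),
  age_band       = c("16 - 20", "26 - 35", "26 - 35", "66 - 75", "16 - 20")
)

day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

cleaned <- casualties %>%
  # Attach the accident-level fields to every casualty record
  left_join(accidents, by = "accident_index") %>%
  mutate(
    # Explicit factor levels so axes follow calendar order, not alphabetical
    day_of_week = factor(day_of_week, levels = day_levels),
    # Group all "Darkness ..." variants under one label via pattern matching
    light_group = if_else(str_detect(light_conditions, "^Darkness"),
                          "Darkness", "Daylight")
  )

# Casualty counts per day, returned in factor-level (calendar) order
day_counts <- cleaned %>% count(day_of_week)
```

Setting the factor levels explicitly is what keeps the days in calendar order on an axis; without it, R would sort the labels alphabetically.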

Once the data is cleaned and structured, the ‘ggplot’ function is used to represent the data using various combinations of fields. The fields to be plotted against each other are passed as arguments to the aesthetic mapping of the function. The fill parameter of the aesthetic mapping colours the data plotted on the two axes by a third variable that is passed to it. The ‘ggplot2’ library provides multiple functions that build upon the base plot, and many of them have been used here to make the base plots more visually appealing. Melting a data frame, i.e. pivoting it so that column names become the values of a discrete variable, shows its benefit at the plotting stage, where those former column names can be mapped to an aesthetic to visualise the data meaningfully. Theme, scale and guide functions are used to refine the features of each plot, which improves readability while keeping the plot appealing to the audience. Pie charts are commonly held to be a poor way of conveying information, because people find it hard to judge proportions accurately. Nevertheless, a pie chart has been used here with the percentage values printed on it in order to emphasise the dominant category, as can be seen in the visualisation of the impact of light conditions on casualties.
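As a rough sketch of the plotting approach described above, the example below melts a small wide table and draws a grouped bar chart with a fill variable, then builds a pie chart with printed percentages. The counts, column names, colour palette and titles are hypothetical and chosen only for illustration; they are not results from the project.

```r
library(ggplot2)
library(reshape2)

# Hypothetical wide table: one column per vehicle type (illustrative values)
wide <- data.frame(
  day_of_week = c("Monday", "Tuesday"),
  Car         = c(10, 12),
  Bicycle     = c(3, 4)
)

# Melt to long format: former column names become values of vehicle_type
long <- melt(wide, id.vars = "day_of_week",
             variable.name = "vehicle_type", value.name = "casualties")

# Two fields on the axes, a third variable mapped to fill
bars <- ggplot(long, aes(x = day_of_week, y = casualties,
                         fill = vehicle_type)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Set2") +
  guides(fill = guide_legend(title = "Vehicle type")) +
  labs(x = "Day of week", y = "Casualties") +
  theme_minimal()

# Hypothetical counts by light condition (illustrative values)
light_counts <- data.frame(
  light_group = c("Daylight", "Darkness"),
  n           = c(70, 30)
)
light_counts$pct <- round(100 * light_counts$n / sum(light_counts$n), 1)

# A pie chart in ggplot2 is a single stacked bar drawn in polar coordinates
pie <- ggplot(light_counts, aes(x = "", y = n, fill = light_group)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  # Print the percentage in the middle of each slice
  geom_text(aes(label = paste0(pct, "%")),
            position = position_stack(vjust = 0.5)) +
  labs(fill = "Light conditions",
       title = "Casualties by light conditions") +
  theme_void()
```

Calling print(bars) or print(pie) renders the corresponding figure. Printing the percentages on the slices is what compensates for the difficulty of reading proportions from a pie chart.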