Identifying outliers in R with ggplot2

One of the first steps when working with a fresh data set is to plot its values to identify patterns and outliers. When outliers appear, it is often useful to know which observations they correspond to, in order to check whether they come from data entry errors, data anomalies or other causes.

Unfortunately, ggplot2 does not have an interactive mode for identifying a point on a chart, so one has to look for other solutions such as GGobi (package rggobi) or iPlots.

However, if all that is needed is to give a “name” to the outliers, ggplot2's labeling capabilities can serve the purpose. While labeling all points would usually produce a crowded, hard-to-read plot, we can restrict the labels to the points that satisfy certain conditions, namely our outliers.
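
As a rough illustration of the idea, here is a minimal sketch. It uses the built-in mtcars data set as a stand-in for real data, and the helper is_outlier() with its 1.5 * IQR rule is an assumption made for this example rather than anything prescribed by ggplot2; any logical condition could take its place. The text layer is simply given a subset of the data, so only the flagged points receive a label.

```r
library(ggplot2)

# Example data (assumption): mtcars ships with R; row names become labels.
car_data <- mtcars
car_data$name <- rownames(mtcars)

# Hypothetical helper: flag values lying more than 1.5 * IQR beyond the
# quartiles. This is one common convention, chosen only for illustration.
is_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

car_data$outlier <- is_outlier(car_data$hp)

# Draw every point, but pass only the flagged rows to the text layer.
p <- ggplot(car_data, aes(x = wt, y = hp)) +
  geom_point() +
  geom_text(
    data = subset(car_data, outlier),
    aes(label = name),
    vjust = -0.8,
    size = 3
  )

print(p)
```

Changing the condition inside is_outlier(), or replacing it with a threshold that makes sense for the data at hand, changes which points get named without touching the plotting code.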
