*aggregate*. While

*aggregate*is not as powerful as

*melt*and

*cast*from the

*reshape*package, it may result sometimes simpler and more intuitive to use, so it is worth to learn how it works.

The steps to obtain, prepare and read into R the data we need to work with *aggregate* are exactly the same as in the article mentioned previously, however I will include them below for your convenience.

### 1. Download the sample data file

Use the link at the bottom of this article and save the file somewhere on your hard drive. So far so good!

### 2. Open the file in Excel and save a CSV copy of the first sheet

Double-click the file from the location where you saved it to open it inside Excel. The Excel worksheet has 3 tabs. We will be using the first one, containing the source data, and the last one, which contains a ready-made Pivot Table to play with. If you do not know how to make Pivot Tables in Excel, I recommend you read the Microsoft tutorial. In this article we will focus on how to obtain the same results with R and we will use the ready-made pivot table on sheet 3.

Once you are on the first sheet of the sample data set (the one with the source data), save a copy of it as CSV (Comma Separated Values) file. This will generate a file which is easier to import into R. R has packages which allow you to read directly Excel files without converting them to CSV, but for the sake of simplicity we will stick to CSV.

Before saving as CSV, change the format of column E (Order Amount) to General so that the orders amounts appear as numbers with 2 decimals and not with a $ sign in front and comma separated thousands. If you export to CSV without doing this first, there will be extra steps required in R to convert the sales amounts to numeric (see the note at the end of the article).

### 3. Load the CSV file in R

This is an easy step accomplished with the *read.csv* function as follows.

1 |
data <- read.csv("file_name.txt", header=TRUE) |

file_name.txt is the name of the CSV file you created from Excel. If it is not in the current working directory in R, you can either change R’s working directory with *setwd* or move the file to the current working directory or add a path to the file before the file name.

At this point you should have the same base data open in Excel and loaded into R. In R the first lines will look like this:

1 2 3 4 5 6 7 8 |
> head(data) Country Salesperson Order.Date OrderID Order.Amount 1 UK Suyama 37812 10249 1863.40 2 USA Peacock 37813 10252 3597.90 3 USA Peacock 37814 10250 1552.60 4 UK Dodsworth 37817 10255 2490.50 5 USA Leverling 37817 10251 654.06 6 UK Buchanan 37818 10248 440.00 |

Now we are ready to get to work with *aggregate*.

### 4. How aggregate works

As mentioned above, *aggregate* is a much simpler function than * melt* and *cast.* While it can have multiple syntaxes, the one we will work with, which applies well to data frames, is the following:

1 |
aggregate(formula, data, FUN, ..., subset, na.action = na.omit) |

The most relevant argument to understand is *formula*. A formula as applicable to *aggregate* is something like this:

1 |
what_to_aggregate ~ aggregate_by |

In the case of our data and following the example in the previous post, we want to aggregate the *Order.Amount* by *Salesperson*, therefore the formula we need is simply:

1 |
Order.Amount ~ Salesperson |

Beside the formula, we need to specify the *data* argument, which will be equal to our data frame *data* (I know, sorry for haven chosen a data frame name that is the same of an argument name! I hope you can bear with me anyway…), and the *FUN* argument, which is the function we want to use for the aggregation. In the case of our example, we want to aggregate by summing up all sales order for each sales person, therefore *FUN=sum* is what we need.

The extra arguments * subset* allows us to subset the data before aggregating them. Say, for example, you want to make a pivot only for the sales persons in USA, you can use *subset=Country==”USA”*. Remember that the assignment operator = is different than the comparison operator ==. If it is confusing to you how to read the *subset* argument, you can also write it like this: *subset=(Country==”USA”)*.

Finally the argument *na.action* allows us to specify what to do with missing (N/A) values. The default is to omit them from the aggregation.

Ok, enough with the explanations, now we are ready to make our first pivot table with *aggregate*. Here is the command:

1 |
data.a <- aggregate( Order.Amount ~ Salesperson, data=data, FUN=sum) |

And this is the result (which, as expected, is identical to that from *melt* and *cast*):

1 2 3 4 5 6 7 8 9 10 11 |
> data.a Salesperson Order.Amount 1 Buchanan 68792.25 2 Callahan 123032.67 3 Davolio 182500.09 4 Dodsworth 75048.04 5 Fuller 162503.78 6 King 116962.99 7 Leverling 201196.27 8 Peacock 225763.68 9 Suyama 72527.63 |

Congratulations on your first pivot table with *aggregate*! 🙂

### 5. Adding Grand Totals

Sorry to delude you, but this is where *aggregate* falls short of* melt* and *cast*. In fact there is no option to add Grand Totals directly through *aggregate*. Yes, there are workarounds available which I may cover them in a future post, but no direct methods.

### 6. How to filter

We briefly touched on filtering while describing the *subset* argument above. Just add a subsetting criteria to the call to *aggregate* to pre-filter the data on which to aggregate. For example, wanting to aggregate only the sales order for the sales persons based in USA the new syntax is:

1 |
data.a <- aggregate( Order.Amount ~ Salesperson, data=data, FUN=sum, subset=(Country=="USA")) |

And this is the output:

1 2 3 4 5 6 7 |
> data.a Salesperson Order.Amount 1 Callahan 123032.7 2 Davolio 182500.1 3 Fuller 162503.8 4 Leverling 201196.3 5 Peacock 225763.7 |

Congratulations! You have just learned another easy way to make Pivot Tables in R using *aggregate*. 🙂

You can stop here or, if you feel brave enough and want to move to the next level, read the next optional paragraph.

### 7. Aggregating more columns and with more criteria at once

You may wonder about the syntax of the *formula* inside *aggregate* in case you want (or need) to aggregate more columns at once (the left side of the formula) according to multiple criteria (the right side of the formula). Well, here is the syntax you can use for 3 columns and 3 criteria (by):

1 |
cbind( column_1, column_2, column_3 ) ~ criteria_1 + criteria_2 + criteria_3 |

The aggregation function specified by *FUN* will be applied to each column specified on the left side of the formula after the data have been partitioned (grouped) according to the criteria specified on the right side of the formula.

To make a simpler example that fits with our data (which only have one numerical column), let’s say we want to aggregate on the Order Amount primarily by Order Data and secondarily by Country. We can easily issue to R a command like this:

1 |
data.a <- aggregate( Order.Amount ~ Order.Date + Country, data=data, FUN=sum) |

This will produce a Pivot Table where the Order Amount for each Order Date and Country combination has been aggregated using the *sum* function.

I hope you have enjoyed this article. Till next time!

## 2 Comments

The data file link in your melt and cast article for the sales data points now to a Microsoft Office theme download page.

The data file cannot be found there.

Thanks….Tom

Hi Tom,

thanks for informing me!

Luckily enough I could find on my hard-drive a copy of the original file. I posted it at the bottom of the first post of the series (Pivot Tables in R with melt and cast). Feel free to grab it from there.

Best Regards,

Marco.