When to use logs
In economics and especially applied econometrics we often observe the use of logarithms. But others’ choices to log-transform variables do not necessarily mean that you should do the same to yours. It very much depends on the context.
One reason to use logs is to transform a non-linear model into a linear one. For example, in the Cobb-Douglas production function, output (Y) is a non-linear function of capital (K) and labour (L): Y = K^a L^b, where a and b are parameters to be estimated. (Remember: we usually have data for Y, K and L.) If we log both sides and add a constant, we get log(Y) = c + a log(K) + b log(L), which can then be estimated using the least squares method.
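To make the estimation step concrete, here is a minimal sketch in plain Python. The data are simulated, not real: the parameter values (a = 0.3, b = 0.6, a scale factor of 2), the noise level and the helper name `ols` are all assumptions made purely for illustration.

```python
import math
import random

# Simulated data (an assumption for illustration): Y = 2 * K^0.3 * L^0.6 * exp(u)
rng = random.Random(42)
true_c, true_a, true_b = math.log(2.0), 0.3, 0.6

data = []
for _ in range(500):
    K = rng.uniform(1.0, 100.0)
    L = rng.uniform(1.0, 100.0)
    u = rng.gauss(0.0, 0.05)  # small multiplicative disturbance
    Y = math.exp(true_c + u) * K ** true_a * L ** true_b
    data.append((Y, K, L))

# Log-transform: log(Y) = c + a*log(K) + b*log(L) + u is linear in c, a, b.
X = [[1.0, math.log(K), math.log(L)] for _, K, L in data]
y = [math.log(Y) for Y, _, _ in data]


def ols(X, y):
    """Least squares via the normal equations (X'X) beta = X'y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    M = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


c_hat, a_hat, b_hat = ols(X, y)
print(f"c = {c_hat:.3f}, a = {a_hat:.3f}, b = {b_hat:.3f}")
```

With this setup the estimates should land close to the values used to generate the data. In practice you would use a regression routine from a statistics package rather than a hand-written solver.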
Specified in this way, the estimated parameters, a and b, are elasticities. For example, a is the percentage change in Y given a one percent increase in K. Because we are often interested in elasticities, interpretation might be another reason why you would want to work with logs.
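The elasticity interpretation can be checked with a short derivation. The scale factor A (so that c = log A) and the disturbance u are notational assumptions added here; they do not appear in the text above.

```latex
\begin{align*}
Y &= A\,K^{a}L^{b}e^{u} \\
\log Y &= \underbrace{\log A}_{c} + a\log K + b\log L + u \\
\frac{\partial \log Y}{\partial \log K}
  &= \frac{\partial Y}{\partial K}\cdot\frac{K}{Y}
   = \frac{aY}{K}\cdot\frac{K}{Y} = a
\end{align*}
```

So, for small changes, the percentage change in Y is approximately a times the percentage change in K, holding L fixed.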
A numerical example will help you understand this better. Suppose you are interested in the effect of advertising on sales and your sample consists of firms of different sizes. A small firm (with limited geographic scope) may find that a €1 increase in spending on advertising has a small effect on sales, whereas a larger firm may experience a proportionally larger increase for the same amount of advertising spending. The following table captures this effect:
Firm | Advertising | Sales |
A | 100 | 1000 |
A | 101 | 1100 |
B | 100 | 2000 |
B | 101 | 2200 |
C | 100 | 3000 |
C | 101 | 3300 |
If you run a regression of sales on advertising (including firm dummies or fixed effects to compare like with like), you get an estimated coefficient on advertising of 200. This makes sense for firm B but not for the other firms: a €1 increase in advertising raises sales by 100 for firm A and by 300 for firm C. You cannot assume that a €1 increase in advertising will generate a constant 200 unit increase in sales regardless of the size of the firm. If, instead, you regress the log of sales on the log of advertising (again with firm fixed effects), you get an estimated coefficient of about 9.6, or roughly 10, which can be interpreted as an elasticity. Thus, a 1% increase in advertising increases sales by roughly 10%. Look at the data. It makes sense! The effect of advertising is proportional to the size of the firm.
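The two regressions can be reproduced from the table with a few lines of plain Python. This is a minimal sketch: the helper name `within_slope` is made up for illustration, and it implements firm fixed effects by subtracting each firm's mean from its observations, which gives the same slope as including firm dummies.

```python
import math

# Data copied from the table above: (firm, advertising, sales).
rows = [
    ("A", 100, 1000), ("A", 101, 1100),
    ("B", 100, 2000), ("B", 101, 2200),
    ("C", 100, 3000), ("C", 101, 3300),
]


def within_slope(data):
    """OLS slope of y on x after subtracting firm means (fixed effects)."""
    by_firm = {}
    for firm, x, y in data:
        by_firm.setdefault(firm, []).append((x, y))
    sxy = sxx = 0.0
    for obs in by_firm.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    return sxy / sxx


# Regression in levels: sales on advertising.
level_slope = within_slope(rows)

# Log-log regression: log(sales) on log(advertising).
log_slope = within_slope([(f, math.log(x), math.log(y)) for f, x, y in rows])

print(f"levels:  {level_slope:.1f}")   # levels:  200.0
print(f"log-log: {log_slope:.2f}")     # log-log: 9.58
```

The log-log slope equals log(1.1) / log(1.01): every firm's sales rise by 10% when its advertising rises by 1%, so a single elasticity fits all three firms, whereas the levels slope of 200 is only an average of 100, 200 and 300.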
For such reasons, we often log-transform variables measured in euros or quantities, but not variables that are measured as ratios (e.g. interest rates, unemployment rate). Even though you can take logs for ratio variables, we generally do not since they already have a percentage interpretation.
The underlying economic reasons discussed above should be central to the choice of log versus linear specification. But logs are also useful for statistical reasons.
Some researchers log a variable when it has a positive skew. Many economic variables do (e.g. wage distribution, house prices). Log-transforming such variables would make their distribution appear more normal. For example, in the interactive figure below, you can see how log-transforming the 2010 Maltese wage distribution makes it more symmetric. But in itself this is not a very good reason for taking logs – none of the linear regression model assumptions require that your dependent or independent variables be normally distributed.
It may, however, be useful to log-transform your variables because it helps to normalise your regression residuals. While the linear regression model assumptions do not require that the residuals are normally distributed, normally distributed residuals are advantageous for statistical inference (especially when the sample is small).
Note that the transformation is helpful only if the variable in levels has an approximately log-normal distribution (which many economic variables do). Otherwise, for example when the data have a negative skew, taking logs might make the distribution less normal.
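A quick simulation illustrates both cases. This is a sketch under stated assumptions: the data are drawn from a log-normal distribution (right-skewed) and a Beta(5, 1) distribution (left-skewed), chosen only for illustration, and `skewness` is a hand-written helper rather than a library function.

```python
import math
import random


def skewness(values):
    """Sample skewness: third central moment over the variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)
n = 5000

# Right-skewed, approximately log-normal data (e.g. a stylised wage distribution).
right = [rng.lognormvariate(10.0, 0.75) for _ in range(n)]
# Left-skewed data on (0, 1), drawn from a Beta(5, 1) distribution.
left = [rng.betavariate(5.0, 1.0) for _ in range(n)]

skew_right = skewness(right)
skew_right_log = skewness([math.log(v) for v in right])
skew_left = skewness(left)
skew_left_log = skewness([math.log(v) for v in left])

print(f"right-skewed: levels {skew_right:.2f}, logs {skew_right_log:.2f}")
print(f"left-skewed:  levels {skew_left:.2f}, logs {skew_left_log:.2f}")
```

For the log-normal sample the skewness should drop from clearly positive to near zero after taking logs, while for the left-skewed sample the log makes the skewness more negative, not less.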
Another reason for using logs is that the transformation compresses variation, especially at large values. So if the spread of your regression residuals increases with the fitted/predicted value, log-transforming your dependent and/or independent variables may help reduce heteroskedasticity. But be careful! Taking logs can sometimes worsen the heteroskedasticity problem.
If the spread is approximately proportional to the conditional mean, taking the log of the dependent variable helps deal with heteroskedasticity. See LHS interactive figure below. But if the spread is not increasing with the conditional mean in an approximately proportionate manner (or more than proportionate manner), then log-transforming the dependent variable will often worsen the heteroskedasticity problem. See RHS interactive figure below. Put differently, the log-transformation is helpful when the data is in the shape of a fan/cone and skewed.
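The two situations can be sketched with simulated data. Everything here is an illustrative assumption: two groups with conditional means of 100 and 400, a multiplicative error in one case and an additive, constant-spread error in the other. The helper `spread_ratio` (a made-up name) compares the standard deviation in the high-mean group with that in the low-mean group, so a ratio near 1 indicates roughly constant spread.

```python
import math
import random
import statistics

rng = random.Random(1)
n = 2000
low_mean, high_mean = 100.0, 400.0


def spread_ratio(low, high):
    """Standard deviation in the high-mean group relative to the low-mean group."""
    return statistics.stdev(high) / statistics.stdev(low)


def logs(values):
    return [math.log(v) for v in values]


# Case 1: spread proportional to the conditional mean (multiplicative error).
prop_low = [low_mean * math.exp(rng.gauss(0.0, 0.2)) for _ in range(n)]
prop_high = [high_mean * math.exp(rng.gauss(0.0, 0.2)) for _ in range(n)]

# Case 2: constant spread regardless of the conditional mean (additive error).
const_low = [low_mean + rng.gauss(0.0, 10.0) for _ in range(n)]
const_high = [high_mean + rng.gauss(0.0, 10.0) for _ in range(n)]

prop_levels = spread_ratio(prop_low, prop_high)
prop_logs = spread_ratio(logs(prop_low), logs(prop_high))
const_levels = spread_ratio(const_low, const_high)
const_logs = spread_ratio(logs(const_low), logs(const_high))

print(f"proportional spread: levels {prop_levels:.2f}, logs {prop_logs:.2f}")
print(f"constant spread:     levels {const_levels:.2f}, logs {const_logs:.2f}")
```

In the first case the ratio should fall from about 4 in levels to about 1 in logs, so the log removes the heteroskedasticity. In the second case the ratio should start near 1 and fall well below 1 after taking logs, so the transformation creates heteroskedasticity that was not there before.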
There are, therefore, a number of theoretical and statistical reasons why log-transforming your variables can be helpful. But it depends on the context and the data. Think about what makes sense for you.
Update: all of the above applies also to time series variables. See here and here for more on log-transforming time series data.