Industry and Occupation Granularity
The challenge of producing useful data all the way down to the most specific industry and occupation levels is intertwined with the whole issue of data suppression. This is because most suppressions in the raw data occur at the 3- and 4-digit SIC and SOC levels. Why is this?
Basically, suppressions are applied to the raw data to prevent too much information about specific groups of workers, or specific employers, from being revealed. For example, if there are only a small number of establishments in an industry in a particular geographic area, the data might be suppressed to prevent one firm from finding out details about its competitors. Similarly, where one firm accounts for the overwhelming majority of employment in a given industry, the data might be suppressed to avoid disclosing information about that company.
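To make these two examples concrete, here is a minimal Python sketch of the kind of check that can trigger a suppression: a threshold rule (too few establishments in a cell) and a simple dominance rule (one establishment holds too large a share of the cell's jobs). The thresholds MIN_ESTABLISHMENTS and MAX_DOMINANT_SHARE, the Cell type and the should_suppress function are all hypothetical; real statistical agencies set their own rules, which are often more elaborate and not always published.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; real agencies set their own.
MIN_ESTABLISHMENTS = 3      # threshold rule: too few units -> suppress
MAX_DOMINANT_SHARE = 0.80   # dominance rule: one unit too large -> suppress


@dataclass
class Cell:
    """Employment in one industry within one geographic area (hypothetical)."""
    label: str
    establishment_jobs: list[int]  # jobs at each establishment in the cell

    @property
    def total(self) -> int:
        return sum(self.establishment_jobs)


def should_suppress(cell: Cell) -> bool:
    """Return True if publishing this cell's total could expose a single firm."""
    if len(cell.establishment_jobs) < MIN_ESTABLISHMENTS:
        return True  # so few establishments that each could infer the others
    if cell.total == 0:
        return False
    largest_share = max(cell.establishment_jobs) / cell.total
    return largest_share > MAX_DOMINANT_SHARE  # one firm effectively is the cell


# Invented cells for illustration.
cells = [
    Cell("Industry A, Area X", [120, 95, 80, 40]),  # several similar firms
    Cell("Industry B, Area X", [300, 12]),          # only two establishments
    Cell("Industry C, Area X", [900, 30, 20, 10]),  # one firm has ~94% of jobs
]
for c in cells:
    print(c.label, "suppressed" if should_suppress(c) else "published")
```

Here only Industry A would be published. Applied across every industry and area, rules like these inevitably knock out more cells at the detailed 3- and 4-digit levels, where each cell contains fewer establishments.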
For companies that publish LMI, this presents something of a dilemma: if much of the data at the lower levels is suppressed, what should be done about it? There are essentially two approaches that can be taken.
The first is to unsuppress the data, using robust data modelling techniques to fill in the figures for industries and jobs at the most specific SIC and SOC levels (this is a perfectly legal practice, by the way). The second is to leave the suppressions in place and instead make assumptions about the 3- and 4-digit industries and occupations (where the suppressions are) based on the data and trends for their parent 2-digit industries and occupations. For example, if the data for the 2-digit occupation group Health Professionals showed a 4% rise over the last year, this approach would assume that every 3- and 4-digit occupation within Health Professionals also rose by 4%.
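The second approach can be sketched in a few lines. This is a hypothetical illustration, not any provider's actual code: apply_parent_growth simply takes the parent group's growth rate and applies it to every child, and the child names and job counts are invented.

```python
def apply_parent_growth(child_jobs: dict[str, int], parent_growth: float) -> dict[str, int]:
    """Estimate each child's current jobs by assuming it grew at the parent's rate.

    This is the 'assumption' approach: every 3- or 4-digit child inherits the
    growth rate of its 2-digit parent, whatever it actually did.
    """
    return {name: round(jobs * (1 + parent_growth)) for name, jobs in child_jobs.items()}


# Invented base-year figures for children of a 2-digit group that grew by 4%.
base_year = {"Child occupation A": 10_000, "Child occupation B": 2_500}
print(apply_parent_growth(base_year, 0.04))
# {'Child occupation A': 10400, 'Child occupation B': 2600}
```

Note that the function never looks at any child-level evidence: whatever the parent did is, by construction, what every child is reported to have done.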
Each method has advantages and disadvantages. The advantage of the first approach, where suppressions are removed using data modelling, is that it produces accurate data right down to the most granular 3- and 4-digit levels. The disadvantage is that it is both time-consuming and costly, so any company doing this will inevitably have to charge more for its LMI to reflect the investment. This is the approach taken by EMSI.
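The sketch below is not EMSI's methodology; it is a much simpler illustration of one basic idea that unsuppression can build on. When a parent total is published and only some of its children are suppressed, the parent minus the published children gives the combined size of the suppressed cells, and that residual then has to be shared out using other evidence. This minimal Python version assumes the residual is split in proportion to hypothetical weights (for instance, each child's share of employment in a larger area where it is not suppressed). The function fill_suppressed and all figures are illustrative only.

```python
from __future__ import annotations


def fill_suppressed(parent_total: float,
                    children: dict[str, float | None],
                    weights: dict[str, float]) -> dict[str, float]:
    """Fill suppressed child cells (None) so that the children sum to the parent.

    The residual (parent total minus the published children) is shared among
    the suppressed children in proportion to `weights`. The choice of weights
    is an assumption, e.g. shares observed in a larger, unsuppressed area.
    """
    published = {k: v for k, v in children.items() if v is not None}
    suppressed = [k for k, v in children.items() if v is None]
    filled: dict[str, float] = dict(published)
    if not suppressed:
        return filled

    residual = parent_total - sum(published.values())
    if residual < 0:
        raise ValueError("published children exceed the parent total")

    weight_sum = sum(weights[k] for k in suppressed)
    if weight_sum <= 0:
        raise ValueError("weights for suppressed cells must be positive")

    for k in suppressed:
        filled[k] = residual * weights[k] / weight_sum
    return filled


# Invented example: a parent with 1,000 jobs, one published child, two suppressed.
print(fill_suppressed(1_000, {"A": 600, "B": None, "C": None}, {"B": 3, "C": 1}))
# {'A': 600, 'B': 300.0, 'C': 100.0}
```

Because the filled values are forced to add up to the published parent, the result stays consistent with the published totals; how good the estimates are depends entirely on how good the weights are, which is where the real modelling effort goes.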
As for the second approach, the advantage is that it is likely to be much less costly, since the data is never unsuppressed. The big disadvantage, however, is that it can give quite a misleading picture of the labour market at the most granular levels. For example, the 2-digit occupation group Textiles, Printing and Other Skilled Trades contains occupations as diverse as Butchers, Printers and Cooks. If Textiles, Printing and Other Skilled Trades grew by 4% over the last year, this approach would assume that Butchers, Printers and Cooks all grew at that same rate, which is, needless to say, highly unlikely. This approach is taken by companies that rely on the UK Commission for Employment and Skills (UKCES) “LMI for All” data, which does not go down to the most granular levels of industry and occupation data.
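A little arithmetic shows why. A parent group's growth rate is the employment-weighted average of its children's growth rates, so a 4% rise at 2-digit level is compatible with the children moving in quite different directions. The Python sketch below uses invented job counts for Butchers, Printers and Cooks (not real data), chosen so the group grows by exactly 4%, and compares each child's actual change with the 4% the assumption approach would give it.

```python
def growth(start: float, end: float) -> float:
    """Proportional change from start to end."""
    return end / start - 1


# Invented figures (jobs last year, jobs this year) for three child occupations.
children = {
    "Butchers": (1_000, 950),
    "Printers": (1_000, 900),
    "Cooks": (2_000, 2_310),
}

parent_start = sum(start for start, _ in children.values())
parent_end = sum(end for _, end in children.values())
parent_rate = growth(parent_start, parent_end)  # applied to every child by the assumption approach

print(f"Parent group: {parent_rate:+.1%}")
for name, (start, end) in children.items():
    actual = growth(start, end)
    print(f"{name}: actual {actual:+.1%}, assumed {parent_rate:+.1%}")

# Parent group: +4.0%
# Butchers: actual -5.0%, assumed +4.0%
# Printers: actual -10.0%, assumed +4.0%
# Cooks: actual +15.5%, assumed +4.0%
```

In this invented case the assumption approach would report growth for two occupations that actually shrank, and understate the growth of the third by more than 11 percentage points.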
A similar problem exists with geographic areas, where some LMI solutions assume that what is happening at a higher level is also happening at lower levels. Again, the cause is suppression in the original datasets, and for the same reason: raw data is often suppressed at the more specific levels to prevent too much information from being revealed.
Some may wonder whether this is all that important. The answer is yes: it is vitally important. As we stated in Part 3, what is often called “the economy” is in reality the aggregate of a series of smaller “functional” economies, each of which can be markedly different in nature. We can go further: regional economies are aggregates of economies at the county/unitary authority level, and these in turn are aggregates of economies at the local authority level. Taking the data for a large geography and assuming that all the smaller economies beneath it follow the same pattern is, to say the least, a risky business.
Let’s illustrate this with some data. According to our figures, the 2-digit Financial service activities, except insurance and pension funding sector grew in the South East region by 1,763 jobs, or 4%, between 2012 and 2015. However, looking beneath the regional level reveals wide variation. The following table shows the top five counties/unitary authorities in the South East by number of jobs in this sector, and how employment changed in each between 2012 and 2015:
As you can see, the pattern of change is far from uniform, and none of these top five counties grew at anywhere near the 4% average for the South East as a whole. Were we to dig further, to local authority level, we would undoubtedly uncover similar variations.
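The same arithmetic applies to geography. Because a region's figures are built up from its counties/unitary authorities, and theirs from local authorities, a regional growth rate is just a weighted average of what happened underneath. The minimal Python sketch below rolls hypothetical local authority figures up to county and regional level; the hierarchy, names and numbers are all invented, chosen so the region grows by exactly 4% while neither county comes anywhere close.

```python
# Invented hierarchy: region -> counties -> local authorities.
# Each local authority holds (jobs in 2012, jobs in 2015).
region = {
    "County A": {"LA A1": (10_000, 9_500), "LA A2": (5_000, 4_900)},
    "County B": {"LA B1": (8_000, 9_000), "LA B2": (2_000, 2_600)},
}


def rollup(node) -> tuple[int, int]:
    """Return (start, end) jobs for a node by summing everything beneath it."""
    if isinstance(node, tuple):  # a local authority: figures are stored directly
        return node
    starts, ends = zip(*(rollup(child) for child in node.values()))
    return sum(starts), sum(ends)


def growth(node) -> float:
    start, end = rollup(node)
    return end / start - 1


def report(name: str, node, depth: int = 0) -> None:
    """Print the growth rate at every level of the hierarchy."""
    print(f"{'  ' * depth}{name}: {growth(node):+.1%}")
    if isinstance(node, dict):
        for child_name, child in node.items():
            report(child_name, child, depth + 1)


report("Region (invented)", region)
# Region (invented): +4.0%
#   County A: -4.0%
#     LA A1: -5.0%
#     LA A2: -2.0%
#   County B: +16.0%
#     LA B1: +12.5%
#     LA B2: +30.0%
```

Building the figures from the bottom up is what modelling at the most specific geography allows. The assumption approach instead starts from the regional +4.0% and pushes it down, which in this invented example would misstate every county and every local authority.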
As with the industry/occupation dilemma, there are essentially two solutions: the first is to go through the painstaking process of modelling the data to uncover the figures for the most specific geographies; the other is to make assumptions for lower-level geographies based on the data at higher levels. The advantages and disadvantages are also the same: uncovering the suppressions leads to far more accurate and reliable data, but at a cost; leaving the suppressions in place and making assumptions means lower costs but can produce misleading data.