3 Missing data

It is very often that some countries do not reporter the weight of the commodity traded, which might generate the loss of information, and also the comparison between regions will not be able to be performed. In this section, we describe all the processes involved in the estimation of the missing, imputation methods, mathematical correction, and the validation tool.

There are some codes that should be removed from the the imputation module. These codes were listed by the FIAS team and it is stored in the datatable fishtrade_hs_not_impute under Fisheries Commodities domain.

Data imputation summary

Data source:
- Domain: Fisheries Commodities
- Input data: fishtrade_built_data
- Output data:
  - fishtrade_data_imputed: The data imputed is stored in this table to be validated by the expert. It is known as a working table.
  - fishtrade_data_imputed_validated: This is almost a copy from the fishtrade_data_imputed datatable, but in this datatable the user is not allowed to change anything. All changes should be done over the datatable fishtrade_data_imputed, and then can be export to the final data table fishtrade_data_imputed_validated.

The imputation is performed at the partner level, i.e, the shipments are aggregated before the imputation.

There are 16 methods to impute the missing weights.
- 8 methods perform the imputation at Tariff Line level.
- 8 methods perform the imputation at 6 digits (HS) level.

Some automatic imputations can not be achieved. The main reason for this is there is no enough information to execute any method robustly. It is necessary the user’s intervention (manual imputation)

Imputation Flow

Figure 3.1 shows the imputation module flow. The first step is to build the helper tables, which use as input the data tables stored in the trade-input-data domain. Once the helper tables have been built, the next step is to get the data to be imputed from the data table fishtrade_built_data. The Imputation Module output is stored at the data table fishtrade_data_imputed. This data table works as a working table because the user can check the imputation by performing the manual imputation if needed. After the expert validation the data is exported to the production data table fishtrade_data_imputed_validated.

The order to apply the methods can be found at Section 3.4, and also links to the source of the functions.

Figure 3.1: Imputation Module Flow.

3.1 Mathematical conversion

The first step in the Imputation Module is the mathematical conversion. This step is not considered as an imputation method once we are just converting the units reported by the country.

In Figure 3.2, the left side shows the conversions applied in the quantity variable and also the weight. The right side shows all type of unit reported by the countries. When some countries do not report the weight value, but they report the quantity and also the quantity unit, hence it is possible to retrieve the weight, if the unit is reported in kilogram (code = 8). The other conversions aim to put all quantities in the same unit.

Function: unit_conversion

Figure 3.2: Imputation module: mathematical conversions

3.2 Partially Agreggation

The tariff line data from UNSD contains multiple shipments with identical combination of reporter/partner/commodity/flow/year/qunit (unit of quantity). Those are transactions registered separately, thus rows containinig non-missing values and quantities can be aggregated. Missing variables of the same type are also aggregated if they are all missing, as they will produce a missing aggregated value for missing disaggregated values while correctly summing the remaining variables. In Figure 3.3, we show an example of this. The green rectangle shows the values reported that can be aggregated, and in the blue rectangle shows the missing values that will also aggregated. This figure show how the is reported, and after the partial aggregation.

Note that the combination of year = 2016 | reporter = 76| partner = 764 | flow = 1 | hs = 030119000000 | qunit = 5 after the partially aggregation result in two rows. One composed by the values aggregated, and another with the missing values aggregated.

Figure 3.3: Imputation module: partially aggregation

3.3 Methods

In this section, we describe the methods used to impute the missing weight values. In Sections 3.3.1 and 3.3.3, we introduce the Helper Tables used to estimate the weight per unit, and the Unit Values for several levels of aggregation respectively. The imputation methods that use these Helper tables are described in the Section 3.3.5. The other methods are explained in next sections.

3.3.1 Helper Tables

In this Section, we show the shape of the table and variables which composed each auxiliary table. For each table we describe brefily how the the values were estimated. For a full detailed explanation can be found in the Sections 3.3.2 and 3.3.3.

Figure 3.4: Helper Table: weight estimated per quantity unit for the levels of aggregation: reporter-year; reporter; region-year.

Figure 3.5: Helper Table: weight estimated per quantity unit for the levels of aggregation: region; HS at the six digits level; year.

Figure 3.6: Helper Table: unit value estimated for several levels of aggregation: reporter-year; region-year; global-year.

3.3.2 Estimate weight per unit

3.3.2.1 General Approach

Figure 3.7: Helper Tables: general approach that is used in the building of the helper tables.

3.3.2.2 Reporter - Year

Figure 3.8: Helper Tables: weight per unit estimated by the median of partners by year

3.3.2.3 Reporter

Figure 3.9: Helper Tables: weight per unit estimated by the median of partners across the years.

3.3.2.4 Region-Year

Figure 3.10: Helper Tables: weight per unit estimated by the median of region by year.

3.3.2.5 Global-Year

Figure 3.11: Helper Tables: weight per unit estimated by the median of commodity by year.

3.3.2.6 Global

Figure 3.12: Helper Tables: weight per unit estimated by the median of commodity.

3.3.3 Estimate Unit Value

3.3.3.1 General Approach

Figure 3.13: Helper Tables: general approach that is used in the building of the helper tables.

3.3.3.2 Reporter - Year

Figure 3.14: Helper Tables: unit value estimated at the tariff line level by the median of partners by year.

3.3.3.3 Region - Year

Figure 3.15: Helper Tables: unit value estimated by the median of region by year using 6 digits (HS).

3.3.3.4 Global - Year

Figure 3.16: Helper Tables: unit value estimated by the median of commodity using 6 digits (HS)

3.3.4 Imputation by partially reported weight

Helper Tables: Impute the missing weight with the ratio between the given value and unit value calculated using only the valid monetary values. (TL level).

Figure 3.17: Helper Tables: Impute the missing weight with the ratio between the given value and unit value calculated using only the valid monetary values. (TL level).

3.3.5 Imputation using the helper tables

Figure 3.18: Helper Tables: imputation methods using the auxiliary tables.

3.3.6 Interpolate Unit Value

Figure 3.19: Helper Tables: Interpolate the UV using the previous, or next (year) available value, by reporter, partner, flow, and TL.

3.3.7 Unit Value Total

Helper Tables: The Unit Value Total is the ratio between the summing of the weights and the summing of the quantities, by year, reporter, flow, and TL.

Figure 3.20: Helper Tables: The Unit Value Total is the ratio between the summing of the weights and the summing of the quantities, by year, reporter, flow, and TL.

3.3.8 Interpolate Unit Value Total

Figure 3.21: Helper Tables: Interpolate using the Unit Value Total (TL level) of previous year.

3.3.9 Global Year Expanded

Figure 3.22: Helper Tables: Expand the Global-Year table by all the combination among: year, flow, and HS.

3.3.10 Global Year Expanded - HS6

Figure 3.23: Helper Tables: Compute the median of Unit Value found in the previous step by year, flow, and HS.

3.4 Functions

The imputation methods are applied as the order below:

Imputation by partially reported weight
- Level: Tariff Line
- Function: partially_aggregation
Interpolate Unit Value
- Level: Tariff Line
- Function: uv_hs_interpolate
Reporter - Year
- Level: Tariff Line
- Function: wq_uv_imputation
Unit Value: Reporter - Year
- Level: Tariff Line
- Function: wq_uv_imputation
Reporter - Weight
- Level: Tariff Line
- Function: wq_uv_imputation
Unit Value Total
- Level: Tariff Line
- Function: wq_uv_imputation
Unit Value Total
- Level: Tariff Line
- Function: wq_uv_imputation
Unit Value: Region - Year
- Level: HS at the six digits level
- Function: wq_uv_imputation
Unit Value: Global - Year
- Level: HS at the six digits level
- Function: wq_uv_imputation
Region
- Level: HS at the six digits level
- Function: wq_uv_imputation
Global - year
- Level: HS at the six digits level
- Function: wq_uv_imputation
Global
- Level: HS at the six digits level
- Function: wq_uv_imputation
Interpolate Unit Value Total
- Level: HS at the six digits level
- Function: run_imputation_module
Global Year Expanded
- Level: HS at the six digits level
- Function: run_imputation_module
Global Year Expanded - HS6
- Level: HS at the six digits level
- Function: run_imputation_module