# ALCD tutorial This is a step-by-step tutorial, to help you use the ALCD algorithm. The expected result is the following:
flowcharts nomenclature

Figure 1: Classification of Arles, 20171002

Change to the ALCD directory. The program that you should use is all_run_alcd.py. You can display the help with ```bash python all_run_alcd.py −h ``` The available options are: - ``l``: the location. The spelling should be consistent with the names in the L1C directory (e.g. Pretoria or Orleans) - ``d``: the (cloudy) date that you want to classify (e.g. 20180319) - ``c``: the clear date that will help the classification (e.g. 20180321) - ``f``: if this is the first iteration or not. If set to True, it will compute and create all the features, and create the empty class layers. Set it to True for the first iteration, and thereafter to False. - ``s``: the step you want to do, the choice is between 0 and 1. 0 will create all the needed files if this is the first iteration, otherwise it will save the previous iteration. 1 will run the ALCD algorithm, i.e train a model and classify the image. For each iteration, you should set it to 0, modify the masks, and then set it to 1. - ``kfold``: boolean. If set to True, ALCD will perform a k-fold cross-validation with the available samples. - ``dates``: boolean. If set to True, ALCD will display the available dates for the given location. - ``global_parameters``: path to json file which parametrize ALCD (for more information, please see the `configuration documentation `.) - ``paths_parameters``: path to json file which contain useful paths for ALCD (for more information, please see the `configuration documentation `.) - ``model_parameters``: path to json file which contain classifier parameters (for more information, please see the `configuration documentation `.) ## Paths preparation Before running anything, you need to set the correct paths and parameters. In the ``paths_configuration.json``: - Add the tile code linked to the location you want to add - Create the output directory for ALCD, and set its path in the "data_alcd" variable - Set the correct paths for the L1C directory and the DTM_input. In the global_parameters.json, if you use a distant and a local machine, set the ``local_paths`` variables accordingly. ## Step 1 A good practice is to visualise the two dates we want to use beforehand. This can be facilitated by the code *quicklook_generator.py*, which generates quicklooks for a given location. The user can therefore make sure that the cloud-free image is indeed cloud-free, and that the image to be classified is interesting. Therefore, initialize the environment by running : ```bash python all_run_alcd.py -f True -s 0 -l city_name_dir -d cloudy_date -c clear_date -kfold False -global_parameters path_of_global_parameters.json -paths_parameters path_of_paths_configuration.json -model_parameters path_of_model_parameters.json ``` This will create the concatenated .tif with all the bands, and empty shapefiles for each class, among other things. It invites you to copy those created files to your local machine, to accelerate the process in QGIS (on our processing computer, visualisation is slow, so we use QGIS on a different computer). You can also modify the files directly, in this case, you can skip the manual copy of the files and go to Step 2. Otherwise, copy the files on your machine with QGIS, and go to Step 2. ## Step 2 You can now open QGIS. Open the raster ``In_data/Image/city_name_bands_H.tif`` (H stands for Heavy, as it is in full resolution of 20m per pixel), and ``In_data/Image/city_name_bands.tif``. The ``city_name_bands_H.tif`` bands refer to the band 2 (blue), 3 (green), 4 (red), 10 (the band at 1375nm), the NDVI and the NDWI. The bands for the ``city_name_bands.tif`` are quite numerous, but the content of each band is documented in the .txt file corresponding to each .tif. Now, adjust the style in QGIS such that you see the image in true colors. For that, you can load the file ``color_tables/heavy_tif_true_colors_style.qml`` on the Heavy .tif. You should get :
flowcharts nomenclature

Figure 2: QGIS window with the scene displayed in true colors

Now, load all the empty shapefiles from the directory ``In_data/Masks``. If you display a band being a time difference (for example the 20th band of ``city_name_bands. tif``), you will observe that there was no data on the bottom-right corner for the clear date. The same is true with the top-left corner for the cloudy date. Thus, the no_data file already has some data (which is the case if one or both of the original images have no_data pixels). As you can see, on ``Figure 2`` the top-left and bottom-right corners are covered by the no-data mask. If you are not satisfied with the mask, you can edit it manually. You should get something along the lines of the following :
flowcharts nomenclature

Figure 3: No-data areas are automatically computed

This no-data layer is used to discard the areas under it, be it for the classification, or if the user add samples in these areas by mistake. ## Step 3 It is now time to edit the masks layers. For each class (land, low clouds, etc), edit the corresponding layer. Add the points that you want to take as samples, by clicking on the image and pressing Enter for each point. We have found more efficient to use points rather than polygons, we later dilate the points by 3 pixels assuming the neighbourhood is homogeneous in terms of class, so you should avoid to use a pixel just at the edge of a feature (cloud, land). The high clouds can be visible with the 1375nm band (i.e. the band number 4 of the heavy .tif). You can load the style ``heavy_tif_clouds_green_style.qml`` to see them quickly. The ``Figure 4`` shows the image with the high clouds highlighted, and the steps to add points.
flowcharts nomenclature

Figure 4: Steps to add data points with QGIS

```{note} The ``1375nm`` band is not to be trusted blindly. The principle of this band is that the water vapour in the atmosphere usually absorbs the photons in this wavelength. However, in dry conditions, or with high altitudes terrains (such as mountains), the photons can be reflected back. This can be misleading, so the user should take precautions. A typical way to detect such artefacts is to see if the potential cirrus shape is strongly correlated with that of the underlying terrain. ``` You can now go back to true colors, and continue by editing all the wanted classes. The background class can be used if you do not want to discriminate between land and water for example, but its use is not recommended. At the end, you obtain ``Figure 5``
flowcharts nomenclature

Figure 5: Samples placed manually after the first iteration

## Step 4 Now, copy back the edited masks to the distant machine, or skip this if you work on one machine. It is time to train the model, and classify the image! Do it with : ```bash python all_run_alcd.py -f True -s 1 -l city_name_dir -d cloudy_date -c clear_date -kfold False -global_parameters path_of_global_parameters.json -paths_parameters path_of_paths_configuration.json -model_parameters path_of_model_parameters.json ``` The results can be seen in the Out directory. The regularized classification map is labeled_ img_regular.tif. You can also see the contingency table in the Statistics directory. As you can see on the classification map, figure 2.8, some pixels are not well classified. Moreover, the confidence is low in numerous places, as seen on ``Figure 6``. Therefore, we will take part of the advantage of this program: the active learning.
flowcharts nomenclature

Figure 6: Result of the first classification

flowcharts nomenclature

Figure 7: Confidence map of the first classification

## Step 5 Do an new iteration, by running : ```bash python all_run_alcd.py -f False -s 0 -l city_name_dir -d cloudy_date -c clear_date -kfold False -global_parameters path_of_global_parameters.json -paths_parameters path_of_paths_configuration.json -model_parameters path_of_model_parameters.json ``` It will save the previous iteration, and you can now edit the class layers, by adding new points (and also remove some if you made an error previously). You can copy on your local machine the outputs of the previous iteration (the bash command is given when you run the command above). We suggest to open the files ``Out/contours_labels.tif`` and ``Out/labeled_img_regular``. tif, and to apply to them the ``contours_labeled_contrasted_style.qml`` and the ``labeled_img_regular_style.qml`` styles respectively. It gives each class a recognisable color, which are given in ``table 1``.
flowcharts nomenclature

Table 1: Available classes and colors

For example, you can display the contours of the classes to see were the classifier was wrong. Here, we obtained a false detection of *clouds shadows* on the left of the image, which can be seen with the yellow contours:
flowcharts nomenclature

Figure 8: Contours of the shadows, in yellow

Therefore, we will add some points of land class in this region to increase the accuracy of our output, as shown in ``Figure 9``.
flowcharts nomenclature

Figure 9: Some land samples are added where the wrong classification is visible

Do this for the areas where a misclassification is visible. Once the wanted points in each class have been added, you can copy back the layers to the distant machine with the appropriate command. Finally, you run once again the training and the classification with : ```bash python all_run_alcd.py -f False -s 1 -l city_name_dir -d cloudy_date -c clear_date -kfold False -global_parameters path_of_global_parameters.json -paths_parameters path_of_paths_configuration.json -model_parameters path_of_model_parameters.json ``` ## Step 6 Repeat the Step 5 until you are satisfied with the classification the ALCD algorithm returns. Quick tip: some data (30% by default) are used for the validation of the model, i.e. just to compute statistics. If you want to have more samples that you add manually to be taken into account for the training part, you can increase the training_proportion in the ``global_parameters.json``. Here is an example of the classification that you could obtain after each iteration. The 6th one is considered to be good (by myself), so you can stop there.
flowcharts nomenclature

Figure 10: Evolution of the classification

As a reference, the QGIS windows at the last iteration with all the samples, with the labeled classification, and with the confidence map, are given in ``Figures 11, 12 and 13``.
flowcharts nomenclature

Figure 11: All samples present for the last iteration

flowcharts nomenclature

Figure 12: Labeled classification as seen in QGIS window for the last iteration

flowcharts nomenclature

Figure 13: Confidence map as seen in QGIS for the last iteration