v0.1.6-alpha (February 22, 2023)#

These are minor changes performed from v0.1.5 after the review comments from Editors and Reviewers from the softwareX journal to improve the paper and software. The manuscript faced two review tours of minor revisions. The first from reviewers (February 20th) and second from Editors(March 06th ). Here are the comments/replies sections of Editors/reviewers that includes some bug fixes and improvements. The Editors/reviewers comments are in italic whereas the replies are in normal text.

First review#

The first reply from Editors/reviewers occurs in February 20th, 2022. Here are the comments:

Reviewer #1#

I have read the paper "machine-learning research in hydro-geophysics". It is an interesting paper about the application of machine learning in hydrogeophysics and the open-source code is also useful for hydrogeophysics society. After I tested the codes, I found parts of the codes may be improved. The paper is generally well-written and I recommend it to be published after a minor revision. Here are my comments:

1) It seems useful to consider the ``openpyxl`` package as a hard dependency. Some modules in the “Geology ” sub-package are called public API which uses “openpyxl” (see watex.__init__.py file). When running watex for the first time, the missing “openpyxl” is required. To avoid this annoyed behavior and fix the bug, try to set “openpyxl” as the required dependency or move the geology module (Structures and Structural) from the public API.

Reply: openpyxl is now part of the hard dependency at the initiliation of the package. This is visible in code lines 66–67 of watex.__init__.py.

2) To make the software more attractive at the glance for the non-dedicated reader in the geophysical and hydrogeology domains, the term “hydro-geophysics” can be modified to “water exploration “for short as the aim goal and name of the software. I suggest “Machine learning in water exploration”` or "machine learning research in water exploration". The first one is short and global while the second gives a new perspective since the term research in the title makes the software dynamic and new methods can be added and included many geosciences fields that imply water exploration.

Reply: We have selected the second choice and modified the title in the revised version of the paper. The title becomes “machine learning research in water exploration”. We also modified the abstract a bit more for consistency.

3) In the module watex.methods.em, the authors computed the skew ( watex.methods.Processing.skew()) from the Processing module As the mathematical concept is explained, it should be better to write the code for skew visualization from Bahr or Swift. This could help users to easily determine the type of structures (1D, 2D, 3D or distorted, …) without the necessary output of the two-dimensional matrices. (See the references guide in the docstrings ).

Reply: A new code plot plotting skew is henceforth written. The phase-sensitive skew can be visualized using the watex.utils.plot_skew() function. In addition, we also give a consistent plot for skew visualization in watex.view.TPlot.plotSkew() method where the user can easily customize the plot accordingly. Furthermore, the watex.view.TPlot.plot_phase_tensors() plot also gives an alternative way for skew visualization in pseudo-section format by specifying the tensor parameter to skew. Here are two examples of skew plots.
- Plot skew from (watex.utils.plot_skew()):
```
>>> import watex as wx
>>> from watex.utils.plotutils import plot_skew
>>> edi_sk = wx.fetch_data ("edis", return_data =True , samples = 20 ) # fetch 20 samples of EDI objets
>>> plot_skew (edi_sk)
>>> plot_skew (edi_sk, threshold_line= True)
```
- Plot skew from phase tensor plot ( watex.view.TPlot.plot_phase_tensors()):
```
>>> tplot = wx.TPlot ().fit(edi_sk )
>>> tplot.plot_phase_tensors (tensor ='skew')
```
See also the examples Plot phase sensitive skew and Plot Pseudosection Phase tensors.

4) I suggest writing a complete application example as you did for predicting k “in the step-by-step” guide in the software documentation that involves the missing tensor and recovery of 2D tensors. It seems you used the preprocessed data (from watex.datasets.load_edis()) for illustration where no missing data is available. This is not meaningful.

Reply: To solve this issue, we used a real dataset collected from Huayuan area, Hunan province, China, which is composed of missing tensors. We implement in this new version, the data set function as load_huayuan (watex.datasets.load_huayuan()). The parameter raw can allow to retrieval of noised data for the sake of user to better comprehend the trick behind the recovery tensors. For demonstration and a real guidance, we fetched 27 sations and the result are displayed and missing tensors can be vsualized as well as the way to handle them. Here is quick implementation, however in the gallery example, the step-by-step guide gives further explanations:
```
>>> data = wx.fetch_data('huayuan', return_data =True, samples =27 ,
                     key ='raw', clear_cache=True) # clear watex cache data to save new EDI
>>> tro = wx.EMProcessing().fit(data)
>>> wx.view.plot2d(tro.make2d(out= 'resxy' ) , y = tro.freqs_,to_log10= True)
```
The results shows blank spaces in resistivity tensor in TE mode (xy). After applying the recovery trick, all complete tensors has be recovered at all frequencies as:
```
>>> tro.component ='yx'
>>> wx.view.plot2d(tro.zrestore ( tensor ='res'), y = tro.freqs_,to_log10= True)
```
After recovery, the data is full-strength amplitudes for processing. More examples in Restoring tensor with noised AMT data and Plot Pseudosection Phase tensors.

5) The motivation illustrates the importance of hydrology but lacks an illustration of the importance of hydrogeophysics. I suggest adding a part to introduce the development of hydrogeophysics and how it helps hydrology studies (e.g., Binley et al., 2015 [1] ; Parsekian et al., 2015 [2]; Chen, 2022 [3])

Reply: Fixed it in the manuscript new version (Fixed it in new MS)

Reviewer #2#

1) (Lines 137-142) the parameters are computed from the selected conductive zone; the loss or weak frequency signal are recovered and new tensors are updated. Please provide more details about the computation process, and how to recover and update the related dataset.

Reply: The explanation of this section has been enforced in the revised MS and clearly explained with the different options the user can use for selecting and recoverupdate the tensors. The example Restoring tensor with noised AMT data gives more details.

2) (Lines 146-148) What is the meaning of the ‘features manipulation got from the previous step’?

Reply: Fixed it in new MS and reformulate the sentence as follow: - [The next step (Params space) consists to aggregate the different prediction parameters computed from the previous step to build the predictor \([X,y ]\) or export for EM modeling in the case of NSAMT to external software …]

3) (Lines 149-154) In ‘learning space’ step, what are the algorithms applied for the training and testing models? Is the algorithm freely selected by the user or automatically selected according to the previous datasets? How to determine the ‘appropriate modules’ ?

Reply: Fixed it in the MS. We replied to this section in the replied MS by giving the step and some appropriate algorithms for feature transformations whereas the training and testing models are handled by the “models”(watex.models) module. See the software functionnalities section of the paper in Learning space. Below is an example for what we explain the manuscript.

When the user objective is to predicting FR , user can select some pretrained models of watex.models. To get the available of pretrained models, user can do this:

>>> from watex.models.premodels import p
>>> p.keys
('SVM', 'SVM_', 'LogisticRegression', 'KNeighbors', 'DecisionTree',
   'Voting', 'RandomForest', 'RandomForest_', 'ExtraTrees',
   'ExtraTrees_', 'Bagging', 'AdaBoost', 'XGB', 'Stacking'
)

For instance to fetch the pretrained watex.exlib.LogisticRegression best parameters, just call:

>>> p.LogisticRegression.best_params_
{'penalty': 'l2',
'dual': False,
'tol': 0.0001,
'C': 1.0,
'fit_intercept': True,
'intercept_scaling': 1,
'class_weight': None,
'random_state': None,
'solver': 'lbfgs',
'max_iter': 100,
'multi_class': 'auto',
'verbose': 0,
'warm_start': False,
'n_jobs': None,
'l1_ratio': None
}

However some models with geology structures collected in a particular area could obviously not correspond to the pretrained geological survey area. In that case, user can retrain its data to fine-tune models hyperparameters into a single line of codes by feeding to the algorithms many models and save the training phase results into a disk. Here is an example:

>>> from watex.models import GridSearchMultiple , displayFineTunedResults
>>> from watex.exlib import LinearSVC, SGDClassifier, SVC, LogisticRegression
>>> X, y  = wx.fetch_data ('bagoue prepared')
>>> X
... <344x18 sparse matrix of type '<class 'numpy.float64'>'
... with 2752 stored elements in Compressed Sparse Row format>

As example, we can build four estimators and provide their grid parameters range for fine-tuning as:

>>> random_state=42
>>> logreg_clf = LogisticRegression(random_state =random_state)
>>> linear_svc_clf = LinearSVC(random_state =random_state)
>>> sgd_clf = SGDClassifier(random_state = random_state)
>>> svc_clf = SVC(random_state =random_state)
>>> estimators =(svc_clf,linear_svc_clf, logreg_clf, sgd_clf )
>>> grid_params= ([dict(C=[1e-2, 1e-1, 1, 10, 100], gamma=[5, 2, 1, 1e-1, 1e-2, 1e-3],kernel=['rbf']),
          dict(kernel=['poly'],degree=[1, 3,5, 7], coef0=[1, 2, 3], C= [1e-2, 1e-1, 1, 10, 100])],
          [dict(C=[1e-2, 1e-1, 1, 10, 100], loss=['hinge'])],
          dict()], # we just no provided parameter for logreg_clf to let user try by himseft)
          [dict()] # idem for sgd_clf
          )

Now we can call watex.models.GridSearchMultiple for training and self-validating as:

Warning

Note that if you decide to run the script below , it will take a while depending of your processor performance. However, we recommend to try as you can and alternatively, you can also provide the parameter range of watex.exlib.LogisticRegression & watex.exlib.SGDClassifier for for fine-tuning. Moreover, you can also do the same task by setting the watex.models.GridSearchMultiple parameter kind to RandomizedSearchCV for exercice.

>>> gobj = GridSearchMultiple(estimators = estimators,
                   grid_params = grid_params ,
                   cv =4,
                   scoring ='accuracy',
                   verbose =1,   # set minimum verbosity ; > 7 outputs more messages
                   savejob=False ,  # set true to save your job into a binary disk file.
                   kind='GridSearchCV').fit(X, y)

Once the parameters are fined-tuned, we can display the fined tuning results using watex.models.displayFineTunedResults() functions or other similar functions in the module: watex.models.validation like : watex.models.displayModelMaxDetails() or watex.models.displayCVTables() or else as:

>>> displayFineTunedResults (gobj.models.values_)
MODEL NAME = SVC
BEST PARAM = {'C': 100, 'gamma': 0.01, 'kernel': 'rbf'}
BEST ESTIMATOR = SVC(C=100, gamma=0.01, random_state=42)
MODEL NAME = LinearSVC
BEST PARAM = {'C': 100, 'loss': 'hinge'}
BEST ESTIMATOR = LinearSVC(C=100, loss='hinge', random_state=42)
MODEL NAME = LogisticRegression
BEST PARAM = {}
BEST ESTIMATOR = LogisticRegression(random_state=42)
MODEL NAME = SGDClassifier
BEST PARAM = {}
BEST ESTIMATOR = SGDClassifier(random_state=42)

4) (Lines 155-158) ‘enough plots for data exploration, feature analysis and discussion, tensor recovery, and model inspection’. In View space part, in addition to the sounding curve plot and DC-parameters discussing plot as shown in Figures 2 and 3, what kind of plots can be provided for the above exploration and analysis?

Reply: Some examples of plots with their functionalities are enumerated in the revised MS in software functionalities: - [in ExPlot (watex.view.ExPlot) … watex.utils.plot_sbs_feature_selection() plots Sequential Backward Selection (SBS) for feature selection and collects the scores of the best feature subset at each stage…]

Refer to full user guide and view for further documentation.

5) In this work, how to reduce the collection of k-parameter? Please provide some comparisons or explanations to show the differences from the expensive k parameter detection in previous work.
Reply: We replied to this answer in the revised MS in the motivation and significance section and about the k-parameter prediction, we have submitted a paper in Engineering Geology, and is still under consideration (http://dx.doi.org/10.2139/ssrn.4326365).

6) Comments for the Software/Code:
6.1) (Line 1564 - 1780) Tensors recovery in the processing module The method “zrestore” is used to recover the weak and missing signals in the EDI data. I have run the method, but it seems you used the preprocessed data (Impedance tensors are already recovered) for illustration. This is visible in the documentation too. It looks not seem meaningful to practice this way. Even if the data is not available, you can:
- generate a synthetic data where the tensor is missing and then apply the recovery technique to recover the missing tensors, or
- use a sample of real-world EDI data (if data is available) where data is noised and the signals are missing , then use the recovery approach with the method “zrestore” to recover it.
You may select one of these options. This is useful to show the readers and scientific community the relevance of the technique and ascertain its trueness.

Reply: We selected option 2 and we provided a convenient application step-by-step guide with a concrete example of a missing tensor in the Huayuan survey area for the user. This comment seems addressed too closely to comment 4 of reviewer #1 Our answer is explained in supported by examples. Please, could refer to the reply section of comments N4 of reviewer #1.

6.2) (Line 779 - 1021 ) Fix the bug in ResistivityProfiling class in module electrical Indeed, when the constraints are applied and the auto-detection indicates that there is no possibility of making a drill on this ERP line. It is better to stop the running “fit” method rather than let it continue since no DC parameters can be calculated. Formatting a warning message to the user is very important in that case. This is not applied in your case. For instance, after running, the user can think that parameters are correctly calculated and could try to fetch the table of prediction parameters. While no parameters are calculated the summary method of ResistivityProfiling generates a “getattributeError”. You may try to fix it by formatting the warning message in the summary method ( if applicable ) and stopping the running process of the “fit” method.

Reply: Thanks for this suggestion. We fixed it and stop running the program when no suitable area for the drilling location is found when constraints are applied. Henceforth, an ERPError raises, and a warning message is thrown that no suitable location was detected. Furthermore, there is another exception emitted in summary methods to smartly warn users that DC parameters cannot be computed when the ERP line is not suitable for the drilling location. (refer to code line 999 -1021 of summary() method ).

6.3) (Structural class Line 335 ) Module geology. The verbose attribute is not set properly. While Structural inherits from module Base, ‘verbose’ must be set in the Base module since “Super” will call it straightforwardly.

Reply: We fixed by implementing verbose parameter as an attribute in watex.geology.core.Base module of ( Line 80 and 82)

Second review#

The second reply from Editors occurs in March 06th after the first requiered review completed. Here are the comments:

1) Abstract: It needs to be an abstract for a multidisciplinary audience. You do not define DC, EM or what is the meaning of the k parameter. Can you make it more general?

Reply: We fixed it by defining the DC (Direct-current), EM ( Electromagnetic ), and k for the permeability coefficient. Also, we replaced especially for the abstract the word “tensor” with “missing data” in the following sentence: << Indeed, the recovery of EM tensors … >> by << Indeed, the recovery of EM missing data … >>

2) Intro: - I suggest splitting up the first paragraph (it is very long). Maybe at lines 58 and 75?

Reply: We did it. The first paragraph is henceforth divided into two.

3) Again note that the first paragraph should be written for a general audience whereas it gets very technical and I have a hard time understanding it. Can you try to make the first paragraph for a general reader like me.?

Reply: Of course! We have tried our best to reconsider all the technical words and translated them into simplified terms and sentences. As you can see in the manuscript first paragraph (henceforth divided into two), we have simplified some sentences by rewriting them expecting to be easily understood by public audience.

3) Fig3. might be better in 2x2 rather than 1x4 format

Reply: We fixed it by combining it with the first new figure and deleting the literature related to him in the illustration example (previous version). The figure is henceforth moved to the software functionalities section when enumerating the different possible plots offered by the library in addition to the two new others plots. To support the comment of reviewer #2 about the additional plots that can be provided by the software, we also added two plots (watex.utils.plot_confidence_in() and watex.view.TPlot.plot_phase_tensors()) for clarity since Figure speaks more than ten thousand words. The total figures are henceforth 4 rather in the new version rather than the 3 in the previous version.

We are grateful to the anonymous reviewers for their contributions, suggestions and comments to improve the MS and fix bugs in the software for the GWE research progress.

Best regards!