Development Guide#
Are you a geoscientist or a software developer?
Welcome to the WATex Development Guide. Your ideas and contributions are vital in building the leading library for addressing geoscientific challenges in groundwater and water exploration.
Getting Started with Python
If you’re new to Python, we highly recommend visiting external resources to familiarize yourself with Python development basics.
Foreword#
Before diving in, it’s essential to understand that watex is not solely
focused on machine learning. It serves as a bridge between machine learning
techniques and geoscientific disciplines, particularly in hydrogeology and
geophysics. Think of it as an applied machine learning toolkit designed for the
geosciences, primarily hydrogeophysics. Nonetheless, watex is versatile.
It supports the development of algorithms that not only predict but also aid
academics, professionals, and the broader geoscience community in achieving
outstanding results in groundwater exploration (GWE).
Several online tutorials cater to specific domains within geosciences and machine learning:
Core Philosophy#
The fundamental philosophy of watex draws inspiration from Scikit-learn
and, to some extent, GMT. It champions open collaboration,
inviting contributions from everyone.
Contributors are welcomed regardless of their level of expertise in programming or geosciences, though a foundational knowledge in either field is beneficial. Engaging with both aspects can significantly enhance the project’s development and impact.
Aiming to significantly expand its reach within the geoscience community over the next five years, watex maintains flexible
and inclusive guidelines. Contributors are encouraged to propose ideas that align with Sustainable Development Goal 6 (SDG 6)
on clean water and sanitation. The diversity of skills and perspectives among our contributors
is our strength, driving the project forward.
Ways to Contribute:
Improving Documentation: Typo fixes and enhancements to the documentation are welcome. You can submit changes by email to our mailing list or, preferably, through a GitHub pull request. Because the original designers are not native English speakers, we ask for your patience regarding language quality; the documentation is revised regularly to improve clarity and correctness.
Reporting Issues: Your feedback is invaluable. Report any issues you encounter, and give a “thumbs up” to issues reported by others if they affect you too. Promoting watex through blogs, articles, or simply starring the GitHub repository also shows your support and helps spread the word.
Adding New Algorithms: This opportunity is particularly aimed at developers proficient in Python or Cython. This guide will delve into the specifics of contributing new algorithms to the watex library.
Development#
Fork the Repository#
Contributing to watex begins with forking the main repository on GitHub,
followed by submitting a pull request (PR):
Create an account on GitHub if you haven’t already.
Fork the project repository: Click on the ‘Fork’ button near the top of the page to create a copy of the code under your GitHub account. For detailed instructions on forking a repository, see this guide.
Clone your fork of the watex repository to your local machine:
git clone git@github.com:YourLogin/watex.git  # Use --depth 1 if you have a slow connection
cd watex
Install the necessary development dependencies:
pip install scikit-learn numpy scipy pandas matplotlib tables h5py seaborn pyyaml joblib
Add the upstream remote to keep your fork synchronized with the main repository. This step ensures you can easily fetch the latest changes:
git remote add upstream git@github.com:WEgeophysics/watex.git
Verify that the upstream and origin remotes are correctly set up by running git remote -v, which should display:
origin    git@github.com:YourLogin/watex.git (fetch)
origin    git@github.com:YourLogin/watex.git (push)
upstream  git@github.com:WEgeophysics/watex.git (fetch)
upstream  git@github.com:WEgeophysics/watex.git (push)
With these steps, your watex installation and Git repository are now correctly set up and ready for development.
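As an optional check, the output of git remote -v can also be verified programmatically. The following is a minimal sketch, not part of watex: the helper names parse_remotes, missing_remotes, and read_remotes are hypothetical, and the sample text simply mirrors the expected output listed above.

```python
import subprocess


def parse_remotes(remote_v_output):
    """Map each remote name to a dict such as {'fetch': url, 'push': url}."""
    remotes = {}
    for line in remote_v_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, url, kind = parts
        remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


def missing_remotes(remotes, required=("origin", "upstream")):
    """Return the required remote names that are not configured."""
    return [name for name in required if name not in remotes]


def read_remotes():
    """Run `git remote -v` in the current directory (requires a Git checkout)."""
    result = subprocess.run(
        ["git", "remote", "-v"], capture_output=True, text=True, check=True
    )
    return parse_remotes(result.stdout)


# Sample text copied from the expected output above.
SAMPLE = """\
origin    git@github.com:YourLogin/watex.git (fetch)
origin    git@github.com:YourLogin/watex.git (push)
upstream  git@github.com:WEgeophysics/watex.git (fetch)
upstream  git@github.com:WEgeophysics/watex.git (push)
"""

remotes = parse_remotes(SAMPLE)
print(missing_remotes(remotes))
```

Inside a real clone, calling missing_remotes(read_remotes()) would list any remote that still needs to be added.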
Add Algorithms#
When integrating new algorithms into watex, two primary development
paths are available:
Development in accordance with the scikit-learn API
Development following the principles of GMT
Development Following the Scikit-learn API (DSKL)#
The DSKL approach emphasizes the use of the fit() method for computing and populating attributes of instantiated models, including plotting modules. This methodology is applicable for both supervised and unsupervised learning, often employing transform() or predict() methods to either transform data or infer properties. The typical workflow involves:
Selecting the model class by importing the appropriate estimator or assessor from a module. An assessor is designed for a specific task.
Setting model hyperparameters by instantiating the chosen class with desired values.
Organizing data into a features matrix and target vector as previously discussed.
Fitting the model to your data by calling the model’s fit() method, applicable even to plotting modules.
Applying the trained model to new data; for supervised learning, this usually means predicting labels for unknown data, whereas for unsupervised learning, it may involve transforming data or inferring properties using the transform() or predict() methods.
This approach is familiar to developers acquainted with Scikit-learn.
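The workflow above can be sketched end to end with a toy estimator. This is an illustration under stated assumptions: NearestMeanClassifier is a hypothetical class written in plain Python for this guide, not an estimator shipped with watex or scikit-learn, and the data values are made up.

```python
class NearestMeanClassifier:
    """Toy estimator (hypothetical): assign the label whose mean is closest."""

    def __init__(self, scale=1.0):
        # Hyperparameters are stored under the same name as the argument.
        self.scale = scale

    def fit(self, X, y):
        totals, counts = {}, {}
        for row, label in zip(X, y):
            totals[label] = totals.get(label, 0.0) + row[0] * self.scale
            counts[label] = counts.get(label, 0) + 1
        # Attribute computed during fit, hence the trailing underscore.
        self.means_ = {label: totals[label] / counts[label] for label in totals}
        return self

    def predict(self, X):
        return [
            min(
                self.means_,
                key=lambda label: abs(row[0] * self.scale - self.means_[label]),
            )
            for row in X
        ]


# Step 1: select the model class (normally imported from a module).
# Step 2: set hyperparameters by instantiating the class.
model = NearestMeanClassifier(scale=1.0)

# Step 3: organize the data into a features matrix X and a target vector y.
X = [[1.0], [2.0], [9.0], [10.0]]
y = ["low", "low", "high", "high"]

# Step 4: fit the model to the data.
model.fit(X, y)

# Step 5: apply the fitted model to new data.
predictions = model.predict([[1.2], [8.0]])
print(predictions)
```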
It’s important to note that all classes adhering to DSKL must follow the Python naming and style conventions outlined in PEP 8. They must also use the fit method for initial operations such as computation, data-structure validation, and parameter checks.
Furthermore, class parameters should bear the same name as instance attributes, and any attribute that is not passed as a parameter (computed internally rather than set by the user) should end with an underscore _. Here is an illustrative example:
>>> class DemoClass:
        """Class documentation."""
        def __init__(self, param1=None, param2=None, **kws):
            # None stands in for the real default values.
            self.param1 = param1
            self.param2 = param2
        def fit(self, data, **fit_params):
            """Fit method documentation."""
            X = fit_params.pop('X', None)
            y = fit_params.pop('y', None)
            ...
            self.param3_ = ...
            ...
            return self
The fit method is a cornerstone of model development, always returning the instance self. For algorithms not aimed at prediction, \(X\) and \(y\) are included as fit_params keywords, along with other parameters, diverging from Scikit-learn where models primarily focus on prediction. This flexibility supports the library’s broader application in addressing geoscience engineering challenges. It enables the development and testing of new machine learning (ML) algorithms through real-case studies to assess their effectiveness. The fit_params may encompass various parameters, allowing the fit method to integrate seamlessly across functions.
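To make the pattern concrete, here is a small non-predictive class written along these lines. It is a sketch only: ProfileSummary, its window and round_to parameters, and the sample values are hypothetical and do not come from watex.

```python
class ProfileSummary:
    """Hypothetical assessor: moving average of a one-dimensional profile."""

    def __init__(self, window=3, round_to=2):
        self.window = window
        self.round_to = round_to

    def fit(self, data, **fit_params):
        # X and y are optional here, so they travel as keywords in fit_params.
        X = fit_params.pop("X", None)
        y = fit_params.pop("y", None)

        values = list(data)
        if self.window < 1 or self.window > len(values):
            raise ValueError("window must be between 1 and len(data)")

        # Attributes computed by fit carry a trailing underscore.
        self.smoothed_ = [
            round(sum(values[i:i + self.window]) / self.window, self.round_to)
            for i in range(len(values) - self.window + 1)
        ]
        self.n_features_ = 0 if X is None else len(X[0])
        self.n_targets_ = 0 if y is None else len(y)
        return self


# Because fit returns self, construction and fitting can be chained.
summary = ProfileSummary(window=2).fit([1.0, 3.0, 5.0])
print(summary.smoothed_)
```

Returning self keeps the call style identical whether or not the class predicts anything, which is what lets plotting and assessment classes share the same interface as estimators.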
For geoscience issues extending beyond hydro-geophysics, developers can create
modules within the geology sub-package, fostering interdisciplinary
solutions. New prediction-focused algorithms should implement predict,
transform, or fit_transform methods and should not accept arbitrary keyword arguments (**kws) in
__init__, which streamlines model training and evaluation:
>>> from watex.exlib.sklearn import BaseEstimator, TransformerMixin
>>> class DemoClass(BaseEstimator, TransformerMixin):
        """Class documentation here."""
        def __init__(self, param1=None, param2=None):
            # None stands in for the real default values.
            self.param1 = param1
            self.param2 = param2
        def fit(self, X, y=None, **fit_params):
            """Fit method documentation."""
            self.param3_ = ...
            ...
            return self
        def predict(self, X):
            """Predict method documentation."""
            Xp = ...  # compute the predictions from X
            return Xp
In the example above, \(X_p\) represents the predicted outcome based on \(X\).
This design, devoid of keyword arguments at initialization and inheriting from
BaseEstimator and TransformerMixin,
facilitates cross-validation and hyperparameter tuning.
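One way to see why explicit parameters matter: tools that clone an estimator typically rebuild it from the parameters named in the __init__ signature. The sketch below is a simplified stand-in written for this guide, not scikit-learn's actual implementation; ParamsMixin, clone_estimator, Explicit, and WithKwargs are hypothetical names.

```python
import inspect


class ParamsMixin:
    """Simplified parameter introspection based on the __init__ signature."""

    def get_params(self):
        signature = inspect.signature(type(self).__init__)
        names = [
            p.name
            for p in signature.parameters.values()
            if p.name != "self"
            and p.kind not in (p.VAR_KEYWORD, p.VAR_POSITIONAL)
        ]
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self


def clone_estimator(estimator):
    """Build a fresh, unfitted copy from the explicit parameters only."""
    return type(estimator)(**estimator.get_params())


class Explicit(ParamsMixin):
    def __init__(self, param1=1, param2="a"):
        self.param1 = param1
        self.param2 = param2


class WithKwargs(ParamsMixin):
    def __init__(self, param1=1, **kws):
        self.param1 = param1
        for key, value in kws.items():
            setattr(self, key, value)


explicit_copy = clone_estimator(Explicit(param1=5, param2="b"))
kwargs_copy = clone_estimator(WithKwargs(param1=5, extra=42))

print(explicit_copy.get_params())     # every parameter survives the copy
print(hasattr(kwargs_copy, "extra"))  # the keyword-only extra is lost
```

In this sketch the value passed through **kws never appears in get_params, so a copy made during cross-validation would silently drop it. Listing every parameter explicitly avoids that.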
Note
For those unfamiliar with scikit-learn, algorithm design remains flexible. However,
the documentation should state early on which technique or library is used for hyperparameter
optimization, such as Keras or
TensorFlow. Postponing parameter validation until fit is called
follows Scikit-learn’s API and
avoids validating the same parameters twice. This is particularly useful with
watex.exlib.GridSearchCV, watex.models.GridSearch, and
watex.models.GridSearchMultiple, where systematic parameter tuning and
validation are critical.
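A minimal sketch of deferred validation follows. The Smoother class and its window parameter are hypothetical and only illustrate the idea: __init__ stores values untouched, and fit checks them, so values changed later by a parameter search are validated at the moment they are used.

```python
class Smoother:
    """Hypothetical estimator that validates its parameters in fit."""

    def __init__(self, window=3):
        # No validation here: the value is stored exactly as given.
        self.window = window

    def fit(self, X, y=None):
        # Validation happens once, right before the parameter is used.
        if not isinstance(self.window, int) or self.window < 1:
            raise ValueError(
                f"window must be a positive integer, got {self.window!r}"
            )
        self.n_samples_ = len(X)
        return self


model = Smoother(window=-1)  # constructing with a bad value does not fail

try:
    model.fit([1.0, 2.0, 3.0])
except ValueError as error:
    message = str(error)
    print(message)

# A parameter search would update the attribute between fits in the same way.
model.window = 2
model.fit([1.0, 2.0, 3.0])
print(model.n_samples_)
```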
Development Following GMT (DGMT)#
DGMT development is characterized by flexibility rather than strict conventions. Distinctively, all DGMT class names should end with an underscore ‘_’, setting them apart from the DSKL approach. These classes do not require a fit method. Upon instantiation, all attributes are initialized, and the initial operation is executed. Here is an illustrative example of DGMT syntax:
>>> class DemoClass_:
        """Class documentation."""
        def __init__(self, data, param1=None, param2=None, **kws):
            self.data = data
            self.param1 = param1
            self.param2 = param2
            ...
            for key in kws:
                setattr(self, key, kws[key])
            self._fit_democlass()
        def _fit_democlass(self):
            """_fit_democlass method documentation."""
            ...
            self.param3_ = ...
            ...
The underscore “_” suffix in the class name is a hallmark of DGMT. Moreover, the fit method, when present, begins with an underscore and is named in lowercase, reflecting the class’s operational context. The method _fit_democlass is called following attribute initialization and a loop that dynamically assigns additional attributes via keyword arguments.
Both DGMT and DSKL mark instance and class attributes that are not passed as parameters
(e.g., param3_) with a trailing underscore. Similarly, methods that are internal
to the class’s operation start with an underscore.
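For comparison with the DSKL examples, here is a small class written in the DGMT style. It is a sketch under stated assumptions: StationSpacing_, its unit parameter, and the sample positions are hypothetical and are not taken from watex.

```python
class StationSpacing_:
    """Hypothetical DGMT-style class: spacing between consecutive positions."""

    def __init__(self, data, unit="m", **kws):
        self.data = list(data)
        self.unit = unit
        # Extra keyword arguments become attributes, as in the template above.
        for key in kws:
            setattr(self, key, kws[key])
        # The initial operation runs at instantiation; no public fit is needed.
        self._fit_stationspacing()

    def _fit_stationspacing(self):
        """Compute the step between consecutive positions and its mean."""
        if len(self.data) < 2:
            raise ValueError("at least two positions are required")
        self.steps_ = [b - a for a, b in zip(self.data, self.data[1:])]
        self.mean_step_ = sum(self.steps_) / len(self.steps_)


spacing = StationSpacing_([0.0, 10.0, 20.0, 30.0], survey="demo")
print(spacing.steps_, spacing.mean_step_, spacing.unit, spacing.survey)
```

The results are available as soon as the object exists, which is the main practical difference from DSKL, where nothing is computed until fit is called.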
The integration of DGMT syntax within watex pays homage to the widespread
use and familiarity of GMT software
within the geosciences community. This choice respects the established coding
practices of many developers in this field, maintaining a bridge between traditional
geoscience software development and modern coding standards, thereby fostering
a comfortable and recognizable framework for geoscientist developers.
Report Bugs#
Reporting bugs is crucial for enhancing the stability of watex. A well-documented bug report enables others to replicate the issue and contributes to identifying a solution. For guidance on composing a comprehensive bug report, consult this Stack Overflow article and this blog post by Matthew Rocklin.
Verifying the bug on the main branch is a recommended step to ensure the issue persists. Additionally, review existing bug reports and pull requests to check if the bug has been previously identified or addressed.
Effective bug reports should:
Present a concise, self-contained Python code snippet that demonstrates the issue. Employ either:
GitHub Flavored Markdown for a well-formatted display:
```python
>>> from watex.base import Data
>>> d = Data(...)
...
```
Or reStructuredText for structured documentation:
.. code-block::

    >>> from watex.base import Data
    >>> d = Data(...)
    ...
Detail the version information of watex and its dependencies, achievable through the library’s version display function:
>>> import watex as wx
>>> wx.show_versions()
Provide a brief description of the bug and the expected behavior, aiding in a quicker and more accurate resolution.
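If the bug prevents watex from being imported at all, version details can still be gathered with the standard library. The sketch below is an assumption-laden fallback, not a replacement for wx.show_versions(): collect_versions is a hypothetical helper, and the package list simply reuses names from the installation command earlier in this guide.

```python
import platform
import sys
from importlib import metadata


def collect_versions(packages):
    """Return interpreter, platform, and installed package versions."""
    versions = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Distribution names as they would be passed to pip.
PACKAGES = ("watex", "scikit-learn", "numpy", "scipy", "pandas", "matplotlib")

report = collect_versions(PACKAGES)
for name, version in report.items():
    print(f"{name}: {version}")
```

The printed lines can be pasted directly into the bug report.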
Embarking on Scientific Python#
If you’re venturing into the scientific Python landscape for the first time, we’ve curated a list of essential resources to kickstart your journey. These materials are meticulously selected to enrich your understanding and utilization of watex.
Key Resources for Beginners:#
Python Scientific Lecture Notes (Scipy Lectures): A cornerstone resource offering a foundational grasp of the Python scientific stack. A rudimentary knowledge of NumPy arrays is particularly advantageous for effective watex usage.
Python for Data Analysis (View Book): Specializes in data manipulation techniques using Pandas, NumPy, and IPython, equipping you with the skills for data-centric projects.
Python Data Science Handbook (Beginner’s Guide): An all-encompassing handbook ideal for newcomers to data science, covering essential tools and methods for analysis and machine learning.
Machine Learning and Its Applications (Insights by Wlodarczak): Delve into the significance and practical applications of Machine Learning across various sectors, illustrating its pivotal role today.
For French-speaking Developers:#
Apprendre à programmer avec Python (Guide by Gérard Swinnen): An exemplary guide for French speakers, offering a deep dive into Python programming and facilitating substantial progress in your Python learning journey.
These resources are designed to guide you through the expansive realm of scientific Python, from introductory programming concepts to advanced applications in machine learning. Engaging with these texts will enhance your Python capabilities, thereby augmenting your contributions to watex and the broader scientific discourse.