Contributing¶

Thank you for your interest in contributing to kedro-databricks! This document provides guidelines and instructions for contributing to this project.

Getting Started¶

kedro-databricks is a Kedro plugin that enables running Kedro pipelines on Databricks. Before contributing, please:

Read the README.md to understand the project's purpose and features
Check the documentation for detailed usage instructions
Look at existing issues and pull requests

Prerequisites¶

Python 3.10, 3.11, or 3.12
uv (recommended) or pip for dependency management
Git for version control
A Databricks workspace (for integration testing)

Development Environment Setup¶

Using uv (Recommended)¶

Clone the repository:

git clone https://github.com/JenspederM/kedro-databricks.git
cd kedro-databricks

Install dependencies:
```
uv sync --all-extras
```

Activate the virtual environment:

source .venv/bin/activate  # On Unix/macOS
# or
.venv\Scripts\activate     # On Windows

Install pre-commit hooks:
```
pre-commit install
```

Using pip¶

Clone the repository and create a virtual environment:

git clone https://github.com/JenspederM/kedro-databricks.git
cd kedro-databricks
python -m venv .venv
source .venv/bin/activate  # On Unix/macOS
# or
.venv\Scripts\activate     # On Windows

Install in development mode:
```
pip install -e ".[dev,test,docs]"
```

Install pre-commit hooks:
```
pre-commit install
```

Development Scripts¶

The project includes several helpful development scripts in the scripts/ directory:

scripts/mkdev.py <project_name>: Create a new Kedro project for development/testing
scripts/run_lint.sh: Run linting checks

Development Workflow¶

Branch Naming Convention¶

Use descriptive branch names with prefixes:

feat/: New features
fix/: Bug fixes
docs/: Documentation changes
test/: Test-related changes
chore/: Maintenance tasks
refactor/: Code refactoring
ci/: CI/CD changes

Examples:

feat/add-multi-cloud-support
fix/databricks-authentication
docs/update-getting-started
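
The naming convention can be checked mechanically. The following is a minimal sketch, not part of the project: `is_valid_branch_name` is a hypothetical helper, and the kebab-case rule for the part after the slash is an assumption inferred from the examples.

```python
import re

# Prefixes listed in the branch naming convention.
ALLOWED_PREFIXES = ("feat", "fix", "docs", "test", "chore", "refactor", "ci")

# Assumption: the part after the slash is lowercase kebab-case, as in the
# documented examples. The project itself may accept other forms.
BRANCH_PATTERN = re.compile(
    rf"^(?:{'|'.join(ALLOWED_PREFIXES)})/[a-z0-9]+(?:-[a-z0-9]+)*$"
)


def is_valid_branch_name(name: str) -> bool:
    """Return True if ``name`` has the shape ``<prefix>/<kebab-case-words>``."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```

For instance, `is_valid_branch_name("feat/add-multi-cloud-support")` is accepted, while `"feature/add-thing"` is rejected because `feature` is not one of the listed prefixes.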

Commit Message Convention¶

This project follows Conventional Commits, enforced by Commitizen. Each commit message has this shape:

<type>[optional scope]: <description>

[optional body]

[optional footer(s)]

Types:

feat: A new feature
fix: A bug fix
docs: Documentation only changes
style: Changes that do not affect the meaning of the code
refactor: A code change that neither fixes a bug nor adds a feature
test: Adding missing tests or correcting existing tests
chore: Changes to the build process or auxiliary tools

Examples:

feat: add support for multi-cloud deployments
fix: resolve authentication issue with Databricks SDK
docs: update installation instructions
test: add integration tests for deployment workflow
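
To make the header format concrete, here is a rough sketch of how a first line could be parsed. It is not how Commitizen validates messages; `parse_commit_header` and `CommitHeader` are hypothetical names, the accepted types are just the ones listed in this guide, and the optional `!` marker for breaking changes comes from the Conventional Commits specification.

```python
import re
from typing import NamedTuple, Optional

# Commit types listed in this guide; the project's real configuration may differ.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

HEADER_PATTERN = re.compile(
    rf"^(?P<type>{'|'.join(COMMIT_TYPES)})"  # required type
    r"(?:\((?P<scope>[^()]+)\))?"            # optional (scope)
    r"(?P<breaking>!)?"                      # optional breaking-change marker
    r": (?P<description>\S.*)$"              # colon, space, description
)


class CommitHeader(NamedTuple):
    type: str
    scope: Optional[str]
    breaking: bool
    description: str


def parse_commit_header(header: str) -> Optional[CommitHeader]:
    """Parse the first line of a commit message, or return None if malformed."""
    match = HEADER_PATTERN.match(header)
    if match is None:
        return None
    return CommitHeader(
        type=match["type"],
        scope=match["scope"],
        breaking=match["breaking"] is not None,
        description=match["description"],
    )
```

A header such as `fix(auth)!: resolve authentication issue` parses with scope `auth` and `breaking=True`, whereas `Fixed a bug` returns `None`.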

Code Standards¶

Linting and Formatting¶

The project uses Ruff for both linting and formatting:

# Run linting
ruff check .

# Run formatting
ruff format .

# Fix auto-fixable issues
ruff check --fix .

Code Quality Rules¶

Line length: 88 characters (managed by Ruff)
Follow PEP 8 style guidelines
Use type hints where possible
Write descriptive variable and function names
Add docstrings for public functions and classes

Pre-commit Hooks¶

Pre-commit hooks automatically run quality checks before each commit:

Commitizen: Validates commit message format
Ruff: Code linting and formatting
YAML/TOML validation: Ensures configuration files are valid
Trailing whitespace: Removes trailing spaces
End of file fixer: Ensures files end with newlines

Testing¶

Test Structure¶

Tests are organized into two categories:

Unit tests: tests/unit/ - Fast, isolated tests
Integration tests: tests/integration/ - Tests requiring external services

Running Tests¶

# Run all tests
pytest

# Run only unit tests
pytest tests/unit/

# Run only integration tests
pytest tests/integration/

NOTE: Integration tests require external services, so make sure a valid Databricks configuration is available before running them.

Test Requirements¶

Test isolation: Each test should be independent
Fixtures: Use pytest fixtures for common test setup (see tests/conftest.py)
Mocking: Mock external dependencies in unit tests
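
As an illustration of isolation and mocking, the sketch below tests a function against a stand-in client built with `unittest.mock`, so no workspace is contacted. Everything here is hypothetical: `job_exists`, `list_job_names`, and `make_fake_client` are invented for the example and are not part of kedro-databricks or the Databricks SDK. In a real test suite the factory would more naturally be a pytest fixture.

```python
from unittest.mock import Mock


def job_exists(client, job_name: str) -> bool:
    """Hypothetical helper: report whether a job called ``job_name`` exists.

    ``client`` is any object with a ``list_job_names()`` method. It stands in
    for a real Databricks client and is an assumption of this example.
    """
    return job_name in client.list_job_names()


def make_fake_client(job_names):
    """Build a fresh mock client per test so tests stay independent."""
    client = Mock(spec=["list_job_names"])
    client.list_job_names.return_value = list(job_names)
    return client


def test_job_exists_when_job_is_present():
    client = make_fake_client(["nightly-etl"])
    assert job_exists(client, "nightly-etl") is True
    # The external dependency was consulted exactly once, with no arguments.
    client.list_job_names.assert_called_once_with()


def test_job_exists_when_job_is_missing():
    client = make_fake_client([])
    assert job_exists(client, "nightly-etl") is False
```

Each test builds its own mock, so neither depends on state left behind by the other, and together they cover one success case and one failure case.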

Writing Tests¶

Test file naming: test_<module_name>.py
Test function naming: test_<function_name>_<scenario>
Use descriptive assertions: Prefer specific assertions over generic ones
Test both success and failure cases

Example:

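# Illustrative only: `cli_runner` is assumed to be a fixture that returns a
# click.testing.CliRunner, and `deploy_command` stands in for the command under test.
def test_deploy_command_with_valid_config(cli_runner, tmp_path):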
    """Test that deploy command succeeds with valid configuration."""
    # Arrange
    config_file = tmp_path / "databricks.yml"
    config_file.write_text("valid: config")

    # Act
    result = cli_runner.invoke(deploy_command, ["--config", str(config_file)])

    # Assert
    assert result.exit_code == 0
    assert "Deployment successful" in result.output

Documentation¶

Building Documentation¶

The project uses Zensical for documentation generation and local previews:

# Install documentation dependencies
uv sync --extra docs

# Serve documentation locally
uv run zensical serve

# Build documentation
uv run zensical build

Documentation Standards¶

API Documentation: Use Google-style docstrings
Examples: Add examples to the examples/ directory with README files. Any example should be runnable and demonstrate key features.
Changelog: Automatically generated from conventional commits
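
For reference, a Google-style docstring combined with type hints might look like the sketch below. `build_job_name` is a hypothetical function written only to show the layout of the `Args`, `Returns`, and `Raises` sections; it is not part of the project.

```python
def build_job_name(project: str, pipeline: str, env: str = "dev") -> str:
    """Build a job name from its parts.

    Hypothetical helper used only to illustrate the docstring layout.

    Args:
        project: Name of the Kedro project.
        pipeline: Name of the pipeline within the project.
        env: Target environment. Defaults to ``"dev"``.

    Returns:
        The three parts joined with underscores.

    Raises:
        ValueError: If ``project`` or ``pipeline`` is empty.
    """
    if not project or not pipeline:
        raise ValueError("project and pipeline must be non-empty")
    return "_".join((project, pipeline, env))
```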

Adding Examples¶

Create a new directory in examples/ with a descriptive name
Include a README.md explaining the example
Add all necessary configuration files (databricks.yml, resources.yml, etc.)
Update the documentation index if needed

Submitting Changes¶

Pull Request Process¶

Fork the repository and create a feature branch
Make your changes following the guidelines above
Write or update tests for your changes
Update documentation if needed

Ensure all checks pass:

# Run pre-commit checks
pre-commit run --all-files

# Run tests
pytest

Submit a pull request with:
Clear title and description
Reference to related issues
Screenshots/examples if applicable

Pull Request Template¶

When creating a pull request, include:

What: Brief description of changes
Why: Motivation for the changes
How: Technical approach taken
Testing: How changes were tested
Documentation: Any documentation updates

Review Process¶

All pull requests require review from maintainers
CI checks must pass (linting and tests)
Address feedback promptly and professionally
Maintainers will merge once approved

Release Process¶

Releases are automated using Commitizen and follow semantic versioning:

Version bumping: Automated based on conventional commits
Changelog generation: Automatically created from commit messages
GitHub releases: Created automatically on version tags
PyPI publishing: Automated through CI/CD pipeline
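
The link between commit types and version numbers can be sketched as follows. This is a simplified reading of the usual semantic versioning mapping (breaking change bumps major, `feat` bumps minor, `fix` bumps patch), not Commitizen's actual implementation or this project's configuration. `bump_kind` and `bump` are hypothetical helpers, and the sketch only inspects header lines, so it ignores `BREAKING CHANGE:` footers.

```python
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int, int]


def bump_kind(headers: Iterable[str]) -> Optional[str]:
    """Return "major", "minor", "patch", or None for a set of commit headers."""
    kind = None
    for header in headers:
        if ":" not in header:
            continue  # not a conventional commit header, ignore it
        prefix = header.split(":", 1)[0]  # e.g. "feat(api)!"
        if prefix.endswith("!"):
            return "major"  # a breaking change outranks everything else
        commit_type = prefix.split("(", 1)[0]  # drop the optional scope
        if commit_type == "feat":
            kind = "minor"
        elif commit_type == "fix" and kind is None:
            kind = "patch"
    return kind


def bump(version: Version, kind: Optional[str]) -> Version:
    """Apply a bump to a (major, minor, patch) tuple."""
    major, minor, patch = version
    if kind == "major":
        return (major + 1, 0, 0)
    if kind == "minor":
        return (major, minor + 1, 0)
    if kind == "patch":
        return (major, minor, patch + 1)
    return version  # nothing release-worthy, keep the version as it is
```

Under these assumptions, a history containing one `fix` and one `feat` commit yields a minor bump, while a history of only `docs` and `chore` commits yields no bump at all.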

Manual Release (Maintainers)¶

# Bump version and create changelog
cz bump

# Push the bump commit, then the tags
git push
git push --tags

Getting Help¶

Issues: GitHub Issues
Pull Requests: GitHub Pull Requests
Documentation: Read the Docs

Code of Conduct¶

Please be respectful and professional in all interactions. We welcome contributions from everyone and strive to create an inclusive environment.

Thank you for contributing to kedro-databricks! 🚀