Contributing¶
Thank you for your interest in contributing to kedro-databricks! This document provides guidelines and instructions for contributing to this project.
Getting Started¶
kedro-databricks is a Kedro plugin that enables running Kedro pipelines on Databricks. Before contributing, please:
- Read the README.md to understand the project's purpose and features
- Check the documentation for detailed usage instructions
- Look at existing issues and pull requests
Prerequisites¶
- Python 3.10, 3.11, or 3.12
- uv (recommended) or pip for dependency management
- Git for version control
- A Databricks workspace (for integration testing)
Development Environment Setup¶
Using uv (Recommended)¶
- Clone the repository
- Install dependencies
- Activate the virtual environment
- Install pre-commit hooks
Using pip¶
- Clone the repository and create a virtual environment
- Install the package in development mode
- Install pre-commit hooks
Development Scripts¶
The project includes several helpful development scripts in the scripts/ directory:
- scripts/mkdev.py <project_name>: Create a new Kedro project for development/testing
- scripts/run_lint.sh: Run linting checks
Development Workflow¶
Branch Naming Convention¶
Use descriptive branch names with prefixes:
- feat/: New features
- fix/: Bug fixes
- docs/: Documentation changes
- test/: Test-related changes
- chore/: Maintenance tasks
- refactor/: Code refactoring
- ci/: CI/CD changes
Examples:
- feat/add-multi-cloud-support
- fix/databricks-authentication
- docs/update-getting-started
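The naming convention above can be checked mechanically. The following is a minimal sketch, not a script shipped with the project: `is_valid_branch_name` is a hypothetical helper, and the rule that the description is lowercase words joined by hyphens is an assumption inferred from the examples.

```python
import re

# Prefixes taken from the branch naming convention above.
BRANCH_PREFIXES = ("feat", "fix", "docs", "test", "chore", "refactor", "ci")

# Assumption: the part after the prefix is lowercase alphanumeric words joined
# by hyphens, as in the examples. The project does not state this explicitly.
BRANCH_PATTERN = re.compile(
    r"(?:" + "|".join(BRANCH_PREFIXES) + r")/[a-z0-9]+(?:-[a-z0-9]+)*"
)


def is_valid_branch_name(name: str) -> bool:
    """Return True if ``name`` looks like ``<prefix>/<hyphenated-description>``."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```

For instance, `is_valid_branch_name("fix/databricks-authentication")` is true, while `"feature/new-thing"` is rejected because `feature` is not one of the listed prefixes.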
Commit Message Convention¶
This project uses Conventional Commits, enforced by Commitizen.
Types:
- feat: A new feature
- fix: A bug fix
- docs: Documentation only changes
- style: Changes that do not affect the meaning of the code
- refactor: A code change that neither fixes a bug nor adds a feature
- test: Adding missing tests or correcting existing tests
- chore: Changes to the build process or auxiliary tools
Examples:
feat: add support for multi-cloud deployments
fix: resolve authentication issue with Databricks SDK
docs: update installation instructions
test: add integration tests for deployment workflow
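To make the header format concrete, here is a rough sketch of a parser for the first line of a commit message. It is an illustration only: Commitizen performs the real validation, its rules are configurable, and `parse_commit_header` is a hypothetical name. The optional scope and the `!` breaking-change marker come from the Conventional Commits specification rather than from this guide.

```python
from __future__ import annotations

import re

# Types listed in this guide.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# Shape: <type>[(scope)][!]: <description>
HEADER_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def parse_commit_header(header: str) -> dict[str, str | None] | None:
    """Split a commit header into its parts, or return None if it is malformed."""
    match = HEADER_PATTERN.match(header)
    return match.groupdict() if match else None
```

A header such as `Updated the docs` returns `None` because it has no type prefix, which is the kind of message the commit hook is expected to reject.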
Code Standards¶
Linting and Formatting¶
The project uses Ruff for both linting and formatting:
# Run linting
ruff check .
# Run formatting
ruff format .
# Fix auto-fixable issues
ruff check --fix .
Code Quality Rules¶
- Line length: 88 characters (managed by Ruff)
- Follow PEP 8 style guidelines
- Use type hints where possible
- Write descriptive variable and function names
- Add docstrings for public functions and classes
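As an illustration of these rules, the sketch below shows a small function with type hints, descriptive names, and a Google-style docstring (the style named under Documentation Standards). `build_job_name` is a hypothetical example written for this guide, not a function in kedro-databricks.

```python
from __future__ import annotations


def build_job_name(project_name: str, pipeline_name: str | None = None) -> str:
    """Build a display name for a job from a project and pipeline name.

    Args:
        project_name: Name of the Kedro project.
        pipeline_name: Name of the pipeline, or ``None`` for the default one.

    Returns:
        The project name alone for the default pipeline, otherwise
        ``"<project_name>_<pipeline_name>"``.

    Raises:
        ValueError: If ``project_name`` is empty.
    """
    if not project_name:
        raise ValueError("project_name must not be empty")
    if pipeline_name is None:
        return project_name
    return f"{project_name}_{pipeline_name}"
```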
Pre-commit Hooks¶
Pre-commit hooks automatically run quality checks before each commit:
- Commitizen: Validates commit message format
- Ruff: Code linting and formatting
- YAML/TOML validation: Ensures configuration files are valid
- Trailing whitespace: Removes trailing spaces
- End of file fixer: Ensures files end with newlines
Testing¶
Test Structure¶
Tests are organized into two categories:
- Unit tests (tests/unit/): Fast, isolated tests
- Integration tests (tests/integration/): Tests requiring external services
Running Tests¶
# Run all tests
pytest
# Run only unit tests
pytest tests/unit/
# Run only integration tests
pytest tests/integration/
NOTE: Ensure you have a valid Databricks configuration for integration tests.
Test Requirements¶
- Test isolation: Each test should be independent
- Fixtures: Use pytest fixtures for common test setup (see tests/conftest.py)
- Mocking: Mock external dependencies in unit tests
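The isolation and mocking requirements might look like the sketch below. Everything in it is assumed for illustration: `count_jobs` is a hypothetical function under test, and `list_jobs` is a made-up client method, not a Databricks SDK call. In the real suite, a helper like `make_fake_client` would more likely be a pytest fixture in tests/conftest.py; it is a plain function here so the snippet runs with the standard library alone.

```python
from unittest.mock import MagicMock


def count_jobs(client) -> int:
    """Hypothetical function under test: count the jobs a client reports."""
    return len(list(client.list_jobs()))


def make_fake_client(job_names):
    """Build a stand-in for an external client, so no network access is needed."""
    client = MagicMock()
    client.list_jobs.return_value = [{"name": name} for name in job_names]
    return client


def test_count_jobs_with_two_jobs():
    # Each test builds its own fake client, so tests do not share state.
    client = make_fake_client(["ingest", "train"])
    assert count_jobs(client) == 2
    client.list_jobs.assert_called_once_with()


def test_count_jobs_with_no_jobs():
    client = make_fake_client([])
    assert count_jobs(client) == 0
```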
Writing Tests¶
- Test file naming: test_<module_name>.py
- Test function naming: test_<function_name>_<scenario>
- Use descriptive assertions: Prefer specific assertions over generic ones
- Test both success and failure cases
Example:
def test_deploy_command_with_valid_config(cli_runner, tmp_path):
    """Test that deploy command succeeds with valid configuration."""
    # Arrange
    config_file = tmp_path / "databricks.yml"
    config_file.write_text("valid: config")
    # Act
    result = cli_runner.invoke(deploy_command, ["--config", str(config_file)])
    # Assert
    assert result.exit_code == 0
    assert "Deployment successful" in result.output
Documentation¶
Building Documentation¶
The project uses Zensical for documentation generation and local previews:
# Install documentation dependencies
uv sync --extra docs
# Serve documentation locally
uv run zensical serve
# Build documentation
uv run zensical build
Documentation Standards¶
- API Documentation: Use docstrings with Google style
- Examples: Add examples to the examples/ directory with README files. Every example should be runnable and demonstrate key features.
- Changelog: Automatically generated from conventional commits
Adding Examples¶
- Create a new directory in examples/ with a descriptive name
- Include a README.md explaining the example
- Add all necessary configuration files (databricks.yml, resources.yml, etc.)
- Update the documentation index if needed
Submitting Changes¶
Pull Request Process¶
- Fork the repository and create a feature branch
- Make your changes following the guidelines above
- Write or update tests for your changes
- Update documentation if needed
- Ensure all checks pass (linting and tests)
- Submit a pull request with:
  - Clear title and description
  - Reference to related issues
  - Screenshots/examples if applicable
Pull Request Template¶
When creating a pull request, include:
- What: Brief description of changes
- Why: Motivation for the changes
- How: Technical approach taken
- Testing: How changes were tested
- Documentation: Any documentation updates
Review Process¶
- All pull requests require review from maintainers
- CI checks must pass (linting and tests)
- Address feedback promptly and professionally
- Maintainers will merge once approved
Release Process¶
Releases are automated using Commitizen and follow semantic versioning:
- Version bumping: Automated based on conventional commits
- Changelog generation: Automatically created from commit messages
- GitHub releases: Created automatically on version tags
- PyPI publishing: Automated through CI/CD pipeline
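To show why commit types matter for releases, the sketch below derives a semantic version bump from a list of commit messages. It is a simplified illustration of the common convention (breaking change → major, feat → minor, fix → patch), not Commitizen's implementation; Commitizen's actual bump rules are configurable and may differ. `bump_level` and `bump_version` are hypothetical names.

```python
from __future__ import annotations

import re

_LEVELS = {"patch": 1, "minor": 2, "major": 3}
_HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^()]*\))?(?P<breaking>!)?: ")


def bump_level(messages: list[str]) -> str | None:
    """Return the largest bump implied by the messages, or None if no bump."""
    highest = None
    for message in messages:
        match = _HEADER.match(message)
        if match is None:
            continue  # Not a conventional commit; ignored in this sketch.
        if match["breaking"] or "BREAKING CHANGE:" in message:
            level = "major"
        elif match["type"] == "feat":
            level = "minor"
        elif match["type"] == "fix":
            level = "patch"
        else:
            continue  # docs, style, test, chore, ... imply no bump here.
        if highest is None or _LEVELS[level] > _LEVELS[highest]:
            highest = level
    return highest


def bump_version(version: str, level: str | None) -> str:
    """Apply a bump level to a ``MAJOR.MINOR.PATCH`` version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version
```

Under these assumed rules, a release containing one `fix:` and one `feat:` commit moves `1.2.3` to `1.3.0`, while a release with only `docs:` commits leaves the version unchanged.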
Manual Release (Maintainers)¶
Getting Help¶
- Issues: GitHub Issues
- Pull Requests: GitHub Pull Requests
- Documentation: Read the Docs
Code of Conduct¶
Please be respectful and professional in all interactions. We welcome contributions from everyone and strive to create an inclusive environment.
Thank you for contributing to kedro-databricks! 🚀