Migrate datasets from GitHub to Hugging Face Hub
Manage and analyze labeled datasets
List of French datasets not referenced on the Hub
Organize and process datasets using AI
Organize and process datasets efficiently
Find and view synthetic data pipelines on Hugging Face
Upload files to a Hugging Face repository
Manage and label datasets for your projects
Convert a model to Safetensors and open a PR
Validate JSONL format for fine-tuning
Browse and view Hugging Face datasets from a collection
Create a report in BoAmps format
Review and rate queries
The Github To Huggingface Dataset Migration Tool is a utility for moving datasets from GitHub repositories to the Hugging Face Hub. It transfers dataset files while preserving their contents and directory structure. It is aimed at researchers and data scientists who want to use the Hugging Face ecosystem for dataset sharing, versioning, and collaboration.
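At its core, a migration like this means cloning the GitHub repository and uploading its working tree to a Hub dataset repository. The sketch below is not the tool's own code. It assumes git is on PATH and the huggingface_hub package is installed. The function name migrate_repo and its dry_run flag are hypothetical. The only library calls it uses are HfApi.create_repo and HfApi.upload_folder from huggingface_hub.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def migrate_repo(github_url: str, hub_repo_id: str, workdir: str,
                 *, private: bool = False, dry_run: bool = True) -> list[str]:
    """Clone a GitHub repository and upload its files to a Hub dataset repo.

    Hypothetical helper. With dry_run=True (the default) nothing is executed;
    the planned steps are returned so they can be reviewed first.
    """
    checkout = Path(workdir) / hub_repo_id.split("/")[-1]
    clone_cmd = ["git", "clone", github_url, str(checkout)]
    steps = [
        " ".join(clone_cmd),
        f"create dataset repo {hub_repo_id} (private={private})",
        f"upload {checkout} to {hub_repo_id}",
    ]
    if dry_run:
        return steps

    # Fetch the dataset files from GitHub (requires git on PATH).
    subprocess.run(clone_cmd, check=True)

    # Imported lazily so a dry run works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()  # uses HF_TOKEN or a cached login token
    api.create_repo(hub_repo_id, repo_type="dataset", private=private, exist_ok=True)
    # upload_folder mirrors the local folder layout in the Hub repo;
    # recent huggingface_hub versions skip .git/ folders automatically.
    api.upload_folder(folder_path=str(checkout), repo_id=hub_repo_id, repo_type="dataset")
    return steps
```

Uploading a working tree copies only the current files. By itself, it does not carry over GitHub commit history.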
• Support for GitHub Repositories: Migrate datasets stored in public or private GitHub repositories.
• Dataset Versioning: Preserve dataset versions during the migration process.
• Large Dataset Handling: Efficiently manage and transfer large-scale datasets.
• Data Format Compatibility: Supports various data formats commonly used in machine learning workflows.
• Integration with Hugging Face Libraries: Compatible with the Hugging Face ecosystem, including the datasets and transformers libraries.
• Repository Structure Mapping: Maintain the original folder and file structure of the dataset during migration.
• Commit History Preservation: Optionally retain commit history and metadata from GitHub.
• Validation and Verification: Ensure data integrity through pre- and post-migration checks.
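Pre- and post-migration checks and structure mapping can both rest on one idea: a manifest that maps each file's relative path to a content hash. Comparing the manifest of the GitHub checkout with the manifest of a downloaded copy of the Hub repo shows missing, extra, or altered files. This is a minimal sketch of that technique, not the tool's documented check. The helper names are hypothetical, and the choice of SHA-256 is an assumption. Files are hashed in chunks so that large files are never read fully into memory.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files never load fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> dict:
    """Map each file's relative POSIX path to its SHA-256 digest, skipping .git/."""
    base = Path(root)
    manifest = {}
    for path in sorted(base.rglob("*")):
        rel = path.relative_to(base)
        if path.is_file() and ".git" not in rel.parts:
            manifest[rel.as_posix()] = file_sha256(path)
    return manifest


def compare_manifests(before: dict, after: dict) -> dict:
    """Report files lost, added, or altered between two manifests."""
    return {
        "missing": sorted(before.keys() - after.keys()),
        "extra": sorted(after.keys() - before.keys()),
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }
```

For the post-migration side, one option is to download the Hub copy with huggingface_hub's snapshot_download(repo_id, repo_type="dataset") and pass the returned folder to build_manifest. If any list in the result of compare_manifests is non-empty, the migration did not round-trip cleanly.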
What types of GitHub repositories are supported?
The tool supports both public and private GitHub repositories. For private repositories, ensure you have the necessary authentication credentials.
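A migration script can fail fast when credentials are missing, rather than partway through a transfer. The sketch below reads tokens from environment variables. HF_TOKEN is a variable that huggingface_hub itself reads. GITHUB_TOKEN is only a common convention and an assumption of this sketch. Requiring HF_TOKEN is also a simplification, because huggingface_hub can use a cached login token instead. The function name resolve_tokens is hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_tokens(private_source: bool, env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the credentials a migration needs, failing before any work starts."""
    env = os.environ if env is None else env
    hf_token = env.get("HF_TOKEN")  # read by huggingface_hub as well
    if not hf_token:
        raise RuntimeError("set HF_TOKEN to a Hub token with write access")
    tokens = {"hf": hf_token}
    if private_source:
        gh_token = env.get("GITHUB_TOKEN")  # assumed variable name for this sketch
        if not gh_token:
            raise RuntimeError("set GITHUB_TOKEN to clone a private GitHub repository")
        tokens["github"] = gh_token
    return tokens
```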
Can I migrate datasets that are split across multiple repositories?
Yes, the tool allows you to specify multiple repositories or subdirectories within a repository to migrate as a single dataset.
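Several repositories, or subdirectories of them, can be combined into one dataset by copying each source into a staging folder under a chosen prefix before upload. The sketch below refuses to overwrite, so two sources that would write the same path raise an error instead of silently replacing each other. The name stage_sources and the (prefix, source directory) input format are hypothetical, and this is not necessarily how the tool merges sources. An empty prefix places files at the dataset root.

```python
from __future__ import annotations

import shutil
from pathlib import Path
from typing import Iterable


def stage_sources(sources: Iterable[tuple[str, str]], staging: str) -> list[str]:
    """Copy each (prefix, source_dir) pair into staging/<prefix>/.

    Raises FileExistsError if two sources would write the same path.
    Returns the staged paths, relative to staging, in copy order.
    """
    out = Path(staging)
    copied = []
    for prefix, src in sources:
        src_path = Path(src)
        for path in sorted(src_path.rglob("*")):
            rel = path.relative_to(src_path)
            if not path.is_file() or ".git" in rel.parts:
                continue
            target = out / prefix / rel
            if target.exists():
                raise FileExistsError(f"path collision: {target}")
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append((Path(prefix) / rel).as_posix())
    return copied
```

The staging folder can then be uploaded as a single dataset repository in one step.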
How are dataset versions handled during migration?
The tool preserves version history by creating separate versions on the Hugging Face Hub. You can also choose to create a new version or overwrite an existing one.
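The choice between creating a new version and overwriting an existing one can be expressed as a small planning step that maps each GitHub tag to a Hub tag. This sketch assumes versions are identified by tags on both sides. The "<tag>-<n>" renaming used in "new" mode and the name plan_versions are hypothetical, not the tool's documented behavior.

```python
def plan_versions(source_tags: list, hub_tags: list, mode: str = "new") -> list:
    """Decide how each GitHub tag maps onto a Hub tag.

    mode="new": never touch existing Hub tags; publish a conflicting tag
    under the first free "<tag>-<n>" name (hypothetical naming scheme).
    mode="overwrite": mark conflicting Hub tags for replacement.
    Returns a list of (action, hub_tag) pairs.
    """
    if mode not in ("new", "overwrite"):
        raise ValueError(f"unknown mode: {mode!r}")
    taken = set(hub_tags)
    plan = []
    for tag in source_tags:
        if tag not in taken:
            plan.append(("create", tag))
        elif mode == "overwrite":
            plan.append(("replace", tag))
        else:
            n = 2
            while f"{tag}-{n}" in taken:
                n += 1
            tag = f"{tag}-{n}"
            plan.append(("create", tag))
        taken.add(tag)  # reserve the name so later tags cannot reuse it
    return plan
```

For example, plan_versions(["v1.0", "v1.1"], ["v1.0"]) yields [("create", "v1.0-2"), ("create", "v1.1")]. In "overwrite" mode, the same call yields [("replace", "v1.0"), ("create", "v1.1")]. One way to apply such a plan is huggingface_hub's HfApi.create_tag, with HfApi.delete_tag run first for "replace" entries.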