mirror of https://github.com/langgenius/dify.git
- Add database index on (dataset_id, index_node_hash) for efficient deduplication queries
- Add deduplication check in SegmentService.create_segment and multi_create_segment methods
- Add deduplication check in DatasetDocumentStore.add_documents method to prevent duplicate embedding processing
- Skip creating segments with identical content hashes across the entire dataset

This prevents duplicate content from being re-processed and re-embedded when uploading documents with repeated content, improving efficiency and reducing unnecessary compute costs.

| | | |
|---|---|---|
| __init__.py | ||
| _workflow_exc.py | ||
| account.py | ||
| api_based_extension.py | ||
| base.py | ||
| dataset.py | ||
| engine.py | ||
| enums.py | ||
| model.py | ||
| oauth.py | ||
| provider.py | ||
| provider_ids.py | ||
| source.py | ||
| task.py | ||
| tools.py | ||
| types.py | ||
| web.py | ||
| workflow.py | ||
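
The commit message above describes skipping segments whose content hash already exists in the same dataset. The following is a minimal sketch of that idea only, not the repository's actual code: `InMemorySegmentStore`, `content_hash`, and `add_segments` are hypothetical names, SHA-256 is an assumed hash function, and an in-memory dictionary stands in for the table indexed on (dataset_id, index_node_hash).

```python
import hashlib
from dataclasses import dataclass, field


def content_hash(text: str) -> str:
    """Return a hex digest of the segment text (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class InMemorySegmentStore:
    """Hypothetical stand-in for a segment table keyed by (dataset_id, hash)."""

    # The dict key plays the role of the (dataset_id, index_node_hash) index.
    _segments: dict[tuple[str, str], str] = field(default_factory=dict)

    def add_segments(self, dataset_id: str, contents: list[str]) -> list[str]:
        """Store only contents not yet seen in this dataset; return those stored."""
        created: list[str] = []
        for content in contents:
            key = (dataset_id, content_hash(content))
            if key in self._segments:
                # Identical content already exists in this dataset: skip it,
                # so it would not be embedded a second time.
                continue
            self._segments[key] = content
            created.append(content)
        return created
```

In this sketch, calling `add_segments("ds1", ["a", "b", "a"])` on a fresh store keeps only the first `"a"` and `"b"`, while the same text added under a different dataset ID is stored again, because the lookup key includes the dataset.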