fix: use isolated session in _on_query to prevent premature commit

The _on_query method called db.session.commit() on the Flask-scoped
SQLAlchemy session, which committed every pending change in the current
request, not just the DatasetQuery audit rows.

This broke transaction isolation: if the downstream workflow failed, the
subsequent db.session.rollback() could not revert the already-committed
modifications (e.g. token deductions, partial node executions), leaving
inconsistent data in the database.

The same file already demonstrates the correct pattern in
_on_retrieval_end, which uses sessionmaker(bind=db.engine).begin() with an
independent session. This change applies the same approach to _on_query.
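
A minimal sketch of the difference, assuming plain SQLAlchemy 1.4+ with a
temporary SQLite file. Account, AuditRow, and run() are hypothetical
stand-ins (AuditRow plays the role of DatasetQuery, request_session the role
of db.session); this is not the project's code.

```python
import os
import tempfile

from sqlalchemy import Column, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base, sessionmaker

Base = declarative_base()


class Account(Base):  # hypothetical request-scoped state (e.g. token balance)
    __tablename__ = "accounts"
    id = Column(Integer, primary_key=True)
    tokens = Column(Integer, nullable=False)


class AuditRow(Base):  # hypothetical stand-in for DatasetQuery
    __tablename__ = "audit_rows"
    id = Column(Integer, primary_key=True)
    content = Column(String, nullable=False)


def run(isolated: bool) -> tuple[int, int]:
    """Return (tokens, audit row count) after a simulated failed request."""
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    engine = create_engine(f"sqlite:///{db_path}")
    Base.metadata.create_all(engine)
    with Session(engine) as setup, setup.begin():
        setup.add(Account(id=1, tokens=100))

    request_session = Session(engine)  # plays the role of db.session
    account = request_session.get(Account, 1)
    account.tokens -= 10  # pending change that a later failure should revert

    rows = [AuditRow(content="q1"), AuditRow(content="q2")]
    if isolated:
        # New pattern: a short-lived session commits only the audit rows.
        with sessionmaker(bind=engine).begin() as session:
            session.add_all(rows)
    else:
        # Old pattern: committing the shared session also persists the
        # token deduction, so the rollback below has nothing left to undo.
        request_session.add_all(rows)
        request_session.commit()

    request_session.rollback()  # the downstream workflow "fails" here
    tokens = request_session.get(Account, 1).tokens
    audit_count = request_session.scalar(
        select(func.count()).select_from(AuditRow)
    )
    request_session.close()
    engine.dispose()
    return tokens, audit_count
```

In this sketch, run(isolated=False) leaves the balance at 90 after the
rollback, while run(isolated=True) restores it to 100. Both keep the two
audit rows.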

Also fixes a latent bug where db.session.add_all(dataset_queries) was called
inside the loop on every iteration, re-adding previously accumulated rows.
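
The effect of the misplaced call can be illustrated with a small sketch.
RecordingSession and both helper functions are hypothetical and only record
what would be handed to add_all(); they do not reproduce the real method.

```python
class RecordingSession:
    """Hypothetical stand-in that records each add_all() call."""

    def __init__(self):
        self.calls = []

    def add_all(self, rows):
        # Copy the list so later appends do not alter recorded calls.
        self.calls.append(list(rows))


def persist_inside_loop(queries, session):
    """Old shape: add_all() runs once per iteration with the growing list."""
    rows = []
    for query in queries:
        rows.append(query)
        if rows:
            session.add_all(rows)


def persist_after_loop(queries, session):
    """New shape: accumulate first, then call add_all() exactly once."""
    rows = []
    for query in queries:
        rows.append(query)
    if rows:
        session.add_all(rows)


old, new = RecordingSession(), RecordingSession()
persist_inside_loop(["a", "b", "c"], old)
persist_after_loop(["a", "b", "c"], new)
print(len(old.calls), len(new.calls))
```

With three inputs, the in-loop variant hands "a" to add_all() three times
and "b" twice, whereas the post-loop variant passes each row once.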

Fixes #37886
GitHub Contributor 2026-06-26 04:09:16 +08:00
parent a246dc8b17
commit 3b8559521d


@@ -1030,6 +1030,9 @@ class DatasetRetrieval:
     ):
         """
         Persist dataset query audit rows for retrieval requests.
+
+        Uses an independent session to avoid committing the request-scoped
+        db.session, which would break transaction isolation for the caller.
         """
         if not query and not attachment_ids:
             return
@@ -1059,9 +1062,9 @@ class DatasetRetrieval:
                     created_by=created_by,
                 )
                 dataset_queries.append(dataset_query)
-            if dataset_queries:
-                db.session.add_all(dataset_queries)
-                db.session.commit()
+        if dataset_queries:
+            with sessionmaker(bind=db.engine).begin() as session:
+                session.add_all(dataset_queries)
 
     def _retriever(
         self,