custom dirty flag tracking

Description

Currently, Hibernate supports a special dirty check on instrumented entities
in order to improve flush performance. IMO, this optimization can often be
rather significant. However, the drawback is that you have to use bytecode
instrumentation to take advantage of this performance improvement, which
might not be an option in some projects.

Therefore, I would like to propose extending the current dirty checking during flush
so that the dirtiness information can also be provided directly by
clients. I can think of two possible approaches:

1. Introduce an interface which client entities can implement if they
have some notion of dirtiness. The interface could look something like this:

public interface DirtyAwareEntity {

    boolean getMightBeDirty();

    void setMightBeDirty(boolean mightBeDirty);
}

Using such an interface, Hibernate could easily check whether an entity might
be dirty during flush and it could also reset the dirty flag after flush just
as is currently done for instrumented classes. So this approach would probably
be rather easy to implement and very convenient for clients since they would
only have to implement that interface on the appropriate entities and set the
dirty flag when the entity is actually modified.

2. Add some hooks on event listeners and/or on the Interceptor for querying whether
an entity is dirty and for resetting the dirty flag. E.g. one could add the
following hook method to the DefaultFlushEntityEventListener class:

protected boolean requiresDirtyCheck(FlushEntityEvent event);

By default, this method would call EntityEntry#requiresDirtyCheck(Object entity)
as is done right now.
Resetting the dirty flag could maybe be done in Interceptor#postFlush() or some
dedicated method could be provided.
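
To make approach 1 concrete, here is a minimal sketch of how an entity might maintain the flag and how a flush loop might consult and reset it. The `DirtyAwareEntity` interface is the one proposed above; the `Customer` entity and the `FlushSketch` class are hypothetical stand-ins for illustration only and are not Hibernate code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Interface as proposed in the issue text.
interface DirtyAwareEntity {
    boolean getMightBeDirty();
    void setMightBeDirty(boolean mightBeDirty);
}

// Hypothetical entity: a setter raises the flag only when the value really changes.
class Customer implements DirtyAwareEntity {
    private String name;
    private boolean mightBeDirty;

    Customer(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    void setName(String name) {
        if (!Objects.equals(this.name, name)) {
            this.name = name;
            this.mightBeDirty = true;
        }
    }

    @Override
    public boolean getMightBeDirty() {
        return mightBeDirty;
    }

    @Override
    public void setMightBeDirty(boolean mightBeDirty) {
        this.mightBeDirty = mightBeDirty;
    }
}

// Hypothetical stand-in for the flush loop (assumption, not Hibernate's implementation).
class FlushSketch {
    /** Returns the entities that would still need the full property comparison. */
    static List<Object> flush(List<?> sessionEntities) {
        List<Object> needFullCheck = new ArrayList<>();
        for (Object entity : sessionEntities) {
            if (entity instanceof DirtyAwareEntity) {
                DirtyAwareEntity aware = (DirtyAwareEntity) entity;
                if (!aware.getMightBeDirty()) {
                    continue; // flag says clean: skip the expensive check
                }
                needFullCheck.add(entity);
                aware.setMightBeDirty(false); // reset once the changes are flushed
            } else {
                needFullCheck.add(entity); // no flag available: check as usual
            }
        }
        return needFullCheck;
    }
}
```

In this sketch, entities that do not implement the interface always go through the regular check, so the flag can only ever save work, never hide a change on an unaware entity.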

BTW, I know that the Interceptor#findDirty() method already allows for some custom
dirty checking. From a performance point of view, however, the problem is that this
method takes the entity's property values as a parameter, and these are retrieved in
DefaultFlushEntityEventListener#getValues(), which is the most expensive method
during flush. This drawback of findDirty() has often been noted in comments on the
newsgroups.

I personally think it would be nice if something could be done to improve the
performance of flushing in Hibernate. From what I read on the newsgroups and
the like, flushing still seems to lead to performance problems in practice quite
often, especially in larger projects where it is not easy to avoid flushes or to
keep the number of entities in the session cache small. In fact, we are having quite
some trouble with this in our project, and custom dirty checking like the one I am
proposing here would greatly help us and, I guess, other projects as well.

Attachments

1
  • 14 May 2009, 12:38 AM

Activity

Steve Ebersole February 9, 2012 at 6:22 AM

Closing for 4.1 release

Steve Ebersole January 25, 2012 at 6:23 PM

> Taking a peek at the changes, the resetDirty call isn't buried in the core? i.e. we'll have the flexibility to call it where it makes the most sense?

Not really sure what you are asking here. resetDirty is called after changes are written to the db. You have to keep in mind that with this approach your code is now the keeper of this dirty flag. Hibernate can't reset it because you control it. So you can reset it whenever you see fit. This is just a hook for Hibernate to tell you that it's probably a good time to reset it because those changes have been written to the db.

> I'm not sure of the case where I'd have 2 pieces of logic called for isDirty and canDirtyCheck but I need to think through how I'll tie into it. I think we'll find in our cases quite a bit of improvement if we can bypass the check on our large entities.

Well I think you have to remember that this is plugged in at the SessionFactory level. You may not be controlling a dirty flag for each and every entity. Hence the canDirtyCheck.

Right now, isDirty only has a perf benefit if it returns false, which will circumvent the "dirty checking". Something to keep in mind is that "dirty checking" also encompasses figuring out which attributes changed. If isDirty returns true, we still need to do that work in order to determine which attributes changed.
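
The interplay described above can be sketched as a small decision function. This is only an illustration under stated assumptions: `DirtinessStrategy` is a simplified stand-in for the contract (the Session parameter is omitted), and `FlushDecision` and `DirtyCheckFlow` are hypothetical names, not part of Hibernate.

```java
// Simplified stand-in for the strategy contract (assumption: Session parameter omitted).
interface DirtinessStrategy {
    boolean canDirtyCheck(Object entity);
    boolean isDirty(Object entity);
}

// Hypothetical outcome of the per-entity decision during flush.
enum FlushDecision {
    SKIP,                      // strategy vouches that the entity is clean
    FULL_ATTRIBUTE_COMPARISON  // attributes must still be compared one by one
}

class DirtyCheckFlow {
    static FlushDecision decide(Object entity, DirtinessStrategy strategy) {
        // isDirty is only consulted when the strategy says it can judge this entity.
        if (strategy != null && strategy.canDirtyCheck(entity) && !strategy.isDirty(entity)) {
            return FlushDecision.SKIP;
        }
        // Either no strategy applies or the entity is dirty: the changed
        // attributes still have to be determined the regular way.
        return FlushDecision.FULL_ATTRIBUTE_COMPARISON;
    }
}
```

Only the `SKIP` branch saves work here, which matches the point that a `true` result from isDirty currently brings no performance benefit.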

Something extra I have contemplated here (still not sure) is to expand this CustomEntityDirtinessStrategy concept a little to also allow it to report which attributes are changed, something akin to org.hibernate.Interceptor#findDirty.

Shawn Clowater January 25, 2012 at 4:59 AM

Steve, I can't wait to give this a spin, just need to convince someone to allocate some time to get to 4 first.

Taking a peek at the changes, the resetDirty call isn't buried in the core? i.e. we'll have the flexibility to call it where it makes the most sense?

I'm not sure of the case where I'd have 2 pieces of logic called for isDirty and canDirtyCheck but I need to think through how I'll tie into it. I think we'll find in our cases quite a bit of improvement if we can bypass the check on our large entities.

Steve Ebersole January 24, 2012 at 6:10 AM

Not sure exactly where to fit this in the existing documentation. It does not seem to fit nicely anywhere. Maybe y'all have some suggestions?

In the meantime, I'll quickly document its use here and write a blog entry tomorrow.

The contract here is named org.hibernate.CustomEntityDirtinessStrategy. It defines only 3 methods:

CustomEntityDirtinessStrategy.java

public interface CustomEntityDirtinessStrategy {
    /**
     * Is this strategy capable of telling whether the given entity is dirty?  A return of {@code true} means that
     * {@link #isDirty} will be called next as the definitive means to determine whether the entity is dirty.
     *
     * @param entity The entity to be checked.
     * @param session The session from which this check originates.
     *
     * @return {@code true} indicates the dirty check can be done; {@code false} indicates it cannot.
     */
    public boolean canDirtyCheck(Object entity, Session session);

    /**
     * The callback used by Hibernate to determine if the given entity is dirty.  Only called if the previous
     * {@link #canDirtyCheck} returned {@code true}.
     *
     * @param entity The entity to check.
     * @param session The session from which this check originates.
     *
     * @return {@code true} indicates the entity is dirty; {@code false} indicates the entity is not dirty.
     */
    public boolean isDirty(Object entity, Session session);

    /**
     * Callback used by Hibernate to signal that the entity dirty flag should be cleared.  Generally this
     * happens after previous dirty changes were written to the database.
     *
     * @param entity The entity to reset
     * @param session The session from which this call originates.
     */
    public void resetDirty(Object entity, Session session);
}

This is what your code would implement. You specify your implementation using the hibernate.entity_dirtiness_strategy setting (org.hibernate.cfg.AvailableSettings#CUSTOM_ENTITY_DIRTINESS_STRATEGY).

That's basically it.
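
As an illustration, a flag-based implementation of this contract might look like the sketch below. So that it compiles without Hibernate on the classpath, it declares local stand-ins for `Session` and `CustomEntityDirtinessStrategy`; in real code you would drop those and use the types from org.hibernate instead. `DirtyAwareEntity` is the interface from the issue description, while `FlagBasedDirtinessStrategy` and `Invoice` are hypothetical names.

```java
// Stand-ins so the sketch is self-contained (assumption). With Hibernate available,
// remove these two declarations and import org.hibernate.Session and
// org.hibernate.CustomEntityDirtinessStrategy.
interface Session { }

interface CustomEntityDirtinessStrategy {
    boolean canDirtyCheck(Object entity, Session session);
    boolean isDirty(Object entity, Session session);
    void resetDirty(Object entity, Session session);
}

// Interface proposed in the issue description.
interface DirtyAwareEntity {
    boolean getMightBeDirty();
    void setMightBeDirty(boolean mightBeDirty);
}

// Hypothetical strategy: only entities carrying their own flag are handled.
class FlagBasedDirtinessStrategy implements CustomEntityDirtinessStrategy {
    @Override
    public boolean canDirtyCheck(Object entity, Session session) {
        return entity instanceof DirtyAwareEntity;
    }

    @Override
    public boolean isDirty(Object entity, Session session) {
        // Only called after canDirtyCheck returned true, so the cast is safe.
        return ((DirtyAwareEntity) entity).getMightBeDirty();
    }

    @Override
    public void resetDirty(Object entity, Session session) {
        if (entity instanceof DirtyAwareEntity) {
            ((DirtyAwareEntity) entity).setMightBeDirty(false);
        }
    }
}

// Hypothetical entity that keeps its own dirty flag.
class Invoice implements DirtyAwareEntity {
    private long total;
    private boolean mightBeDirty;

    void setTotal(long total) {
        if (this.total != total) {
            this.total = total;
            this.mightBeDirty = true;
        }
    }

    @Override
    public boolean getMightBeDirty() {
        return mightBeDirty;
    }

    @Override
    public void setMightBeDirty(boolean mightBeDirty) {
        this.mightBeDirty = mightBeDirty;
    }
}
```

Such a class would then be named in the hibernate.entity_dirtiness_strategy setting mentioned above; the exact value format (for example a fully qualified class name) is an assumption here and should be checked against the release documentation.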

Steve Ebersole December 28, 2011 at 5:30 AM

Just to document my thoughts on this for later...

Also might be a good idea to bundle both the FieldInterceptor and this new (proposed) EntityDirtyFlagChecker handling behind a single SessionFactory delegate. That would remove the need for the null checking in client code and makes for better encapsulation in general. Something like:

DirtyFlagManager.java

public class DirtyFlagManager {
    private final SessionFactoryImplementor sessionFactory;
    private final EntityDirtyFlagChecker customDirtyFlagChecker;

    public boolean isUnequivocallyDirty() {
        if ( getPersister().getFactory()
                .getServiceRegistry()
                .getService( InstrumentationService.class )
                .isInstrumented( entity ) ) {
            return ! FieldInterceptionHelper.extractFieldInterceptor( entity ).isDirty();
        }
        if ( customDirtyFlagChecker != null ) {
            return customDirtyFlagChecker.canSkipDirtyChecking( entity );
        }
        return false;
    }

    public void makeDirty(Object entity) { ... }

    public void resetDirty(Object entity) { ... }
}

Obviously there needs to be some unification of method names here, but in general I think this is a good thing...

Fixed

Details

Time tracking

4.57h logged

Created May 10, 2009 at 12:44 PM
Updated December 23, 2013 at 3:58 PM
Resolved January 24, 2012 at 6:50 AM