Class TemporalMergePolicy
This policy organizes segments into time buckets based on the maximum timestamp in each segment. Recent data goes into small time windows (e.g., 1 hour), while older data is grouped into exponentially larger windows (e.g., 4 hours, 16 hours, etc.). Segments within the same time window are merged together when they meet the configured thresholds, but segments from different time windows are never merged together, preserving temporal locality.
When to use this policy:
- Time-series data where queries typically filter by time ranges
- Data with a timestamp field that can be used for bucketing
- Workloads where older data is queried less frequently than recent data
- Use cases where you want to avoid mixing old and new data in the same segment
Configuration:
TemporalMergePolicy policy = new TemporalMergePolicy()
    .setTemporalField("timestamp") // Required: name of the timestamp field
    .setBaseTimeInSeconds(3600)    // Base window size: 1 hour
    .setMinThreshold(4)            // Merge when 4+ segments in a window
    .setMaxThreshold(8)            // Merge at most 8 segments at once
    .setCompactionRatio(1.2);      // Size ratio threshold for merging
// By default, exponential buckets are enabled. Use .disableExponentialBuckets() to disable.
IndexWriterConfig config = new IndexWriterConfig(analyzer);
config.setMergePolicy(policy);
Time bucketing: By default, window sizes grow exponentially: baseTime,
baseTime * minThreshold, baseTime * minThreshold^2, etc. This ensures that recent data
is in small, frequently-merged windows while older data is in larger, less-frequently-merged
windows. Call disableExponentialBuckets() to use fixed-size windows instead, where all
windows have the same size (baseTime).
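The window-size progression described above can be sketched in plain Java. This is an illustration of the documented formula only, not the policy's internal code: the class and method names are hypothetical, and the cap parameter is assumed to behave like the value configured with setMaxWindowSizeSeconds(long).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: reproduces the documented window-size progression. */
final class WindowSizeSketch {

  /** Returns the first {@code count} window sizes in seconds, newest window first. */
  static List<Long> windowSizes(
      long baseTimeSeconds,
      int minThreshold,
      long maxWindowSizeSeconds,
      boolean exponential,
      int count) {
    List<Long> sizes = new ArrayList<>();
    // Assumption: the base size itself is also subject to the cap.
    long size = Math.min(baseTimeSeconds, maxWindowSizeSeconds);
    for (int i = 0; i < count; i++) {
      sizes.add(size);
      if (exponential) {
        // Grow by a factor of minThreshold, capped at the maximum window size.
        // The division form avoids overflow in size * minThreshold.
        size = size > maxWindowSizeSeconds / minThreshold
            ? maxWindowSizeSeconds
            : size * minThreshold;
      }
    }
    return sizes;
  }

  public static void main(String[] args) {
    // Defaults from this page: 1 hour base, growth factor 4, 365-day cap.
    System.out.println(windowSizes(3600L, 4, 31_536_000L, true, 4));  // prints [3600, 14400, 57600, 230400]
    // Fixed-size windows, as after disableExponentialBuckets().
    System.out.println(windowSizes(3600L, 4, 31_536_000L, false, 3)); // prints [3600, 3600, 3600]
  }
}
```

With the defaults, the fourth window already spans 64 hours, which is why the cap matters mainly for very old data.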
Compaction ratio: The setCompactionRatio(double) parameter controls when merges are
triggered. A merge is considered when the total document count across candidate segments exceeds
largestSegment * compactionRatio. Lower values (e.g., 1.2) trigger merges more
aggressively, while higher values (e.g., 2.0) allow more segments to accumulate before merging.
Set to 1.0 for most aggressive merging.
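As a rough sketch, the compaction-ratio check can be written as a predicate over the document counts of the candidate segments in one window. The class and method names are hypothetical, and treating a ratio of exactly 1.0 as "merge whenever the minimum threshold is met" follows the setCompactionRatio description rather than any inspected source code.

```java
/** Illustrative only: the documented merge condition for a single time window. */
final class CompactionRatioSketch {

  static boolean shouldConsiderMerge(long[] docCounts, int minThreshold, double compactionRatio) {
    // Condition 1: the window must hold at least minThreshold segments.
    if (docCounts.length < minThreshold) {
      return false;
    }
    // Documented special case: 1.0 ignores the size distribution entirely.
    if (compactionRatio == 1.0) {
      return true;
    }
    long total = 0;
    long largest = 0;
    for (long count : docCounts) {
      total += count;
      largest = Math.max(largest, count);
    }
    // Condition 2: total document count must exceed largestSegment * compactionRatio.
    return total > largest * compactionRatio;
  }

  public static void main(String[] args) {
    System.out.println(shouldConsiderMerge(new long[] {1000, 100, 100, 100}, 4, 1.2)); // prints true  (1300 > 1200)
    System.out.println(shouldConsiderMerge(new long[] {1000, 50, 50, 50}, 4, 1.2));    // prints false (1150 <= 1200)
  }
}
```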
NOTE: This policy requires a timestamp field indexed as a LongPoint. The timestamp can be in seconds, milliseconds, or
microseconds (auto-detected based on value magnitude).
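This page does not state the cut-offs used for unit auto-detection. The sketch below only illustrates the general idea of magnitude-based detection; the thresholds, class name, and method name are hypothetical and should not be read as the policy's actual values.

```java
/** Illustrative only: magnitude-based unit detection with made-up cut-offs. */
final class TimestampUnitSketch {

  // Hypothetical cut-offs, chosen for illustration; the real policy may differ.
  static final long SECONDS_UPPER_BOUND = 100_000_000_000L;     // below 1e11: treat as seconds
  static final long MILLIS_UPPER_BOUND = 100_000_000_000_000L;  // below 1e14: treat as milliseconds

  /** Normalizes a non-negative epoch timestamp of unknown unit to seconds. */
  static long toSeconds(long timestamp) {
    if (timestamp < SECONDS_UPPER_BOUND) {
      return timestamp;
    }
    if (timestamp < MILLIS_UPPER_BOUND) {
      return timestamp / 1_000L;
    }
    return timestamp / 1_000_000L; // otherwise assume microseconds
  }

  public static void main(String[] args) {
    System.out.println(toSeconds(1_700_000_000L));         // prints 1700000000
    System.out.println(toSeconds(1_700_000_000_000L));     // prints 1700000000
    System.out.println(toSeconds(1_700_000_000_000_000L)); // prints 1700000000
  }
}
```

Any heuristic of this kind misclassifies values near the boundaries (for example, millisecond timestamps from the early 1970s look like seconds here), so keeping a single unit per field is the safer choice.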
NOTE: Segments from different time windows are never merged together, even during
IndexWriter.forceMerge(int). If you call forceMerge(1) but have segments in
multiple time windows, you will end up with one segment per time window.
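To estimate how many segments remain after forceMerge(1), count the distinct time windows that the current segments fall into. The sketch below assumes fixed-size windows aligned to multiples of the window size, which is a simplification; the names are hypothetical and the exponential case would need the real bucketing logic.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: groups segments by fixed-size time window. */
final class ForceMergeSketch {

  /** Maps window index to the number of segments whose maximum timestamp falls in it. */
  static Map<Long, Integer> segmentsPerWindow(long[] maxTimestampsSeconds, long windowSizeSeconds) {
    Map<Long, Integer> counts = new TreeMap<>();
    for (long ts : maxTimestampsSeconds) {
      // Assumption: windows are aligned to multiples of the window size.
      counts.merge(Math.floorDiv(ts, windowSizeSeconds), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    long[] segmentMaxTimestamps = {0L, 100L, 3599L, 3600L, 7300L};
    Map<Long, Integer> windows = segmentsPerWindow(segmentMaxTimestamps, 3600L);
    // Five segments spread over three windows: forceMerge(1) would leave three segments.
    System.out.println(windows.size()); // prints 3
  }
}
```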
NOTE: Very old segments (older than setMaxAgeSeconds(long)) are not merged to avoid
unnecessary I/O on cold data.
WARNING: This API is experimental and might change in incompatible ways in the next release.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.index.MergePolicy
MergePolicy.MergeAbortedException, MergePolicy.MergeContext, MergePolicy.MergeException, MergePolicy.MergeObserver, MergePolicy.MergeSpecification, MergePolicy.OneMerge, MergePolicy.OneMergeProgress
-
Field Summary
Fields inherited from class org.apache.lucene.index.MergePolicy
DEFAULT_MAX_CFS_SEGMENT_SIZE, DEFAULT_NO_CFS_RATIO, maxCFSSegmentSize, noCFSRatio
-
Constructor Summary
Constructors
Constructor | Description
TemporalMergePolicy() | Sole constructor, setting all settings to their defaults.
-
Method Summary
Modifier and Type | Method | Description
TemporalMergePolicy | disableExponentialBuckets() | Disables exponentially growing time windows.
MergePolicy.MergeSpecification | findForcedDeletesMerges(SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) | Determine what set of merge operations is necessary in order to expunge all deletes from the index.
MergePolicy.MergeSpecification | findForcedMerges(SegmentInfos segmentInfos, int maxSegmentCount, Map<SegmentCommitInfo, Boolean> segmentsToMerge, MergePolicy.MergeContext mergeContext) | Determine what set of merge operations is necessary in order to merge to <= the specified segment count.
MergePolicy.MergeSpecification | findMerges(MergeTrigger trigger, SegmentInfos segments, MergePolicy.MergeContext context) | Determine what set of merge operations are now necessary on the index.
long | getBaseTimeInSeconds() | Returns the current base time window size in seconds.
double | getCompactionRatio() | Returns the current compaction ratio.
double | getForceMergeDeletesPctAllowed() | Returns the current force merge deletes percentage threshold.
long | getMaxAgeSeconds() | Returns the current maximum age threshold in seconds.
int | getMaxThreshold() | Returns the current maximum threshold for merging.
long | getMaxWindowSizeSeconds() | Returns the current maximum window size in seconds.
int | getMinThreshold() | Returns the current minimum threshold for merging.
String | getTemporalField() | Returns the current temporal field name.
boolean | getUseExponentialBuckets() | Returns whether exponential bucketing is enabled.
TemporalMergePolicy | setBaseTimeInSeconds(long baseTimeInSeconds) | Sets the base time window size in seconds.
TemporalMergePolicy | setCompactionRatio(double compactionRatio) | Sets the compaction ratio that controls when merges are triggered based on segment size distribution.
TemporalMergePolicy | setForceMergeDeletesPctAllowed(double pct) | When IndexWriter.forceMergeDeletes() is called, only merge segments whose delete percentage exceeds this threshold.
TemporalMergePolicy | setMaxAgeSeconds(long maxAgeSeconds) | Sets the maximum age threshold for merging segments.
TemporalMergePolicy | setMaxThreshold(int maxThreshold) | Sets the maximum number of segments to merge at once within a time window.
TemporalMergePolicy | setMaxWindowSizeSeconds(long maxWindowSizeSeconds) | Sets the maximum size for exponentially growing time windows.
TemporalMergePolicy | setMinThreshold(int minThreshold) | Sets the minimum number of segments required in a time window to trigger a merge.
TemporalMergePolicy | setTemporalField(String temporalField) | Sets the name of the timestamp field used for temporal bucketing.
String | toString() |
Methods inherited from class org.apache.lucene.index.MergePolicy
assertDelCount, findFullFlushMerges, findMerges, getMaxCFSSegmentSizeMB, getNoCFSRatio, isMerged, keepFullyDeletedSegment, maxFullFlushMergeSize, message, numDeletesToMerge, segString, setMaxCFSSegmentSizeMB, setNoCFSRatio, size, useCompoundFile, verbose
-
Constructor Details
-
TemporalMergePolicy
public TemporalMergePolicy()
Sole constructor, setting all settings to their defaults.
-
-
Method Details
-
setTemporalField
Sets the name of the timestamp field used for temporal bucketing. This field must be indexed as a LongPoint and contain timestamp values in seconds, milliseconds, or microseconds (auto-detected based on value magnitude).
This parameter is required and must be set before the policy can schedule any merges. The merge policy will extract the minimum and maximum timestamps from each segment to determine which time window the segment belongs to.
Default is empty (no temporal field configured, policy is inactive).
-
getTemporalField
Returns the current temporal field name.
-
setBaseTimeInSeconds
Sets the base time window size in seconds. This determines the size of the smallest (most recent) time buckets.
By default, window sizes grow exponentially: baseTime, baseTime * minThreshold, baseTime * minThreshold^2, etc. If you call disableExponentialBuckets(), all windows will have the same size equal to baseTime.
Smaller values create finer-grained time windows, which can improve query performance for time-range queries but may result in more segments. Larger values reduce the number of time windows but may mix data from a wider time range in the same segment.
Default is 3600 seconds (1 hour).
-
getBaseTimeInSeconds
public long getBaseTimeInSeconds()
Returns the current base time window size in seconds.
-
setMinThreshold
Sets the minimum number of segments required in a time window to trigger a merge. Higher values reduce merge frequency and I/O but allow more segments to accumulate. Lower values keep segment counts lower but increase write amplification.
This threshold is also used as the growth factor for exponential bucketing (which is enabled by default). For example, with minThreshold=4, window sizes will be: baseTime, baseTime * 4, baseTime * 16, etc.
Must be at least 2 and cannot exceed the value set by setMaxThreshold(int). Default is 4.
-
getMinThreshold
public int getMinThreshold()
Returns the current minimum threshold for merging.
-
disableExponentialBuckets
Disables exponentially growing time windows. By default, older data is grouped into progressively larger time buckets: baseTime, baseTime * minThreshold, baseTime * minThreshold^2, etc.
Calling this method changes the behavior so that all time windows have a fixed size equal to baseTime, which can be useful for workloads with uniform query patterns across all time ranges.
Exponential bucketing (the default) is recommended for typical time-series use cases where recent data is accessed more frequently than older data.
-
getUseExponentialBuckets
public boolean getUseExponentialBuckets()
Returns whether exponential bucketing is enabled.
-
setMaxThreshold
Sets the maximum number of segments to merge at once within a time window. Larger values allow more aggressive merging (reducing segment count faster) but increase the cost of individual merge operations.
Must be at least equal to the value set by setMinThreshold(int). When a time window accumulates more segments than this threshold, the policy will schedule multiple smaller merges rather than one large merge.
Default is 8.
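One way to picture the "multiple smaller merges" behavior is to split a window's segments into batches of at most maxThreshold. This is a sketch under assumptions: the names are hypothetical, segments are represented by plain strings, and how the policy orders segments or treats a small trailing batch is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: partitions one window's segments into bounded merge batches. */
final class MaxThresholdSketch {

  static List<List<String>> planBatches(List<String> windowSegments, int maxThreshold) {
    List<List<String>> batches = new ArrayList<>();
    for (int start = 0; start < windowSegments.size(); start += maxThreshold) {
      int end = Math.min(start + maxThreshold, windowSegments.size());
      // Copy the slice so each batch is independent of the input list.
      batches.add(new ArrayList<>(windowSegments.subList(start, end)));
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> segments = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      segments.add("_" + i);
    }
    // Ten segments with maxThreshold=8 yield one batch of 8 and one of 2.
    for (List<String> batch : planBatches(segments, 8)) {
      System.out.println(batch.size() + " segments: " + batch);
    }
  }
}
```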
-
getMaxThreshold
public int getMaxThreshold()
Returns the current maximum threshold for merging.
-
setCompactionRatio
Sets the compaction ratio that controls when merges are triggered based on segment size distribution. A merge is considered when the total document count of candidate segments exceeds largestSegment * compactionRatio.
Lower values (e.g., 1.2) trigger merges more aggressively, even when segment sizes are relatively balanced. Higher values (e.g., 2.0 or higher) wait for more size imbalance before merging, allowing more segments to accumulate but reducing write amplification.
Setting this to exactly 1.0 enables the most aggressive merging mode, where merges occur whenever the minimum threshold is met, regardless of segment size distribution.
This parameter works together with setMinThreshold(int): a time window must both (1) contain at least minThreshold segments and (2) satisfy the compaction ratio before a merge is triggered.
Default is 1.2.
-
getCompactionRatio
public double getCompactionRatio()
Returns the current compaction ratio.
-
setMaxWindowSizeSeconds
Sets the maximum size for exponentially growing time windows. When exponential bucketing is enabled (the default), window sizes grow exponentially but are capped at this value.
This prevents extremely large time windows for very old data, which could mix data from vastly different time periods. Once window size reaches this limit, all older data uses fixed-size windows of this duration.
Default is 31536000 seconds (365 days).
-
getMaxWindowSizeSeconds
public long getMaxWindowSizeSeconds()
Returns the current maximum window size in seconds.
-
setMaxAgeSeconds
Sets the maximum age threshold for merging segments. Segments containing data older than this threshold (based on current time minus the segment's maximum timestamp) will not be merged.
This is useful for preventing unnecessary I/O on cold, historical data that is rarely queried. These old segments are placed in a special "old data" bucket and skipped during merge selection.
Default is Long.MAX_VALUE (no age limit, all segments are merge candidates).
-
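The age check described above amounts to a single comparison. In this sketch the names are hypothetical, all values are assumed to be epoch seconds, and the strict "older than" comparison is an assumption about how the boundary is handled.

```java
/** Illustrative only: the documented "too old to merge" test. */
final class MaxAgeSketch {

  static boolean isTooOldToMerge(long nowSeconds, long segmentMaxTimestampSeconds, long maxAgeSeconds) {
    // Age is measured from the newest document in the segment.
    long ageSeconds = nowSeconds - segmentMaxTimestampSeconds;
    return ageSeconds > maxAgeSeconds;
  }

  public static void main(String[] args) {
    long now = 1_700_000_000L;
    long thirtyDays = 30L * 24 * 3600;
    System.out.println(isTooOldToMerge(now, now - 40L * 24 * 3600, thirtyDays)); // prints true
    System.out.println(isTooOldToMerge(now, now - 3600L, thirtyDays));           // prints false
    // With the default of Long.MAX_VALUE, no non-negative age exceeds the limit.
    System.out.println(isTooOldToMerge(now, 0L, Long.MAX_VALUE));                // prints false
  }
}
```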
getMaxAgeSeconds
public long getMaxAgeSeconds()
Returns the current maximum age threshold in seconds.
-
setForceMergeDeletesPctAllowed
When IndexWriter.forceMergeDeletes() is called, only merge segments whose delete percentage exceeds this threshold. Lower values merge more aggressively to reclaim space from deleted documents, but increase I/O and write amplification.
The delete percentage is calculated as: (deleted docs / total docs) * 100.
Default is 10.0 (merge segments with more than 10% deleted documents).
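The delete-percentage test can be sketched as follows. The names are hypothetical, and the handling of an empty segment is an assumption added to avoid dividing by zero.

```java
/** Illustrative only: the documented delete-percentage check. */
final class ForceMergeDeletesSketch {

  static boolean exceedsDeletePct(int deletedDocs, int totalDocs, double pctAllowed) {
    if (totalDocs == 0) {
      return false; // assumption: an empty segment has nothing to reclaim
    }
    // (deleted docs / total docs) * 100, computed in floating point.
    double deletePct = 100.0 * deletedDocs / totalDocs;
    return deletePct > pctAllowed;
  }

  public static void main(String[] args) {
    System.out.println(exceedsDeletePct(150, 1000, 10.0)); // prints true  (15% > 10%)
    System.out.println(exceedsDeletePct(100, 1000, 10.0)); // prints false (exactly 10% does not exceed)
  }
}
```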
-
getForceMergeDeletesPctAllowed
public double getForceMergeDeletesPctAllowed()
Returns the current force merge deletes percentage threshold.
-
findMerges
public MergePolicy.MergeSpecification findMerges(MergeTrigger trigger, SegmentInfos segments, MergePolicy.MergeContext context) throws IOException
Description copied from class: MergePolicy
Determine what set of merge operations are now necessary on the index. IndexWriter calls this whenever there is a change to the segments. This call is always synchronized on the IndexWriter instance so only one thread at a time will call this method.
Specified by:
findMerges in class MergePolicy
Parameters:
trigger - the event that triggered the merge
segments - the total set of segments in the index
context - the IndexWriter to find the merges on
Throws:
IOException
-
findForcedMerges
public MergePolicy.MergeSpecification findForcedMerges(SegmentInfos segmentInfos, int maxSegmentCount, Map<SegmentCommitInfo, Boolean> segmentsToMerge, MergePolicy.MergeContext mergeContext) throws IOException
Description copied from class: MergePolicy
Determine what set of merge operations is necessary in order to merge to <= the specified segment count. IndexWriter calls this when its IndexWriter.forceMerge(int) method is called. This call is always synchronized on the IndexWriter instance so only one thread at a time will call this method.
Specified by:
findForcedMerges in class MergePolicy
Parameters:
segmentInfos - the total set of segments in the index
maxSegmentCount - requested maximum number of segments in the index
segmentsToMerge - contains the specific SegmentInfo instances that must be merged away. This may be a subset of all SegmentInfos. If the value is True for a given SegmentInfo, that means this segment was an original segment present in the to-be-merged index; else, it was a segment produced by a cascaded merge.
mergeContext - the MergeContext to find the merges on
Throws:
IOException
-
findForcedDeletesMerges
public MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) throws IOException
Description copied from class: MergePolicy
Determine what set of merge operations is necessary in order to expunge all deletes from the index.
Specified by:
findForcedDeletesMerges in class MergePolicy
Parameters:
segmentInfos - the total set of segments in the index
mergeContext - the MergeContext to find the merges on
Throws:
IOException
-
toString
-