Stream Gatherers (JEP 485, JDK 24) — Practical Use Case & Example
JEP 485 (Stream Gatherers), finalized in JDK 24, enhances the Stream API to support custom intermediate operations through the new gather method. This allows stream pipelines to transform data in ways that are not easily achievable with the existing built-in intermediate operations like map, filter, and limit.
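To get a feel for what a "custom intermediate operation" looks like, here is a minimal sketch of a hand-written gatherer (the risingMax name and the sample values are my own, purely for illustration): it emits an element only when it exceeds everything seen before it.

import java.util.stream.Gatherer;
import java.util.stream.Stream;

// State is a one-element int array so the lambda can mutate it.
Gatherer<Integer, int[], Integer> risingMax = Gatherer.ofSequential(
        () -> new int[] { Integer.MIN_VALUE },       // max seen so far
        (state, element, downstream) -> {
            if (element > state[0]) {
                state[0] = element;
                return downstream.push(element);     // emit the new maximum
            }
            return true;                             // keep consuming
        });

Stream.of(3, 1, 4, 1, 5, 9, 2, 6)
      .gather(risingMax)
      .forEach(System.out::println);                 // prints 3, 4, 5, 9

Notice that the gatherer carries its own mutable state between elements, which is exactly what map and filter cannot do.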
Scenario
You are building a real-time data processing pipeline that needs to efficiently aggregate sensor readings from thousands of IoT devices and compute insights in batches.
Problem Before Java 24 (Without Stream Gatherers)
Before Java 24, if you wanted to process stream elements in batches, you had to materialize an intermediate collection (extra memory) and then chunk it manually (extra code and an extra pass over the data).
List<SensorReading> readings = fetchSensorData();

List<Double> temperatures = readings.stream()
        .map(SensorReading::getTemperature)
        .collect(Collectors.toList()); // Extra collection created!

// Process the data in batches manually
for (int i = 0; i < temperatures.size(); i += 100) {
    List<Double> batch = temperatures.subList(i, Math.min(i + 100, temperatures.size()));
    processBatch(batch);
}
💡 Issues:
- Requires an extra collection (List<Double> temperatures), which consumes memory.
- Processing in batches manually is inefficient.
🚀 Solution: Using Stream Gatherers
With Java 24's Stream Gatherers, you can batch the data inside the pipeline itself, with no intermediate collection of the full dataset.
List<SensorReading> readings = fetchSensorData();

readings.stream()
        .map(SensorReading::getTemperature)
        .gather(Gatherers.windowFixed(100)) // java.util.stream.Gatherers: batches of 100
        .forEach(batch -> processBatch(batch)); // each batch is a List<Double>
✨ How This Works:
- Gatherers.windowFixed(100) is a built-in gatherer that groups elements into fixed-size batches as they flow through the pipeline, without materializing the whole stream first.
- Each time 100 elements accumulate, the batch is pushed downstream; any leftover elements are emitted as a final, smaller batch.
- The terminal operation (forEach(batch -> processBatch(batch))) processes each batch as soon as it is ready.
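If you're curious what windowFixed is doing under the hood, here is roughly equivalent behavior hand-rolled with Gatherer.ofSequential. This is my own sketch of the technique, not the JDK's actual implementation:

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Gatherer;

// A hand-rolled fixed-size batching gatherer (would live in some utility class).
static <T> Gatherer<T, List<T>, List<T>> fixedBatches(int size) {
    return Gatherer.ofSequential(
            ArrayList::new,                           // initializer: the empty batch
            (batch, element, downstream) -> {         // integrator: fill, then flush
                batch.add(element);
                if (batch.size() == size) {
                    boolean wantMore = downstream.push(List.copyOf(batch));
                    batch.clear();
                    return wantMore;                  // false = downstream is done
                }
                return true;                          // keep consuming
            },
            (batch, downstream) -> {                  // finisher: flush the remainder
                if (!batch.isEmpty()) {
                    downstream.push(List.copyOf(batch));
                }
            });
}

Usage is identical: readings.stream().map(SensorReading::getTemperature).gather(fixedBatches(100)). The three pieces are the initializer (the empty batch), the integrator (add an element, flush when full), and the finisher (flush whatever is left when the stream ends).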
💡 Advantages Over Traditional Streams:
- No full intermediate collection → memory use stays bounded by the batch size.
- Batching happens inside the pipeline → no manual chunking code.
- Stays lazy on large datasets → great for real-time pipelines.
🔹 Where to Use Stream Gatherers?
✅ Processing large datasets in batches (e.g., IoT sensor readings, logs, financial transactions).
✅ Streaming data from APIs or databases efficiently (see the sketch after this list).
✅ Reducing memory overhead in high-throughput applications.
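As an example of the streaming case, here is a sketch that batches lines from a large log file into database inserts. Files.lines is lazy, so the whole file never sits in memory at once; "sensor.log" and insertBatch(...) are hypothetical placeholders:

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Gatherers;
import java.util.stream.Stream;

// Lazily stream a potentially huge log file and hand it off in batches of 500.
try (Stream<String> lines = Files.lines(Path.of("sensor.log"))) {
    lines.gather(Gatherers.windowFixed(500))
         .forEach(batch -> insertBatch(batch)); // e.g., one DB round-trip per batch
} catch (IOException e) {
    throw new UncheckedIOException(e);
}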
This feature is super useful for performance-heavy applications that need to process streams in bulk. 🚀
---
Any thoughts or feedback? Let me know in the comment section and follow for more such articles!