[S#3543] Piece-wise linear compression of column groups first working prototype #2415 #2420
Conversation
# Conflicts:
#	src/main/java/org/apache/sysds/runtime/compress/CompressionSettings.java
#	src/main/java/org/apache/sysds/runtime/compress/colgroup/scheme/ColGroupPiecewiseLinearCompressed.java
#	src/test/java/org/apache/sysds/runtime/compress/colgroup/ColGroupPiecewiseLinearCompressedTest.java
Thank you for your first contribution @mori49, this is a good start. I left some comments in the code. You used segmented least squares, which is a fine approach (even though control over the actual loss is quite limited). One limiting factor is the compression complexity of O(n³), which is not viable for production compression. This particular approach can be optimized to O(n²) by precomputing prefix sums or an SSE matrix; see the sketch below (please first address the smaller formatting issues and other code suggestions before approaching that optimization).
In general, we may think of a more lightweight and accurate method that preserves targetLoss as an upper bound (this will be part of the work after the first submission deadline).
Also, please avoid German comments or variable/method names in your contribution.
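For illustration, here is a minimal sketch of the prefix-sum idea. The method names buildPrefixSums and segmentSSE are hypothetical (not existing SystemDS API); the O(n²) dynamic program would call segmentSSE instead of the current O(n) loop in computeSegmentCost.

// Precompute once per column in O(n): prefix sums of x, y, x*x, x*y, y*y.
private static double[][] buildPrefixSums(double[] column) {
	int n = column.length;
	double[][] p = new double[5][n + 1]; // 0:x, 1:y, 2:xx, 3:xy, 4:yy
	for(int i = 0; i < n; i++) {
		double x = i, y = column[i];
		p[0][i + 1] = p[0][i] + x;
		p[1][i + 1] = p[1][i] + y;
		p[2][i + 1] = p[2][i] + x * x;
		p[3][i + 1] = p[3][i] + x * y;
		p[4][i + 1] = p[4][i] + y * y;
	}
	return p;
}

// SSE of the least-squares line on [start, end) in O(1).
private static double segmentSSE(double[][] p, int start, int end) {
	double n = end - start;
	if(n <= 1)
		return 0.0;
	double sx = p[0][end] - p[0][start], sy = p[1][end] - p[1][start];
	double sxx = p[2][end] - p[2][start], sxy = p[3][end] - p[3][start], syy = p[4][end] - p[4][start];
	double denom = n * sxx - sx * sx;
	double slope = (denom == 0) ? 0.0 : (n * sxy - sx * sy) / denom;
	double intercept = (sy - slope * sx) / n;
	// SSE = sum(y^2) - slope*sum(xy) - intercept*sum(y), which follows from the normal equations
	return Math.max(0.0, syy - slope * sxy - intercept * sy);
}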
Move this class from colgroup/scheme package to colgroup/.
In general, all methods that are currently unimplemented should throw new NotImplementedException()
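As a hedged illustration of that pattern (the method name below is a placeholder, not an actual AColGroup method, and the exception type is assumed to be org.apache.commons.lang3.NotImplementedException as used elsewhere in the code base):

public AColGroup exampleUnimplementedOp(AColGroup other) { // illustrative placeholder signature
	throw new NotImplementedException();
}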
This file should not be part of the PR. You can keep it locally but you should untrack it and not add it to your commits. You could use git rm --cached bin/systemds-standalone.sh.
It seems like you reformatted the file to revert the tabs -> spaces conversion, which is good. However, there are still many unnecessary changes. I would recommend reverting that file to the original state of this repository and then only adding the new PiecewiseLinear value to the CompressionType enum.
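A hedged sketch of that single change (the existing enum values are elided here and the exact declaration may differ from the actual CompressionType enum):

public enum CompressionType {
	// ... existing values stay untouched ...
	PiecewiseLinear; // new entry for the piecewise linear column group
}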
public static AColGroup compressPiecewiseLinearFunctional(IColIndex colIndexes, MatrixBlock in,
	CompressionSettings cs) {

	// First, store the contents of one column
	int numRows = in.getNumRows();
	int colIdx = colIndexes.get(0); // the first column
	double[] column = getColumn(in, colIdx);

	// Set the target loss

	// Determine breakpoints: partitioning into segments
	List<Integer> breakpointsList = computeBreakpoints(cs, column);
	int[] breakpoints = breakpointsList.stream().mapToInt(Integer::intValue).toArray();

	// For each segment, linear regression is used as the compression scheme
	// 3) Per-segment regression -> a, b
	int numSeg = breakpoints.length - 1;
	double[] slopes = new double[numSeg];
	double[] intercepts = new double[numSeg];

	for(int s = 0; s < numSeg; s++) {
		int start = breakpoints[s];
		int end = breakpoints[s + 1];

		double[] ab = regressSegment(column, start, end); // uses the same statistics as computeSegmentCost
		slopes[s] = ab[0];
		intercepts[s] = ab[1];
	}

	// Build the data structure: ColGroupPiecewiseLinearCompressed
	return ColGroupPiecewiseLinearCompressed.create(colIndexes, breakpoints, slopes, intercepts, numRows);
}

public static double[] getColumn(MatrixBlock in, int colIndex) {
	int numRows = in.getNumRows(); // number of rows [web:16]
	double[] column = new double[numRows]; // buffer for the column values

	for(int r = 0; r < numRows; r++) {
		column[r] = in.get(r, colIndex); // read value at (r, colIndex) [web:16][web:25]
	}
	return column;
}

public static List<Integer> computeBreakpoints(CompressionSettings cs, double[] column) {
	int n = column.length;
	double targetMSE = cs.getPiecewiseTargetLoss();
	// Case A: no target loss given -> simple variant with a fixed λ
	if(Double.isNaN(targetMSE) || targetMSE <= 0) {
		double lambda = 5.0;
		return computeBreakpointsLambda(column, lambda);
	}

	// Case B: target loss set -> respect the global error budget
	double sseMax = n * targetMSE; // MSE -> SSE budget

	double lambdaMin = 0.0; // many segments, minimal error
	double lambdaMax = 1e6; // few segments, more error

	List<Integer> bestBreaks = null;

	for(int it = 0; it < 20; it++) { // binary search over λ
		double lambda = 0.5 * (lambdaMin + lambdaMax);

		List<Integer> breaks = computeBreakpointsLambda(column, lambda);
		double totalSSE = computeTotalSSE(column, breaks);

		if(totalSSE <= sseMax) {
			// budget satisfied: try a larger λ to obtain even fewer segments
			bestBreaks = breaks;
			lambdaMin = lambda;
		}
		else {
			// error too large: decrease λ to allow more segments
			lambdaMax = lambda;
		}
	}

	if(bestBreaks == null)
		bestBreaks = computeBreakpointsLambda(column, lambdaMin);

	return bestBreaks;
}

public static List<Integer> computeBreakpointsLambda(double[] column, double lambda) {
	int sizeColumn = column.length;
	double[] dp = new double[sizeColumn + 1];
	int[] prev = new int[sizeColumn + 1];

	dp[0] = 0.0;

	for(int index = 1; index <= sizeColumn; index++) {
		dp[index] = Double.POSITIVE_INFINITY;
		for(int i = 0; i < index; i++) { // segment [i, index)
			double costCurrentSegment = computeSegmentCost(column, i, index); // SSE
			double candidateCost = dp[i] + costCurrentSegment + lambda;
			if(candidateCost < dp[index]) {
				dp[index] = candidateCost;
				prev[index] = i;
			}
		}
	}

	List<Integer> segmentLimits = new ArrayList<>();
	int breakpointIndex = sizeColumn;
	while(breakpointIndex > 0) {
		segmentLimits.add(breakpointIndex);
		breakpointIndex = prev[breakpointIndex];
	}
	segmentLimits.add(0);
	Collections.sort(segmentLimits);
	return segmentLimits;
}

public static double computeSegmentCost(double[] column, int start, int end) {
	int n = end - start;
	if(n <= 1)
		return 0.0;

	double[] ab = regressSegment(column, start, end);
	double slope = ab[0];
	double intercept = ab[1];

	double sse = 0.0;
	for(int i = start; i < end; i++) {
		double x = i;
		double y = column[i];
		double yhat = slope * x + intercept;
		double diff = y - yhat;
		sse += diff * diff;
	}
	return sse; // or sse / n as MSE
}

public static double computeTotalSSE(double[] column, List<Integer> breaks) {
	double total = 0.0;
	for(int s = 0; s < breaks.size() - 1; s++) {
		int start = breaks.get(s);
		int end = breaks.get(s + 1);
		total += computeSegmentCost(column, start, end); // SSE of this segment
	}
	return total;
}

public static double[] regressSegment(double[] column, int start, int end) {
	int n = end - start;
	if(n <= 0)
		return new double[] {0.0, 0.0};

	double sumX = 0, sumY = 0, sumXX = 0, sumXY = 0;
	for(int i = start; i < end; i++) {
		double x = i;
		double y = column[i];
		sumX += x;
		sumY += y;
		sumXX += x * x;
		sumXY += x * y;
	}

	double nD = n;
	double denom = nD * sumXX - sumX * sumX;
	double slope, intercept;
	if(denom == 0) {
		slope = 0.0;
		intercept = sumY / nD;
	}
	else {
		slope = (nD * sumXY - sumX * sumY) / denom;
		intercept = (sumY - slope * sumX) / nD;
	}
	return new double[] {slope, intercept};
}
To keep this file clean, I recommend that you create a new class called PiecewiseLinearUtils in the package functional. Your compressPiecewiseLinearFunctional(...) then just calls PiecewiseLinearUtils.compressSegmentedLeastSquares(...).
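A hedged sketch of the resulting delegation (the class, method, and package names follow the suggestion above and do not exist yet):

public static AColGroup compressPiecewiseLinearFunctional(IColIndex colIndexes, MatrixBlock in,
	CompressionSettings cs) {
	return PiecewiseLinearUtils.compressSegmentedLeastSquares(colIndexes, in, cs);
}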
Here please revert the file. Did you change anything in this file (except the tabs -> spaces conversion, which should be reverted)?
You might consider creating a variable double targetLoss and a method public CompressionSettingsBuilder setTargetLoss(double loss) {...}. If you then add targetLoss as a parameter of the CompressionSettings constructor, you can set the target loss directly via the CompressionSettingsBuilder.
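A hedged sketch of that builder extension (the field name, default value, and how create() forwards the value are assumptions):

private double targetLoss = Double.NaN; // NaN = no loss bound configured

public CompressionSettingsBuilder setTargetLoss(double loss) {
	this.targetLoss = loss;
	return this;
}

The builder's create() call would then pass targetLoss through to the extended CompressionSettings constructor.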
import static org.apache.sysds.runtime.compress.colgroup.ColGroupFactory.computeSegmentCost;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;
import static org.junit.jupiter.api.Assertions.assertTrue;
Remove the Jupiter assertions; they will cause the build to fail, as we don't use Jupiter.
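Concretely, the last import above would become the JUnit 4 variant that the other imports already use:

// import static org.junit.jupiter.api.Assertions.assertTrue;  // remove
import static org.junit.Assert.assertTrue;                     // JUnit 4 equivalent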
There should be no underscores in method names.
Move this test file to test/component/compress/colgroup.
You have a lot of isolated tests (which also look like autogenerated tests and not handwritten). It would be nice to have more tests. Please remove some redundant ones, and add tests on randomly generated data (with a fixed seed) where you create a ColGroupPiecewiseLinearCompressed and then decompressToDenseBlock. You then compare it to the original data and compute a loss (which should be no more than some upper bound).
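A hedged sketch of such a test (create(...), decompressToDenseBlock(...), and setTargetLoss(...) follow the PR code and the suggestions above; the exact signatures and the ColIndexFactory helper are assumptions):

@Test
public void testRandomColumnRespectsLossBound() {
	final int n = 1000;
	final double targetMSE = 0.01;
	Random rand = new Random(42); // fixed seed for reproducibility
	MatrixBlock in = new MatrixBlock(n, 1, false);
	for(int r = 0; r < n; r++)
		in.set(r, 0, 0.5 * r + rand.nextGaussian()); // roughly linear data plus noise

	CompressionSettings cs = new CompressionSettingsBuilder().setTargetLoss(targetMSE).create();
	AColGroup g = ColGroupFactory.compressPiecewiseLinearFunctional(ColIndexFactory.create(1), in, cs);

	MatrixBlock out = new MatrixBlock(n, 1, false);
	out.allocateDenseBlock();
	g.decompressToDenseBlock(out.getDenseBlock(), 0, n, 0, 0);

	double sse = 0;
	for(int r = 0; r < n; r++) {
		double d = in.get(r, 0) - out.get(r, 0);
		sse += d * d;
	}
	assertTrue("MSE above configured bound", sse / n <= targetMSE + 1e-8);
}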
/**
 * Target total loss for piecewise linear compression. Interpretation: the maximum allowed global MSE per value in
 * the column. 0.0 ~ effectively lossless, many segments; >0 ~ more approximation allowed, fewer segments.
 */
Weird comment
// First, store the contents of one column

int numRows = in.getNumRows();
int colIdx = colIndexes.get(0); // the first column
You take the first column, which is fine for now, but in a finished implementation you would either repeat compression on every column or do a multidimensional regression, where you treat a 'row' of all indices as a vector.
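For example, a hedged sketch of the per-column variant (reusing the existing single-column path; ColIndexFactory.create is assumed to accept an int[] of column indices):

AColGroup[] groups = new AColGroup[colIndexes.size()];
for(int c = 0; c < colIndexes.size(); c++) {
	IColIndex single = ColIndexFactory.create(new int[] {colIndexes.get(c)});
	groups[c] = compressPiecewiseLinearFunctional(single, in, cs);
}
// alternatively, fit one multidimensional regression per segment, treating each row of the selected columns as a vector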
@Override
public double getIdx(int r, int colIdx) {
	// ✅ CRUCIAL: bounds check for colIdx!
Avoid emojis. Also, they are usually a hint of LLM-generated code (which is strictly forbidden for your submissions).