@gechandesu gechandesu commented Nov 30, 2025

Added a generic function subtract for subtracting array elements and a test.

assert arrays.subtract([1, 2, 3, 4, 5, 6, 7], [3, 5, 6]) == [1, 2, 4, 7]

This is a convenient way to solve a fairly common task: finding the difference between two sets of elements. The arrays.diff module is not suitable for this.

@tankf33der
Contributor

@gechandesu what about subtract? Check the spelling.

@gechandesu
Contributor Author

You're right, my English is bad...

@gechandesu gechandesu changed the title from "arrays: add substract/2" to "arrays: add subtract/2" on Nov 30, 2025
@tankf33der
Contributor

My tests show this is an equivalent function. What do you think?

fn mike[T](a []T, b []T) []T {
	mut result := []T{cap: a.len}
	for elem in a {
		if elem !in b {
			result << elem
		}
	}
	return result
}

@gechandesu
Contributor Author

This is much slower on large arrays. I tested it: elem !in b is effectively a nested loop over the array b.

In the worst case (O(n*m) vs O(n+m)):

/tmp $ v run arrays.v
 SPENT    10.685 ms in subtract
 SPENT  8108.553 ms in mike

Benchmark:

import benchmark

fn mike[T](a []T, b []T) []T {
	mut result := []T{cap: a.len}
	for elem in a {
		if elem !in b {
			result << elem
		}
	}
	return result
}

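// subtract returns the elements of a that are not present in b.
// Building a map from b makes each membership check O(1) on average.
fn subtract[T](a []T, b []T) []T {
	mut result := []T{cap: a.len}
	mut b_set := map[T]bool{}
	for elem in b {
		// only key presence matters; the stored value is never read
		b_set[elem] = true
	}
	for elem in a {
		if elem !in b_set {
			result << elem
		}
	}
	return result
}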

fn main() {
	iters := 1
	len := 100_000
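	// worst case for mike: a is all ones and b is all zeros,
	// so every membership check scans the whole of b without a match
	a := []int{len: len, init: 1}
	b := []int{len: len}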
	mut bench := benchmark.start()
	for _ in 0 .. iters {
		_ := subtract(a, b)
	}
	bench.measure('subtract')
	for _ in 0 .. iters {
		_ := mike(a, b)
	}
	bench.measure('mike')
}

I'd like to avoid the map, since it limits the input types, but the price of doing so is very poor performance. Of course, I could add a comptime type check and fall back to the slower algorithm for anything that isn't supported as a map key, but for now I've decided to leave it as is.

@tankf33der
Contributor

For science, I would like to show a case where mike() is faster too.

import benchmark
import rand

fn mike[T](a []T, b []T) []T {
        mut result := []T{cap: a.len}
        for elem in a {
                if elem !in b {
                        result << elem
                }
        }
        return result
}

fn subtract[T](a []T, b []T) []T {
        mut result := []T{cap: a.len}
        mut b_set := map[T]bool{}
        for elem in b {
                b_set[elem] = false
        }
        for elem in a {
                if elem !in b_set {
                        result << elem
                }
        }
        return result
}

fn main() {
        iters := 1_000_000
        mut bench := benchmark.start()
        for _ in 0 .. iters {
                a := rand.bytes(rand.u8())!
                b := rand.bytes(rand.u8())!
                _ := subtract(a, b)
        }
        bench.measure('subtract')
        for _ in 0 .. iters {
                a := rand.bytes(rand.u8())!
                b := rand.bytes(rand.u8())!
                _ := mike(a, b)
        }
        bench.measure('mike')
}

@gechandesu
Contributor Author

Yes, map allocations can slow things down on small arrays.

/tmp $ v run arrays.v
 SPENT 20909.702 ms in subtract
 SPENT 12172.886 ms in mike
/tmp $ v run arrays.v
 SPENT 20919.367 ms in subtract
 SPENT 12194.749 ms in mike
/tmp $ v run arrays.v
 SPENT 20893.922 ms in subtract
 SPENT 12174.510 ms in mike

I'll try to find the input size at which the implementation without a map becomes slower than the one with a map, and then write a combined version that selects the algorithm based on the lengths of the arrays. Is that okay?
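
A combined version along these lines might look like the sketch below. The name subtract_hybrid and the small_cutoff value of 64 are hypothetical: no crossover point was measured in this thread, so the constant is a placeholder that would have to come from benchmarking.

```v
// Hypothetical cutoff: not measured in this thread, tune it with a benchmark.
const small_cutoff = 64

// subtract_hybrid returns the elements of a that are not present in b.
// For a short b it scans b linearly and avoids allocating a map;
// for a longer b it builds a map so each check is O(1) on average.
fn subtract_hybrid[T](a []T, b []T) []T {
	mut result := []T{cap: a.len}
	if b.len <= small_cutoff {
		for elem in a {
			if elem !in b {
				result << elem
			}
		}
		return result
	}
	mut b_set := map[T]bool{}
	for elem in b {
		// only key presence matters; the stored value is never read
		b_set[elem] = true
	}
	for elem in a {
		if elem !in b_set {
			result << elem
		}
	}
	return result
}
```

Both branches keep the order of a and return the same result, so subtract_hybrid([1, 2, 3, 4, 5, 6, 7], [3, 5, 6]) gives [1, 2, 4, 7] either way. The map branch still requires T to be usable as a map key, so this sketch does not lift that restriction.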

@gechandesu gechandesu marked this pull request as draft December 1, 2025 15:33
@jorgeluismireles
Contributor

Alternatively, the word difference could be used, as it is more appropriate for sets: https://en.wikipedia.org/wiki/Set_(mathematics)#Set_difference

@gechandesu
Contributor Author

True. I thought about it some more. I'm closing this PR since the same thing is already implemented in datatypes.Set; I missed that. It's better than a new ugly function with inconsistent behavior.
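
For reference, a minimal sketch of the same filtering done through datatypes.Set. It assumes the Set methods add_all and exists, and the helper name subtract_ints is hypothetical; it is restricted to int to keep the example simple.

```v
import datatypes

// subtract_ints keeps the elements of a that are absent from b,
// preserving the order of a. Hypothetical helper, for illustration only.
fn subtract_ints(a []int, b []int) []int {
	mut b_set := datatypes.Set[int]{}
	// assumed API: add_all inserts every element of the array into the set
	b_set.add_all(b)
	mut result := []int{cap: a.len}
	for elem in a {
		// assumed API: exists reports whether the element is in the set
		if !b_set.exists(elem) {
			result << elem
		}
	}
	return result
}
```

With the inputs from the PR description, subtract_ints([1, 2, 3, 4, 5, 6, 7], [3, 5, 6]) returns [1, 2, 4, 7].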

@gechandesu gechandesu closed this Dec 2, 2025
@gechandesu gechandesu deleted the arrays_substract branch December 2, 2025 19:54