Go: processing data efficiently with streaming APIs such as map / filter / forEach

Time:2022-5-26

Anyone who has used Java will be familiar with its Stream API. Can we process collection data in a similar way in Go? This article introduces the stream API built into go-zero. To make it easier to follow, the functions are divided into three categories: creation operations, intermediate processing operations, and termination operations.

What is stream processing

If you have Java experience, you will know how much the Java 8 Stream API improved the language's ability to process collection-type data.

int sum = widgets.stream()
              .filter(w -> w.getColor() == RED)
              .mapToInt(w -> w.getWeight())
              .sum();

Stream lets us process data with chained calls in a functional style: the data flows through a pipeline of transformations in sequence and is finally aggregated. The idea behind the implementation is to abstract data processing as a data stream, with each operation returning a new stream for the next one to use.
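The same pipeline can be expressed in Go with channels feeding each stage. Here is a minimal, stdlib-only sketch mirroring the Java example above (the Widget type and its values are made up for illustration):

```go
package main

import "fmt"

type Widget struct {
	Color  string
	Weight int
}

// filterRed reads widgets from in and forwards only the red ones.
func filterRed(in <-chan Widget) <-chan Widget {
	out := make(chan Widget)
	go func() {
		defer close(out)
		for w := range in {
			if w.Color == "red" {
				out <- w
			}
		}
	}()
	return out
}

// weights maps each widget to its weight.
func weights(in <-chan Widget) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for w := range in {
			out <- w.Weight
		}
	}()
	return out
}

// sum terminates the pipeline by accumulating all values.
func sum(in <-chan int) int {
	total := 0
	for n := range in {
		total += n
	}
	return total
}

func main() {
	src := make(chan Widget, 3)
	src <- Widget{"red", 10}
	src <- Widget{"blue", 5}
	src <- Widget{"red", 20}
	close(src)
	fmt.Println(sum(weights(filterRed(src)))) // prints 30
}
```

Each stage runs in its own goroutine and returns a new channel, which is exactly the shape the stream component below generalizes.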

Stream function definition

Before writing code, think things through: clarifying the requirements is the most important step. Let's try to reason about the whole component from the author's perspective. First, set the underlying implementation logic aside and try to define the stream functions from scratch.

In fact, a stream's workflow is a producer-consumer model, much like a production process in a factory. Let's first define the stream's life cycle:

  1. Creation phase / data acquisition (raw materials)
  2. Processing stage / intermediate processing (assembly line processing)
  3. Summary stage / end operation (final product)
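These three phases can be sketched as a toy Stream type. This is a simplified, slice-backed, synchronous stand-in (not go-zero's actual channel-based code) just to show the shape of the API and how chaining falls out of each operation returning a new Stream:

```go
package main

import "fmt"

// Stream is a toy, slice-backed stand-in for the channel-based stream.
type Stream struct {
	items []interface{}
}

// Creation phase: build a stream from raw materials.
func Just(items ...interface{}) Stream {
	return Stream{items: items}
}

// Processing phase: each operation returns a new Stream, enabling chaining.
func (s Stream) Filter(keep func(interface{}) bool) Stream {
	var out []interface{}
	for _, it := range s.items {
		if keep(it) {
			out = append(out, it)
		}
	}
	return Stream{items: out}
}

// Termination phase: summarize into a final product.
func (s Stream) Count() int {
	return len(s.items)
}

func main() {
	n := Just(1, 2, 3, 4).Filter(func(it interface{}) bool {
		return it.(int)%2 == 0
	}).Count()
	fmt.Println(n) // prints 2
}
```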

The API is defined around the three life cycles of stream:

Creation phase

This phase creates the abstract stream object; you can think of it as the constructor.

We support three ways to construct streams: slice transformation, channel transformation and functional transformation.

Note that the functions in this phase are ordinary exported functions, not bound to a Stream object.

//Create a stream from variadic arguments
func Just(items ...interface{}) Stream
 
//Create a stream from a channel
func Range(source <-chan interface{}) Stream
 
//Create a stream from a generator function
func From(generate GenerateFunc) Stream
 
//Concatenate streams
func Concat(s Stream, others ...Stream) Stream

Processing stage

The operations needed in the processing stage usually correspond to business logic: conversion, filtering, deduplication, sorting, and so on.

The API at this stage belongs to method and needs to be bound to the stream object.

The following definitions are made in combination with common business scenarios:

//Remove duplicate items
Distinct(keyFunc KeyFunc) Stream
//Filter items by criteria
Filter(filterFunc FilterFunc, opts ...Option) Stream
//Grouping
Group(fn KeyFunc) Stream
//Return the first n elements
Head(n int64) Stream
//Return the last n elements
Tail(n int64) Stream
//Conversion object
Map(fn MapFunc, opts ...Option) Stream
//Merge items into slice to generate a new stream
Merge() Stream
//Reverse
Reverse() Stream
//Sort
Sort(fn LessFunc) Stream
//Act on each item
Walk(fn WalkFunc, opts ...Option) Stream
//Aggregate other streams
Concat(streams ...Stream) Stream

Each operation in the processing stage returns a new Stream object; that is the basic implementation pattern which makes chaining possible.

Summary stage

The termination stage produces the result we actually want: whether items match, a count, a traversal, and so on.

//Check for all matches
AllMatch(fn PredicateFunc) bool
//Check for at least one match
AnyMatch(fn PredicateFunc) bool
//Check for all mismatches
NoneMatch(fn PredicateFunc) bool
//Count the number of elements
Count() int
//Drain the stream
Done()
//Perform operations on all elements
ForAll(fn ForAllFunc)
//Perform actions on each element
ForEach(fn ForEachFunc)

Having mapped out the component's requirement boundary, we now have a much clearer picture of the stream to be implemented. In my understanding, real architects can grasp requirements and their future evolution with great precision, which is inseparable from thinking deeply about the requirements and seeing through to their essence. Simulating the construction of the whole project from the author's perspective, and learning the author's way of thinking, is the greatest value of studying open-source projects.

Now, let's try to define a complete overview of the stream interface and its functions.

The point of the interface is not just to serve as a template; its abstraction lets us build the overall framework of the project without getting bogged down in details at the start. Expressing the thinking process through the interface trains a top-down way of observing the whole system from a macro perspective. Diving into details too early easily leaves you lost in them.

rxOptions struct {
  unlimitedWorkers bool
  workers          int
}
Option func(opts *rxOptions)
//Key generator
//Item - element in stream
KeyFunc func(item interface{}) interface{}
//Filter function
FilterFunc func(item interface{}) bool
//Object conversion function
MapFunc func(item interface{}) interface{}
//Object comparison
LessFunc func(a, b interface{}) bool
//Traversal function
WalkFunc func(item interface{}, pipe chan<- interface{})
//Matching function
PredicateFunc func(item interface{}) bool
//Perform operations on all elements
ForAllFunc func(pipe <-chan interface{})
//Perform actions on each item
ForEachFunc func(item interface{})
//Perform operations concurrently on each element
ParallelFunc func(item interface{})
//Aggregate all elements
ReduceFunc func(pipe <-chan interface{}) (interface{}, error)
//Item generating function
GenerateFunc func(source <-chan interface{})
 
Stream interface {
  //Remove duplicate items
  Distinct(keyFunc KeyFunc) Stream
  //Filter items by criteria
  Filter(filterFunc FilterFunc, opts ...Option) Stream
  //Grouping
  Group(fn KeyFunc) Stream
  //Return the first n elements
  Head(n int64) Stream
  //Return the last n elements
  Tail(n int64) Stream
  //Get the first element
  First() interface{}
  //Get the last element
  Last() interface{}
  //Conversion object
  Map(fn MapFunc, opts ...Option) Stream
  //Merge items into slice to generate a new stream
  Merge() Stream
  //Reverse
  Reverse() Stream
  //Sort
  Sort(fn LessFunc) Stream
  //Act on each item
  Walk(fn WalkFunc, opts ...Option) Stream
  //Aggregate other streams
  Concat(streams ...Stream) Stream
  //Check for all matches
  AllMatch(fn PredicateFunc) bool
  //Check for at least one match
  AnyMatch(fn PredicateFunc) bool
  //Check for all mismatches
  NoneMatch(fn PredicateFunc) bool
  //Count the number of elements
  Count() int
  //Drain the stream
  Done()
  //Perform operations on all elements
  ForAll(fn ForAllFunc)
  //Perform actions on each element
  ForEach(fn ForEachFunc)
}

The channel() method is used to obtain the stream's internal pipeline. Because consumers interact with the Stream abstraction rather than its internals, it is exposed only as a package-private method for internal reads.

//Get the internal data container channel (package-private method)
channel() chan interface{}

Implementation approach

The function definition is clear. Next, consider several engineering implementation problems.

How to implement chained calls

Chained calls: the builder pattern used for constructing objects achieves a chained-call effect, and stream chaining works on the same principle: each call creates a new stream and returns it to the caller.

//Remove duplicate items
Distinct(keyFunc KeyFunc) Stream
//Filter items by criteria
Filter(filterFunc FilterFunc, opts ...Option) Stream

How to implement the pipeline effect

The so-called pipeline can be understood as the stream's data container. In Go we can use a channel as that pipeline, which makes the multiple operations of a chained stream call asynchronous and non-blocking.

How to support parallel processing

Data processing is essentially consuming the data in the channel, so parallel processing boils down to consuming the channel in parallel. This is easy to implement with goroutines and the sync.WaitGroup mechanism.

The go-zero implementation

core/fx/stream.go

go-zero's stream implementation does not define an interface, but that doesn't matter: the underlying logic is the same.

To implement the stream, an internal struct is defined whose source field is a channel, acting as the pipeline.

Stream struct {
  source <-chan interface{}
}

Create API

Create from a channel: Range

Create a stream through channel

func Range(source <-chan interface{}) Stream {
  return Stream{
    source: source,
  }
}

Variadic creation: Just

Just creates a stream from variadic arguments; closing the channel promptly after writing is a good habit.

func Just(items ...interface{}) Stream {
  source := make(chan interface{}, len(items))
  for _, item := range items {
    source <- item
  }
  close(source)
  return Range(source)
}

Create from a generator function: From

Create a stream through a function

func From(generate GenerateFunc) Stream {
  source := make(chan interface{})
  threading.GoSafe(func() {
    defer close(source)
    generate(source)
  })
  return Range(source)
}

Because From runs a function passed in from outside, whose execution we do not control, runtime panics must be caught inside the goroutine to prevent them from propagating upward and crashing the application.

func Recover(cleanups ...func()) {
  for _, cleanup := range cleanups {
    cleanup()
  }
  if r := recover(); r != nil {
    logx.ErrorStack(r)
  }
}
 
func RunSafe(fn func()) {
  defer rescue.Recover()
  fn()
}
 
func GoSafe(fn func()) {
  go RunSafe(fn)
}

Concatenate streams: Concat

Concat splices other streams onto a stream to create a new one; it delegates to the internal Concat method, whose implementation is analyzed later.

func Concat(s Stream, others ...Stream) Stream {
  return s.Concat(others...)
}

Processing API

Deduplicate: Distinct

The function parameter KeyFunc func(item interface{}) interface{} means deduplication can be customized for the business scenario: in essence, items are deduplicated via a map keyed by the values keyFunc returns.

Function parameters are very powerful, which can greatly improve flexibility.

func (s Stream) Distinct(keyFunc KeyFunc) Stream {
  source := make(chan interface{})
  threading.GoSafe(func() {
    //Remember: closing the channel when done writing is a good habit
    defer close(source)
    keys := make(map[interface{}]lang.PlaceholderType)
    for item := range s.source {
      //Custom de duplication logic
      key := keyFunc(item)
      //If the key does not exist, write the data to the new channel
      if _, ok := keys[key]; !ok {
        source <- item
        keys[key] = lang.Placeholder
      }
    }
  })
  return Range(source)
}

Use case:

// 1 2 3 4 5
Just(1, 2, 3, 3, 4, 5, 5).Distinct(func(item interface{}) interface{} {
  return item
}).ForEach(func(item interface{}) {
  t.Log(item)
})
 
// 1 2 3 4
Just(1, 2, 3, 3, 4, 5, 5).Distinct(func(item interface{}) interface{} {
  uid := item.(int)
  //Apply custom dedup logic to items greater than 3, so only one of them is kept
  if uid > 3 {
    return 4
  }
  return item
}).ForEach(func(item interface{}) {
  t.Log(item)
})

Filter

Filter abstracts the filtering logic into a FilterFunc applied to each item; the boolean it returns decides whether the item is written back to the new channel. The actual traversal is delegated to the Walk method.

The option parameter contains two options:

  1. unlimitedWorkers: do not limit the number of goroutines
  2. workers: limit the number of goroutines
FilterFunc func(item interface{}) bool
 
func (s Stream) Filter(filterFunc FilterFunc, opts ...Option) Stream {
  return s.Walk(func(item interface{}, pipe chan<- interface{}) {
    if filterFunc(item) {
      pipe <- item
    }
  }, opts...)
}

Use example:

func TestInternalStream_Filter(t *testing.T) {
  //Keep the even numbers 2 and 4
  channel := Just(1, 2, 3, 4, 5).Filter(func(item interface{}) bool {
    return item.(int)%2 == 0
  }).channel()
  for item := range channel {
    t.Log(item)
  }
}

Traverse: Walk

Walk performs a WalkFunc operation on each item and writes the results to a new stream.

Note that because goroutines read and write the data asynchronously inside, the order of elements in the new stream's channel is random.

//Item: an element in the stream
//Pipe: the item is written to pipe if it meets the conditions
WalkFunc func(item interface{}, pipe chan<- interface{})
 
func (s Stream) Walk(fn WalkFunc, opts ...Option) Stream {
  option := buildOptions(opts...)
  if option.unlimitedWorkers {
    return s.walkUnLimited(fn, option)
  }
  return s.walkLimited(fn, option)
}
 
func (s Stream) walkUnLimited(fn WalkFunc, option *rxOptions) Stream {
  //Create a buffered channel
  //The buffer size defaults to 16; writes block once 16 unread elements pile up
  pipe := make(chan interface{}, defaultWorkers)
  go func() {
    var wg sync.WaitGroup
 
    for item := range s.source {
      //All elements of s.source must be read out
      //This is also why the writer must remember to close the channel:
      //if it were never closed, this goroutine would block forever and leak
      //Important: capture a per-iteration copy; reusing the loop variable
      //in another goroutine is a classic concurrency trap
      val := item
      wg.Add(1)
      //Execute the function in safe mode
      threading.GoSafe(func() {
        defer wg.Done()
        fn(val, pipe)
      })
    }
    wg.Wait()
    close(pipe)
  }()
 
  //Return new stream
  return Range(pipe)
}
 
func (s Stream) walkLimited(fn WalkFunc, option *rxOptions) Stream {
  pipe := make(chan interface{}, option.workers)
  go func() {
    var wg sync.WaitGroup
    //Number of control processes
    pool := make(chan lang.PlaceholderType, option.workers)
 
    for item := range s.source {
      //Important: capture a per-iteration copy; reusing the loop variable
      //in another goroutine is a classic concurrency trap
      val := item
      //Blocks once the goroutine limit is reached
      pool <- lang.Placeholder
      //Again, this is why the writer must remember to close the channel:
      //if it were never closed, this goroutine would block forever and leak
      wg.Add(1)
 
      //Execute the function in safe mode
      threading.GoSafe(func() {
        defer func() {
          wg.Done()
          //After execution, read the pool once to release one worker slot
          <-pool
        }()
        fn(val, pipe)
      })
    }
    wg.Wait()
    close(pipe)
  }()
  return Range(pipe)
}

Use case:

The order of return is random.

func Test_Stream_Walk(t *testing.T) {
  //May print 300 100 200; the order is random
  Just(1, 2, 3).Walk(func(item interface{}, pipe chan<- interface{}) {
    pipe <- item.(int) * 100
  }, WithWorkers(3)).ForEach(func(item interface{}) {
    t.Log(item)
  })
}

Group: Group

Group buckets items into a map keyed by the result of the KeyFunc, then emits each group.

KeyFunc func(item interface{}) interface{}
 
func (s Stream) Group(fn KeyFunc) Stream {
  groups := make(map[interface{}][]interface{})
  for item := range s.source {
    key := fn(item)
    groups[key] = append(groups[key], item)
  }
  source := make(chan interface{})
  go func() {
    for _, group := range groups {
      source <- group
    }
    close(source)
  }()
  return Range(source)
}

Get the first n elements: Head

If n is greater than the actual data set length, all elements will be returned

func (s Stream) Head(n int64) Stream {
  if n < 1 {
    panic("n must be greater than 0")
  }
  source := make(chan interface{})
  go func() {
    for item := range s.source {
      n--
      //n may be greater than the length of s.source, so check n >= 0
      if n >= 0 {
        source <- item
      }
      // let successive method go ASAP even we have more items to skip
      // why we don't just break the loop, because if break,
      // this former goroutine will block forever, which will cause goroutine leak.
      //n == 0 means the new source already holds enough elements and can be closed
      //since source has what it needs, why not just break without closing?
      //as the author notes above, to prevent a goroutine leak:
      //every operation produces a new stream, and the old one would never be read again
      if n == 0 {
        close(source)
        break
      }
    }
    //If the loop above exits with n still positive, s.source had fewer than n elements
    //and the new source must still be closed explicitly
    if n > 0 {
      close(source)
    }
  }()
  return Range(source)
}

Use example:

//Prints 1, 2
func TestInternalStream_Head(t *testing.T) {
  channel := Just(1, 2, 3, 4, 5).Head(2).channel()
  for item := range channel {
    t.Log(item)
  }
}

Get the last n elements: Tail

This one is interesting. To get the last n elements it uses a ring-slice data structure, so let's first look at the Ring implementation.

//Circular slice
type Ring struct {
  elements []interface{}
  index    int
  lock     sync.Mutex
}
 
func NewRing(n int) *Ring {
  if n < 1 {
    panic("n should be greater than 0")
  }
  return &Ring{
    elements: make([]interface{}, n),
  }
}
 
//Add element
func (r *Ring) Add(v interface{}) {
  r.lock.Lock()
  defer r.lock.Unlock()
  //Writes the element to the specified location of the slice
  //The remainder here realizes the circular write effect
  r.elements[r.index%len(r.elements)] = v
  //Update next write location
  r.index++
}
 
//Get all elements
//Keep the reading order consistent with the writing order
func (r *Ring) Take() []interface{} {
  r.lock.Lock()
  defer r.lock.Unlock()
 
  var size int
  var start int
  //If wrap-around writes have occurred,
  //reading must start from the oldest element
  //so that read order matches write order
  if r.index > len(r.elements) {
    size = len(r.elements)
    //After wrap-around, the current write position index holds the oldest data
    start = r.index % len(r.elements)
  } else {
    size = r.index
  }
  elements := make([]interface{}, size)
  for i := 0; i < size; i++ {
    //The remainder is used to realize ring reading, and the reading order is consistent with the writing order
    elements[i] = r.elements[(start+i)%len(r.elements)]
  }
 
  return elements
}

Summarize the advantages of circular slicing:

  • Support automatic rolling update
  • Save memory

When its fixed capacity is full, a ring slice keeps overwriting the oldest data with new data. Thanks to this property, it can be used to read the last n elements of a channel.
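A stdlib-only sketch of that use: a fixed-size ring fed from a channel retains only the newest n items. This is a simplified int-only version of the Ring above, with the same modular-index trick:

```go
package main

import "fmt"

// lastN drains ch through a fixed-size ring buffer and returns the final
// n items in arrival order (fewer if the channel held fewer).
func lastN(ch <-chan int, n int) []int {
	buf := make([]int, n)
	count := 0
	for v := range ch {
		buf[count%n] = v // modular index: new data overwrites the oldest
		count++
	}
	size := count
	start := 0
	if count > n {
		size = n
		start = count % n // after wrap-around, this slot holds the oldest item
	}
	out := make([]int, size)
	for i := 0; i < size; i++ {
		out[i] = buf[(start+i)%n] // read in write order
	}
	return out
}

func main() {
	ch := make(chan int, 10)
	for i := 1; i <= 10; i++ {
		ch <- i
	}
	close(ch)
	fmt.Println(lastN(ch, 3)) // prints [8 9 10]
}
```

Only n slots are ever allocated, no matter how many items flow through.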

func (s Stream) Tail(n int64) Stream {
  if n < 1 {
    panic("n must be greater than 0")
  }
  source := make(chan interface{})
  go func() {
    ring := collection.NewRing(int(n))
    //Read all elements. If the number is > N, the ring slice can realize the new data to cover the old data
    //Ensure the last n elements obtained
    for item := range s.source {
      ring.Add(item)
    }
    for _, item := range ring.Take() {
      source <- item
    }
    close(source)
  }()
  return Range(source)
}

So why not just collect everything into a plain slice?

The answer is memory: a ring allocates only n slots no matter how many items flow through, which is the classic advantage of ring-style data structures.

Use example:

func TestInternalStream_Tail(t *testing.T) {
  // 4,5
  channel := Just(1, 2, 3, 4, 5).Tail(2).channel()
  for item := range channel {
    t.Log(item)
  }
  // 1,2,3,4,5
  channel2 := Just(1, 2, 3, 4, 5).Tail(6).channel()
  for item := range channel2 {
    t.Log(item)
  }
}

Element conversion: Map

Map converts each element, with goroutines doing the conversion internally. Note that the output channel does not preserve the original order.

MapFunc func(item interface{}) interface{}
func (s Stream) Map(fn MapFunc, opts ...Option) Stream {
  return s.Walk(func(item interface{}, pipe chan<- interface{}) {
    pipe <- fn(item)
  }, opts...)
}

Use example:

func TestInternalStream_Map(t *testing.T) {
  channel := Just(1, 2, 3, 4, 5, 2, 2, 2, 2, 2, 2).Map(func(item interface{}) interface{} {
    return item.(int) * 10
  }).channel()
  for item := range channel {
    t.Log(item)
  }
}

Merge

The implementation is simple. I thought about it for a long time and still couldn't find a scenario that suits this method.

func (s Stream) Merge() Stream {
  var items []interface{}
  for item := range s.source {
    items = append(items, item)
  }
  source := make(chan interface{}, 1)
  source <- items
  //Don't forget to close; otherwise a downstream range over the channel blocks forever
  close(source)
  return Range(source)
}

Reverse

Invert the elements in the channel. The flow of inversion algorithm is:

  • Find intermediate node
  • The two sides of the node begin to exchange

Notice that a slice is used to collect s.source. Slices grow automatically, but wouldn't a fixed-size array be better?

Actually an array can't be used here: data is written into a stream's source channel asynchronously by goroutines, so the amount of data in each stream changes dynamically and its length isn't known in advance. The pipeline analogy for a stream's workflow really is apt.

func (s Stream) Reverse() Stream {
  var items []interface{}
  for item := range s.source {
    items = append(items, item)
  }
  for i := len(items)/2 - 1; i >= 0; i-- {
    opp := len(items) - 1 - i
    items[i], items[opp] = items[opp], items[i]
  }
  return Just(items...)
}

Use example:

func TestInternalStream_Reverse(t *testing.T) {
  channel := Just(1, 2, 3, 4, 5).Reverse().channel()
  for item := range channel {
    t.Log(item)
  }
}

Sort: Sort

Internally it delegates to the standard library's sort.Slice, with the caller's comparison function supplying the ordering logic.

func (s Stream) Sort(fn LessFunc) Stream {
  var items []interface{}
  for item := range s.source {
    items = append(items, item)
  }
 
  sort.Slice(items, func(i, j int) bool {
    //Pass the elements themselves, not the indexes, to the comparison function
    return fn(items[i], items[j])
  })
  return Just(items...)
}

Use example:

// 5,4,3,2,1
func TestInternalStream_Sort(t *testing.T) {
  channel := Just(1, 2, 3, 4, 5).Sort(func(a, b interface{}) bool {
    return a.(int) > b.(int)
  }).channel()
  for item := range channel {
    t.Log(item)
  }
}

Concatenate: Concat

func (s Stream) Concat(streams ...Stream) Stream {
  //Create a new unbuffered channel
  source := make(chan interface{})
  go func() {
    //Create a RoutineGroup (a sync.WaitGroup wrapper)
    group := threading.NewRoutineGroup()
    //Read data from the original channel asynchronously
    group.Run(func() {
      for item := range s.source {
        source <- item
      }
    })
    //Asynchronously read the channel data of the streams to be concatenated
    for _, stream := range streams {
      //Capture a per-iteration copy for the goroutine below
      stream := stream
      //Each stream gets its own goroutine
      group.Run(func() {
        for item := range stream.channel() {
          source <- item
        }
      })
    }
    //Blocking waiting for read to complete
    group.Wait()
    close(source)
  }()
  //Return new stream
  return Range(source)
}

Summary API

All match: AllMatch

func (s Stream) AllMatch(fn PredicateFunc) bool {
  for item := range s.source {
    if !fn(item) {
      //s.source must be drained, otherwise the upstream goroutine may stay blocked
      go drain(s.source)
      return false
    }
  }
 
  return true
}
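The drain helper these methods rely on is not shown in the article; in go-zero it is simply a loop that discards what is left in the channel. Here is a stdlib-only sketch of the early-exit pattern (int-typed for simplicity):

```go
package main

import "fmt"

// drain discards everything left in ch so upstream senders can finish.
func drain(ch <-chan int) {
	for range ch {
	}
}

// allMatch returns false as soon as one item fails the predicate;
// the remaining items are drained in the background so the producing
// goroutine is never blocked on a send.
func allMatch(ch <-chan int, fn func(int) bool) bool {
	for v := range ch {
		if !fn(v) {
			go drain(ch)
			return false
		}
	}
	return true
}

func main() {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for i := 1; i <= 5; i++ {
			ch <- i
		}
	}()
	fmt.Println(allMatch(ch, func(n int) bool { return n < 3 })) // prints false
}
```

Without the background drain, the producer goroutine above would block forever on the send of 4 once allMatch returned early.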

Any match: AnyMatch

func (s Stream) AnyMatch(fn PredicateFunc) bool {
  for item := range s.source {
    if fn(item) {
      //s.source must be drained, otherwise the upstream goroutine may stay blocked
      go drain(s.source)
      return true
    }
  }
 
  return false
}

None match: NoneMatch

func (s Stream) NoneMatch(fn func(item interface{}) bool) bool {
  for item := range s.source {
    if fn(item) {
      //s.source must be drained, otherwise the upstream goroutine may stay blocked
      go drain(s.source)
      return false
    }
  }
 
  return true
}

Count elements: Count

func (s Stream) Count() int {
  var count int
  for range s.source {
    count++
  }
  return count
}

Drain the stream: Done

func (s Stream) Done() {
  //Drain the channel to prevent goroutine blocking and leakage
  drain(s.source)
}

Process all elements: ForAll

func (s Stream) ForAll(fn ForAllFunc) {
  fn(s.source)
}

Process each element: ForEach

func (s Stream) ForEach(fn ForEachFunc) {
  for item := range s.source {
    fn(item)
  }
}

Summary

At this point the whole stream component has been implemented. The core idea is to treat the channel as the pipeline and the data as the flowing water, with goroutines continuously reading from and writing to channels to achieve an asynchronous, non-blocking effect.

Back to the question raised at the beginning: implementing a stream seems very difficult before you start, and it's hard to imagine that such a powerful component can be implemented in Go in a bit more than 300 lines of code.

Three language features are the basis for achieving efficiency:

  • channel
  • Goroutines
  • Functional programming

Reference material

Pipeline mode

Slice inversion algorithm

Project address

https://github.com/zeromicro/go-zero

