Using OpenTracing at scale in .NET environments

Time:2020-3-2

By Austin Parker

One of the biggest advantages of OpenTracing is the community built around it, which spans a variety of languages and technologies. With that in mind, I'm happy to share a guest post on the OpenTracing blog today by Aaron Stannard. Aaron is the founder and CEO of Petabridge, a startup that helps .NET companies build large-scale distributed systems. He is also a co-founder of the Akka.NET project. You can find him on Twitter at https://twitter.com/aaron onth


For the past five years, I have been a maintainer and co-founder of the Akka.NET open source project, a C# and F# port of the popular Akka project, which was originally developed in Scala. I started the project because the .NET ecosystem lacked the tools and frameworks needed to build large-scale real-time applications, like the one I was developing at the time at MarkedUp, the marketing automation and analytics startup I ran.

After closing MarkedUp, I went on to found Petabridge, an open source company dedicated to supporting and developing Akka.NET and other distributed systems technologies in .NET.

I am happy to report that the .NET community now has a stronger open source ecosystem and more tooling options for building the kinds of large-scale applications I worked on in 2013-14.

With the emergence of .NET Core, the whole .NET ecosystem is undergoing great change. .NET Core is a new high-performance, lightweight, and 100% cross-platform implementation of the .NET runtime. It opens up possibilities for .NET developers that did not exist before.

Large-scale .NET with Akka.NET and the actor model

Akka and Akka.NET, if you haven't heard of them, are implementations of the actor model built on top of general-purpose virtual machines (the JVM and the CLR, respectively). The actor model is an old concept that dates back to the early 1970s, but it has been revitalized in recent years because it provides an understandable computing model that can easily be distributed across a large data center or a public cloud environment.

What is an "understandable computing model" good for, you ask? Specifically, the actor model has found a home among developers who need to build scalable real-time systems, such as:

  • Multiplayer video games;
  • Analytics;
  • Marketing automation;
  • Healthcare / medical Internet of Things;
  • Logistics, transportation, and shipping;
  • Energy;
  • Finance; and
  • Real-time transaction processing (ACH, payment processors, etc.)

What all of these applications have in common is that, to fulfill their obligations to customers and stakeholders, they must do their work in a consistent, fast (real-time) manner regardless of the total size of the system (scalability). To meet both goals, they must be stateful, meaning that the source of truth lives in application memory rather than in an external database. For stateful applications to be fault-tolerant and highly available, they must also be decentralized: state cannot be concentrated in one place, or the system becomes vulnerable to single points of bottleneck and single points of failure.

This is what the actor model allows developers to do: build highly decentralized, fault-tolerant, stateful applications in which each unit of work (an actor) is self-contained, with private state that cannot be modified directly from the outside. The only way to modify an actor's state is to send the actor a message, which it will eventually process, and processing that message may result in an update to the actor's state.
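The idea of private state that changes only in response to messages can be sketched in a few lines. The following is a minimal illustration in Python, not Akka.NET code: `CounterActor`, `tell`, and the message names are hypothetical, and a real actor framework adds supervision, addressing, and remoting on top of this.

```python
import queue
import threading


class CounterActor:
    """Toy actor: private state that is changed only by mailbox messages."""

    def __init__(self):
        self._count = 0                  # private state, never shared directly
        self._mailbox = queue.Queue()    # FIFO queue of pending messages
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message, reply_to=None):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put((message, reply_to))

    def _run(self):
        # Messages are handled one at a time, so the state needs no locks.
        while True:
            message, reply_to = self._mailbox.get()
            if message == "increment":
                self._count += 1
            elif message == "get" and reply_to is not None:
                reply_to.put(self._count)
            elif message == "stop":
                break


def demo():
    """Send three increments, then ask the actor for its current count."""
    actor = CounterActor()
    for _ in range(3):
        actor.tell("increment")
    reply = queue.Queue()
    actor.tell("get", reply_to=reply)
    result = reply.get(timeout=2)
    actor.tell("stop")
    return result
```

The caller never touches `_count`; it can only ask for it by message, which is the property that makes actors safe to move across threads, processes, and machines.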

In .NET, Akka.NET is the leading actor model implementation for building these types of applications. It is used by hundreds of companies, including Dell, Bank of America, Boeing, S&P Global, Becton Dickinson, the U.S. Department of Energy, Zynga, and others.

However, the actor model presents some major challenges for software teams trying to adopt it at scale. The most painful one is diagnosing and debugging programming errors and network-related problems at scale. This is where OpenTracing and distributed tracing come in.

Using OpenTracing to understand complexity at low cost

The problem with Akka.NET and large-scale distributed actors is that your system can have tens of millions of interactions per second at any given time, which can look something like this:

[Figure: actors in an Akka.NET system exchanging messages]

Each actor in an Akka.NET actor system usually has a small amount of self-contained state, some message-processing code that performs its actual work, and references to the other actors it communicates with most often. Actors communicate with each other by passing messages back and forth. By default, 100% of message delivery in the actor model is asynchronous. Actors always process messages in the order in which they were sent, but one actor may have to process messages from many other actors.

Actors can communicate transparently with each other across process and network boundaries, so a message sent to a single actor within one process may end up propagating across multiple processes. The catch is that the same location transparency that makes actors so good at distributing work in a scalable way also makes them frustrating to debug when something goes wrong in production: knowing where and when a failure happened becomes a problem in its own right, especially when millions of such operations are in flight all the time.

This is where we find OpenTracing particularly useful.

Akka.NET applications are not single-threaded, single-process applications. They are highly concurrent and are usually distributed across processes. As a result, traditional tracing tools commonly used in .NET, such as IntelliTrace, can't help us answer "what went wrong?" inside the system.

What we need are distributed tracing tools that can collect context from multiple processes, correlate it, and tell the whole story from the perspective of the distributed system. We need to be able to answer questions such as "what did akka.tcp://[email protected]:1100/user/actorA/child2 send to akka.tcp://[email protected]:1100/user/processb/child1 after receiving MSG1?" Only a distributed tracing tool running in both processes can answer that question effectively, which is why we use OpenTracing at Petabridge.

OpenTracing implementation and benefits

Petabridge specializes in supporting large-scale users of Akka.NET, which means we have to provide tools that make their lives easier. That's why we set out to create Phobos, a monitoring and tracing solution for Akka.NET.

We wanted to help our users solve this Akka.NET observability problem by developing a distributed tracing implementation that could easily be included in their application code. But we had a small problem: our customers would not accept a vendor-specific solution for application performance monitoring. They certainly would not accept one that worked only with Akka.NET and not with other important .NET technologies such as ASP.NET Core and SignalR.

OpenTracing solved this problem for us elegantly and simply: by targeting the OpenTracing standard rather than any single vendor's solution, such as Zipkin or Jaeger, we could leave the door open for our customers to choose whatever tracing solution they wanted. We also knew that we could probably create OpenTracing-compatible drivers ourselves for .NET users who wanted to use our products alongside others that rely on the standard.

Therefore, we built Phobos's tracing capabilities on top of the excellent OpenTracing C# library and designed our first-party integrations with Zipkin, Jaeger, and other tools on top of the OpenTracing bindings. This greatly reduced our development costs and increased the freedom of choice our users enjoy.
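The design choice described above, instrumenting against a standard interface and swapping the backend freely, can be illustrated with a small sketch. Everything here is hypothetical and written in Python for brevity: `Tracer`, `ZipkinLikeTracer`, and `JaegerLikeTracer` are stand-ins, and their output formats are invented rather than taken from Zipkin, Jaeger, or the OpenTracing C# library.

```python
from abc import ABC, abstractmethod


class Tracer(ABC):
    """Hypothetical vendor-neutral interface that instrumentation targets."""

    @abstractmethod
    def record(self, operation: str, duration_ms: float) -> None:
        ...


class ZipkinLikeTracer(Tracer):
    """Illustrative backend; the record format is invented, not Zipkin's."""

    def __init__(self):
        self.sent = []

    def record(self, operation, duration_ms):
        self.sent.append({"name": operation, "duration": duration_ms})


class JaegerLikeTracer(Tracer):
    """Second illustrative backend with a different invented format."""

    def __init__(self):
        self.sent = []

    def record(self, operation, duration_ms):
        self.sent.append(f"{operation}:{duration_ms}ms")


def handle_message(tracer: Tracer, message: str) -> str:
    """Instrumented code depends only on the Tracer interface."""
    result = message.upper()
    tracer.record("handle " + message, duration_ms=1.0)
    return result
```

`handle_message` runs unchanged against either backend, which is the property that lets a library author ship one instrumentation layer and leave the choice of tracing system to the user.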

Every time an actor sends or receives a message, we create a new span, and we propagate the trace identifier on every message passed between actors, including over the network. We were able to build all of this so that it works behind the scenes without requiring much manual instrumentation. As a result, OpenTracing lets us use Jaeger to produce understandable visualizations like this:

[Figure: Jaeger trace visualization of a fan-out call across nodes]

In this case, we are modeling a "fan-out" call, in which one node calls many other nodes over the network. This is difficult to capture with traditional tools because it involves a large amount of concurrent processing on multiple nodes and asynchronous communication among all of them. With the OpenTracing standard, however, we can achieve it easily using tools like Jaeger, which has a good OpenTracing-compatible driver for C#.
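To make the mechanics concrete, here is a minimal sketch of creating a span for each send and receive and carrying the trace identifier on every message, applied to a fan-out call. It is written in Python with hand-rolled types and is not how Phobos or the OpenTracing C# library is implemented: `Span`, `Envelope`, `RecordingTracer`, `send`, `receive`, and `fan_out` are all hypothetical names.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: int
    span_id: int
    parent_id: Optional[int]
    operation: str


@dataclass
class Envelope:
    """A message plus the trace context that travels with it."""
    payload: str
    trace_id: int
    parent_span_id: int


class RecordingTracer:
    """Hypothetical tracer that keeps every span in memory."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.spans = []

    def start_span(self, operation, trace_id=None, parent_id=None):
        span_id = next(self._ids)
        # A span with no incoming context starts a new trace.
        if trace_id is None:
            trace_id = span_id
        span = Span(trace_id, span_id, parent_id, operation)
        self.spans.append(span)
        return span


def send(tracer, payload, parent):
    """Create a 'send' span and attach its context to the outgoing message."""
    span = tracer.start_span("send " + payload, parent.trace_id, parent.span_id)
    return Envelope(payload, span.trace_id, span.span_id)


def receive(tracer, envelope):
    """Create a 'receive' span that continues the trace from the envelope."""
    return tracer.start_span(
        "receive " + envelope.payload, envelope.trace_id, envelope.parent_span_id
    )


def fan_out(tracer, workers):
    """One node sends a message to each worker; all spans share one trace."""
    root = tracer.start_span("fan-out")
    for worker in workers:
        envelope = send(tracer, "work-for-" + worker, root)
        receive(tracer, envelope)  # in a real system this runs on another node
    return root
```

Because the receiver builds its span only from what is in the envelope, the same logic works whether the two actors share a process or sit on different machines; a tracing backend can then stitch the spans into a single tree using the shared trace identifier.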

Creating OpenTracing drivers in .NET

Once Phobos fully supported OpenTracing as our end-user integration point, we knew that any Akka.NET user with an in-house or third-party tracing solution that did not support OpenTracing natively could still find a way to connect things together using the OpenTracing libraries.

However, we decided to double down and lower the barrier to entry for some of the existing tools that are already popular in the .NET community by introducing first-party OpenTracing drivers and adapters for those products.

The first one we built was Petabridge.Tracing.Zipkin, a high-performance OpenTracing-compatible driver for Zipkin. We wanted to use Zipkin internally, and we wanted to support native transport options like Kafka.

The second and more interesting one, built at the request of many .NET users, was a Microsoft Application Insights OpenTracing adapter for our Akka.NET tracing product.

For users running on Azure, we wanted to support Application Insights as a tracing target, but there was no built-in way to plug Application Insights into OpenTracing. So we followed the standards document written by the Microsoft team, which let us map Application Insights conventions onto the OpenTracing vocabulary, and created an open source package, Petabridge.Tracing.ApplicationInsights, which bridges the gap between the two technologies and makes Application Insights perfectly viable in large Akka.NET applications.
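As a rough illustration of what such a mapping involves, the sketch below converts a span into an Application Insights style telemetry item. It is written in Python and is not the code of Petabridge.Tracing.ApplicationInsights. The correspondence it uses (a span with `span.kind` of `server` becomes a Request, `client` becomes a Dependency, the span ID becomes the item ID, the trace ID becomes the operation ID, and the parent span becomes the operation parent ID) follows the correlation table in Microsoft's telemetry correlation documentation as best I recall it, so treat it as an assumption; the dictionary shapes and the function name `span_to_telemetry` are hypothetical.

```python
def span_to_telemetry(span: dict) -> dict:
    """Map an OpenTracing-style span (a plain dict here) to a telemetry item.

    Assumed mapping: span.kind=server -> Request, span.kind=client -> Dependency.
    Other span kinds are deliberately not covered by this sketch.
    """
    kind = span.get("tags", {}).get("span.kind")
    if kind == "server":
        item_type = "Request"
    elif kind == "client":
        item_type = "Dependency"
    else:
        raise ValueError(f"span.kind {kind!r} is not covered by this sketch")

    return {
        "type": item_type,
        "name": span["operation_name"],
        "id": span["span_id"],                           # span ID -> item ID
        "operation_Id": span["trace_id"],                # trace ID -> operation ID
        "operation_ParentId": span.get("parent_span_id"),  # parent span -> parent ID
    }
```

For example, a client-side span for an outgoing actor message would be reported as a Dependency whose `operation_Id` matches every other item in the same trace, which is what allows the backend to group them.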


After we released the package, we found that even Microsoft itself was using OpenTracing and our Application Insights driver internally to test some of its own cloud applications. This is good for everyone in the .NET ecosystem: as OpenTracing continues to gain traction, that momentum will help establish it as an industry-standard practice.

As we continue to push the boundaries of size and speed for large-scale .NET systems, organizations like ours will keep investing in technologies such as OpenTracing, as well as its promising monitoring counterpart, OpenMetrics, to contain the operational and management costs of running these systems. So far, OpenTracing has delivered excellent results for our company and for the entire Akka.NET project, and we look forward to seeing more in the future.

-Aaron Stannard,
Petabridge CEO
Akka.NET project co-founder

