WebRTC Internals: Real-Time Communication

I'm Dineth Kodippili, a full-stack developer currently working at Qriomatrix while pursuing my third year of undergraduate studies at the University of Sri Jayewardenepura. My approach to development is stack-agnostic — I adapt to what the problem needs, with a strong focus on cloud solutions using AWS. On this blog, I share tutorials, industry insights, and honest perspectives on navigating a tech career as a student developer. My goal is to document what I learn and contribute something useful back to the community.

WebRTC is deceptively easy to start with. A few JavaScript APIs, and you have a video call running. But the moment something breaks — a user behind a corporate firewall, audio cutting out at exactly 200ms, ICE candidates failing silently — you realize you've been flying blind.

We'll go deep on every layer: the signaling dance, ICE and NAT traversal, the DTLS handshake, SRTP, RTP packet flow, codec negotiation, and what actually happens when you press "send" on a voice packet. If you're comfortable with TCP/UDP and HTTP, you have everything you need to follow along.

What is WebRTC?

WebRTC is not a single protocol. It's a stack of protocols assembled to solve a very specific problem: how do you send real-time audio and video directly between two peers, through the hostile terrain of NATs, firewalls, and asymmetric networks, with encryption enforced by default?

WebRTC Concepts

1 ) Signaling & SDP (Session Description Protocol)

Before two peers can talk directly, they need to exchange metadata about themselves: which codecs they support, their IP addresses, and so on. This exchange happens over a signaling channel, which WebRTC deliberately leaves up to you (WebSocket, HTTP, etc.). The metadata itself is formatted as SDP (Session Description Protocol), a text blob describing the session.
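To make the "text blob" concrete, here is a minimal sketch that reads the codec list out of an SDP string by scanning its a=rtpmap lines. The sample SDP is hand-written and heavily trimmed for illustration (it is not output from a real browser), and listCodecs is a hypothetical helper name.

```javascript
// Hand-written, heavily trimmed SDP for illustration only.
const sampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 0",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

// Returns one entry per a=rtpmap line, tagged with the media section it belongs to.
function listCodecs(sdp) {
  const codecs = [];
  let currentMedia = null;
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith("m=")) {
      currentMedia = line.slice(2).split(" ")[0]; // "audio" or "video"
    } else if (line.startsWith("a=rtpmap:") && currentMedia) {
      // Format: a=rtpmap:<payload type> <name>/<clock rate>[/<channels>]
      const [payloadType, description] = line.slice("a=rtpmap:".length).split(" ");
      const [name, clockRate] = description.split("/");
      codecs.push({
        media: currentMedia,
        payloadType: Number(payloadType),
        name,
        clockRate: Number(clockRate),
      });
    }
  }
  return codecs;
}

console.log(listCodecs(sampleSdp));
```

Each peer advertises a list like this in its offer or answer, and the session can only use codecs that appear on both sides.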

2 ) ICE, STUN & TURN Servers

Once SDP is exchanged, peers need to find a working network path to each other. This is the job of ICE (Interactive Connectivity Establishment). It gathers candidates (possible ways to reach a peer) and tests them in priority order.

  • STUN helps a peer discover its public IP (since most devices sit behind NAT).

  • TURN is a relay fallback when a direct connection is impossible (e.g., symmetric NAT, firewalls).
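In the browser API, STUN and TURN servers are supplied through the configuration object passed to the RTCPeerConnection constructor. The sketch below builds such an object; the hostnames and credentials are placeholders, and buildRtcConfig is a hypothetical helper, not part of WebRTC.

```javascript
// Hypothetical helper: builds the configuration object for `new RTCPeerConnection(config)`.
// Every hostname and credential below is a placeholder.
function buildRtcConfig({ forceRelay = false } = {}) {
  return {
    iceServers: [
      // STUN: used to discover the public address.
      { urls: "stun:stun.example.org:3478" },
      // TURN: relay fallback, normally requires credentials.
      {
        urls: "turn:turn.example.org:3478?transport=udp",
        username: "demo-user",
        credential: "demo-secret",
      },
    ],
    // "all" lets ICE try every candidate type; "relay" restricts it to TURN candidates.
    iceTransportPolicy: forceRelay ? "relay" : "all",
  };
}

const rtcConfig = buildRtcConfig();
console.log(rtcConfig.iceTransportPolicy);
// In a browser: const pc = new RTCPeerConnection(rtcConfig);
```

Setting iceTransportPolicy to "relay" is a handy way to check that a TURN server actually works, since it takes the direct paths out of the picture.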

3 ) RTCPeerConnection

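This is the central WebRTC API. It orchestrates the entire lifecycle: offer/answer negotiation, ICE, the DTLS handshake, codec negotiation, and finally the actual media and data flow. It does not transport signaling messages itself; it produces and consumes the descriptions and candidates that your signaling channel carries. Think of it as the engine under the hood.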

4 ) Media Streams & Tracks

WebRTC uses a layered model for media. A MediaStream is a container of MediaStreamTrack objects, each of which is a single audio or video feed. Adding a track to the RTCPeerConnection creates an RTCRtpSender; incoming tracks are exposed through RTCRtpReceiver objects.
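A minimal sketch of the sending side follows. attachStream is a hypothetical helper; it relies only on getTracks() and addTrack(track, stream), which are real MediaStream and RTCPeerConnection methods. The two stand-in objects exist only so the sketch runs outside a browser.

```javascript
// Adds every track of a stream to the connection and returns the resulting senders.
// `pc` is expected to behave like an RTCPeerConnection, `stream` like a MediaStream.
function attachStream(pc, stream) {
  return stream.getTracks().map((track) => pc.addTrack(track, stream));
}

// Stand-ins for illustration. In a page you would use `new RTCPeerConnection()`
// and the stream resolved by `navigator.mediaDevices.getUserMedia(...)`.
const standInStream = {
  getTracks: () => [{ kind: "audio" }, { kind: "video" }],
};
const standInPc = {
  senders: [],
  addTrack(track, stream) {
    const sender = { track }; // the real method returns an RTCRtpSender
    this.senders.push(sender);
    return sender;
  },
};

const mediaSenders = attachStream(standInPc, standInStream);
console.log(mediaSenders.map((sender) => sender.track.kind));
```

On the receiving side, tracks arrive through the connection's track event, which hands you the track together with its receiver.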

5 ) Data Channels

RTCDataChannel lets you send arbitrary data (text, binary, files) directly peer-to-peer, without routing through an application server. It's built on SCTP over DTLS, which provides encryption and configurable reliability. You can configure each channel independently: reliable, ordered delivery for chat; unreliable, unordered delivery for real-time game state.
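The per-channel configuration can be sketched as follows. The ordered and maxRetransmits options are real createDataChannel options; the channel labels, the openChannels helper, and the stand-in connection are assumptions made for illustration.

```javascript
// Opens two channels with different delivery guarantees.
// `pc` is expected to behave like an RTCPeerConnection.
function openChannels(pc) {
  // Ordered and reliable is the default; it is spelled out here for clarity.
  const chat = pc.createDataChannel("chat", { ordered: true });
  // Unordered, and a lost message is never retransmitted: stale game state is useless.
  const gameState = pc.createDataChannel("game-state", {
    ordered: false,
    maxRetransmits: 0,
  });
  return { chat, gameState };
}

// Stand-in that just records what was requested, so the sketch runs outside a browser.
const channelRecorder = {
  createDataChannel(label, options) {
    return { label, options };
  },
};

const channels = openChannels(channelRecorder);
console.log(channels.chat.options, channels.gameState.options);
```

An alternative to maxRetransmits is maxPacketLifeTime, which bounds retransmission by time instead of by count; only one of the two may be set on a channel.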

Connection Process

Phase 1 - Signaling

Before any direct communication, both peers must reach each other through your signaling server. Peer A creates an SDP offer (describing its capabilities), sends it through the server, and Peer B responds with an answer.
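The offer/answer exchange maps onto four RTCPeerConnection calls per round trip. The sketch below runs both sides in one process to show the ordering; negotiate, makeStandInPeer, and jsonRelay are hypothetical names, the SDP strings are placeholders, and the relay function stands in for whatever signaling server you use.

```javascript
// One offer/answer round trip. `caller` and `callee` are expected to behave like
// RTCPeerConnection objects; `relay` stands in for the signaling server.
async function negotiate(caller, callee, relay) {
  const offer = await caller.createOffer();
  await caller.setLocalDescription(offer);
  const deliveredOffer = await relay(offer); // Peer A -> server -> Peer B

  await callee.setRemoteDescription(deliveredOffer);
  const answer = await callee.createAnswer();
  await callee.setLocalDescription(answer);
  const deliveredAnswer = await relay(answer); // Peer B -> server -> Peer A

  await caller.setRemoteDescription(deliveredAnswer);
}

// Stand-in peer so the sketch runs outside a browser. The SDP is a placeholder string.
function makeStandInPeer(name) {
  return {
    local: null,
    remote: null,
    async createOffer() { return { type: "offer", sdp: "sdp-from-" + name }; },
    async createAnswer() { return { type: "answer", sdp: "sdp-from-" + name }; },
    async setLocalDescription(description) { this.local = description; },
    async setRemoteDescription(description) { this.remote = description; },
  };
}

// Simulates a network hop: descriptions travel as JSON text.
const jsonRelay = async (message) => JSON.parse(JSON.stringify(message));

const peerA = makeStandInPeer("A");
const peerB = makeStandInPeer("B");
const negotiation = negotiate(peerA, peerB, jsonRelay).then(() => {
  console.log(peerB.remote.type, peerA.remote.type);
});
```

In a real application the two halves live in different browsers, and each side reacts to messages arriving from the signaling channel instead of awaiting one function.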

Phase 2 - ICE gathering

While signaling happens, both peers simultaneously start gathering ICE candidates, one for every network address that could be used to reach them. This includes local IPs, public IPs discovered via a STUN server, and relay addresses allocated on a TURN server. Candidates are sent to the other peer through the signaling channel as they're discovered (trickle ICE).
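Each candidate travels as a one-line string whose typ field tells you which of the three kinds it is: host (local IP), srflx (public IP learned via STUN), or relay (TURN). The sketch below parses that string; parseCandidate is a hypothetical helper, and the sample lines are hand-written using documentation-range IP addresses.

```javascript
// Parses an ICE candidate string into its main fields.
// Grammar: candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
function parseCandidate(line) {
  const parts = line.replace(/^a=/, "").replace(/^candidate:/, "").trim().split(/\s+/);
  const [foundation, component, transport, priority, address, port, typKeyword, type] = parts;
  if (typKeyword !== "typ") {
    throw new Error("not an ICE candidate line: " + line);
  }
  return {
    foundation,
    component: Number(component),
    transport: transport.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type, // "host", "srflx", "prflx", or "relay"
  };
}

// Hand-written samples; the addresses come from documentation-only ranges.
const sampleCandidates = [
  "candidate:1 1 udp 2130706431 192.168.1.10 54400 typ host",
  "candidate:2 1 udp 1694498815 203.0.113.7 61000 typ srflx raddr 192.168.1.10 rport 54400",
  "candidate:3 1 udp 16777215 198.51.100.20 49200 typ relay raddr 203.0.113.7 rport 61000",
].map(parseCandidate);

console.log(sampleCandidates.map((c) => c.type + " " + c.address + ":" + c.port));
```

Logging the candidate types is often the quickest way to diagnose a failing connection: if no relay candidate ever appears, the TURN server is unreachable or misconfigured.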

Phase 3 - ICE connectivity checks

With a list of candidates from each side, ICE now forms candidate pairs (every combination of A's candidates × B's candidates) and tests each pair with STUN binding requests. The highest-priority pair that succeeds wins. Direct P2P is preferred by default; a TURN relay is the last resort.
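The priority ordering comes from two formulas in RFC 8445: one for individual candidates and one for pairs. The sketch below applies them using the RFC's recommended type preferences; the function names are hypothetical, and treating the local side as the controlling agent is an assumption made for the example. Real ICE agents also prune and filter pairs, which is omitted here.

```javascript
// Recommended type preferences from RFC 8445 (higher is better).
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(type, localPreference = 65535, componentId = 1) {
  return TYPE_PREFERENCE[type] * 2 ** 24 + localPreference * 2 ** 8 + (256 - componentId);
}

// pair priority = 2^32 * min(G, D) + 2 * max(G, D) + (G > D ? 1 : 0)
// G is the controlling agent's candidate priority, D the controlled agent's.
// BigInt is used because the result can exceed Number.MAX_SAFE_INTEGER.
function pairPriority(controlling, controlled) {
  const g = BigInt(controlling);
  const d = BigInt(controlled);
  const min = g < d ? g : d;
  const max = g > d ? g : d;
  return (1n << 32n) * min + 2n * max + (g > d ? 1n : 0n);
}

// Forms every local x remote combination and sorts it best-first.
// Assumption: the local agent is the controlling one.
function rankPairs(localTypes, remoteTypes) {
  const pairs = [];
  for (const local of localTypes) {
    for (const remote of remoteTypes) {
      const priority = pairPriority(candidatePriority(local), candidatePriority(remote));
      pairs.push({ local, remote, priority });
    }
  }
  return pairs.sort((a, b) => (a.priority < b.priority ? 1 : a.priority > b.priority ? -1 : 0));
}

const rankedPairs = rankPairs(["host", "srflx", "relay"], ["host", "srflx", "relay"]);
console.log(rankedPairs.map((p) => p.local + " -> " + p.remote));
```

Because the pair formula is dominated by the lower of the two candidate priorities, any pair involving a relay candidate sinks to the bottom of the list, which is why TURN ends up as the last resort.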

Phase 4 - DTLS handshake

Once a working network path is found, peers perform a DTLS handshake (Datagram TLS — like TLS but for UDP). This authenticates each peer using the fingerprint they exchanged in the SDP, and produces the encryption keys used for the media session.
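The authentication step boils down to a comparison: each peer hashes the certificate it receives during the handshake and checks the result against the a=fingerprint line from the other side's SDP. The sketch below shows only that comparison; the helper names are hypothetical, the fingerprint value is made up, and computing the digest of a real certificate is left out.

```javascript
// Pulls the certificate fingerprint out of an SDP blob, or returns null if none is present.
function extractFingerprint(sdp) {
  const match = sdp.match(/^a=fingerprint:(\S+) ([0-9A-Fa-f:]+)\s*$/m);
  if (!match) return null;
  return { algorithm: match[1].toLowerCase(), value: match[2].toUpperCase() };
}

// True when the digest of the certificate seen in the DTLS handshake
// matches the fingerprint announced in the SDP.
function fingerprintMatches(sdp, algorithm, certificateDigest) {
  const expected = extractFingerprint(sdp);
  return (
    expected !== null &&
    expected.algorithm === algorithm.toLowerCase() &&
    expected.value === certificateDigest.toUpperCase()
  );
}

// Made-up fingerprint for illustration only.
const announcedDigest =
  "4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB:3E:4B:65:2E:7D:46:3F:54:42:CD:54:F1";
const remoteSdp = [
  "v=0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=fingerprint:sha-256 " + announcedDigest,
  "a=setup:actpass",
].join("\r\n");

console.log(fingerprintMatches(remoteSdp, "sha-256", announcedDigest.toLowerCase()));
```

This is also why the signaling channel needs to be trustworthy: an attacker who can rewrite the SDP in transit can substitute their own fingerprint.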

Phase 5 - Media & data flow

With DTLS complete, media flows as SRTP (encrypted RTP) and data channels flow as SCTP over DTLS — all traveling over the same UDP path that ICE selected. Both peers are now fully connected.
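Since STUN checks, DTLS records, and SRTP packets all share one UDP port, the receiver has to tell them apart. RFC 7983 does this by looking at the first byte of each packet. The sketch below covers the three ranges relevant to this article; classifyPacket is a hypothetical helper name, and the RFC defines further ranges (ZRTP and TURN channel data) that are reported as "other" here.

```javascript
// Classifies an incoming UDP payload by its first byte, using the ranges from RFC 7983.
function classifyPacket(firstByte) {
  if (firstByte >= 0 && firstByte <= 3) return "stun";
  if (firstByte >= 20 && firstByte <= 63) return "dtls"; // handshake and SCTP data channel traffic
  if (firstByte >= 128 && firstByte <= 191) return "rtp"; // SRTP and SRTCP
  return "other";
}

const firstBytes = [
  0x00, // STUN binding request
  0x16, // DTLS handshake record (content type 22)
  0x17, // DTLS application data (content type 23), which carries SCTP
  0x80, // RTP with version 2 and no padding, extension, or CSRCs
];

console.log(firstBytes.map(classifyPacket));
```

Data channel messages never show up as their own category on the wire: SCTP is carried inside DTLS application data records, so it is classified as DTLS.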

Why WebRTC Is the Right Foundation for Real-Time Communication

There is also the matter of where web technology is heading. The browser is increasingly the universal application runtime, and the expectations users bring to web applications have risen accordingly. Laggy video, buffering audio, and clunky plugin-based screen sharing are no longer acceptable. WebRTC is the infrastructure that lets web applications meet those expectations natively, without compromise, without external dependencies, and without asking users to trust anything beyond the browser they already have open.

Understanding WebRTC in depth — the offer/answer flow, the ICE machinery, the DTLS handshake — is not just academic. It is what separates applications that work reliably in the real world, across NATs, firewalls, and mobile networks, from those that work only in a controlled demo environment. The complexity is real, but so is the payoff: a communication layer that is fast, secure, open, and already in the hands of billions of people.