IP Telephony Cookbook by Saverio Niccolini, Jorg Ott, et al - HTML preview


Figure 2.10 A sample H.323 call setup scenario

The called party explicitly confirms with its gatekeeper that it is allowed to accept the call (5, 6) and, if so, alerts the recipient of the call and returns an alerting indication and eventually (once the receiving user picks up the call) an indication of successful connection setup back to the calling party (7, 8). In parallel to this exchange, capability negotiation and media stream configuration take place. When the setup has completed, both parties start sending media streams directly to each other.


[IP Telephony Cookbook] / Technological Background

2.2.1.8 Additional (call) services

It is well known from our daily interaction with PBXs that telephony service comprises far more than just call setup and teardown: n-way conferencing and various supplementary services (such as call transfer, call waiting, etc.) are available. Similar features, at least the more commonly known and used ones, need to be provided by IP Telephony systems as well in order to be accepted by customers. Additional call services in H.323 can be grouped into three categories:

- Conferencing

H.323 inherently supports multipoint tightly-coupled conferencing, i.e., conferences with access control, optional support for conference chairs, and close synchronisation of conference state among all participants from the outset, through the concept of a Multipoint Controller (MC) and an optional Multipoint Processor (MP). While control is centralised in the MC, data exchange may in theory take place via IP multicast, via multi-unicast (i.e., peer-wise fan-out between endpoints without an MP), or through an MP. (In practice, hardly any H.323 equipment seems to support media multicast.) The distribution mode may be selected per medium and per endpoint peer and is controlled by the MC.

- Broadcast conferencing

H.323 also provides an interface to support large, loosely-coupled conferences such as those frequently used on the MBone to multicast seminars, events, etc. In this case, the MC defines a session description (using the Session Description Protocol, SDP, see below) for the H.323 media sessions (which have to operate using multicast) and announces this description by some means (e.g., the Session Announcement Protocol, SAP). Details are defined in ITU-T H.332.

- Supplementary services

H.323 provides a variety of supplementary services, with additional ones continuously being defined. While some services can be accomplished using the basic H.323 specifications, the H.450.x Recommendations define a framework (derived from QSIG, the ECMA/ISO/ETSI standard for supplementary service signalling in PBXs) and a number of services (call transfer, call diversion, call hold, call park & pickup, call waiting, message waiting indication and call completion).
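To make the difference between the conferencing distribution modes listed above more concrete, the following Python sketch counts the media streams that one medium generates in an n-party conference. It assumes that every endpoint produces exactly one stream and that an MP mixes (rather than forwards) the incoming streams and returns a single mixed stream to each endpoint, as is typical for audio; the function name `stream_counts` is hypothetical.

```python
def stream_counts(n: int) -> dict[str, int]:
    """Number of streams sent (one medium) in an n-party conference.

    Assumes each endpoint produces one stream and that the MP returns
    one mixed stream per endpoint (typical for audio).
    """
    if n < 2:
        raise ValueError("a conference needs at least two endpoints")
    return {
        # IP multicast: each endpoint sends once to the group address
        "multicast": n,
        # multi-unicast: each endpoint sends a separate copy to each of its n - 1 peers
        "multi_unicast": n * (n - 1),
        # MP: n streams up to the MP plus n mixed streams back down
        "via_mp": 2 * n,
    }


if __name__ == "__main__":
    for parties in (3, 5, 10):
        print(parties, stream_counts(parties))
```

Under these assumptions, a 10-party conference needs 90 unicast streams with multi-unicast but only 20 through an MP, which illustrates why an MP becomes attractive as conferences grow.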

Further extensions to supplementary services and other functional enhancements are on the way.

In particular, an HTTP-based extension framework is being defined at the time of writing to enable rapid introduction of new services without the need for standardisation.

2.2.1.9 H.235 Security

The H.235 recommendation defines elements of security for H.323:

- Authentication

Authentication can be achieved by using a shared secret (password) or digital signatures. The RAS messages include a token that was generated using either the shared secret or the signature. A receiving entity authenticates the sender by comparing the received token with a token it generates itself.

- Message Integrity

Integrity is achieved by generating password-based checks on the message.

- Privacy

Mechanisms are provided to set up encryption on the media streams. They must be used in conjunction with the H.245 protocol and employ DES, Triple DES or RC2. The use of SRTP is not yet supported (in H.235v2).


These mechanisms are grouped into Security Profiles: the Baseline Security Profile provides authentication and message integrity, which makes it suitable for subscription-based environments, while the Voice Encryption Profile provides confidential end-to-end media channels.
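The token comparison described under Authentication, combined with a password-based integrity check, can be sketched in Python as a keyed hash over the message. This is a simplified illustration, not the H.235 encoding: the hashed fields (sender identifier, timestamp, message bytes), the use of the password directly as the HMAC key, the choice of HMAC-SHA1 truncated to 96 bits and the 30-second timestamp window are all assumptions, and `make_token`/`verify_token` are hypothetical names.

```python
import hashlib
import hmac
import time
from typing import Optional

TOKEN_LEN = 12  # HMAC-SHA1 truncated to 96 bits (an assumption, see lead-in)


def make_token(secret: bytes, sender_id: str, timestamp: int, message: bytes) -> bytes:
    """Sender side: derive a token from the shared secret and the message.

    The hashed fields are a hypothetical layout, not the H.235 ASN.1 encoding.
    Because the message bytes are covered, the token doubles as an integrity check.
    """
    data = sender_id.encode("utf-8") + timestamp.to_bytes(8, "big") + message
    return hmac.new(secret, data, hashlib.sha1).digest()[:TOKEN_LEN]


def verify_token(secret: bytes, sender_id: str, timestamp: int, message: bytes,
                 received: bytes, max_skew: int = 30,
                 now: Optional[int] = None) -> bool:
    """Receiver side: regenerate the token and compare it with the received one."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > max_skew:
        return False  # stale or future timestamp: reject to limit replays
    expected = make_token(secret, sender_id, timestamp, message)
    return hmac.compare_digest(expected, received)  # constant-time comparison


if __name__ == "__main__":
    secret = b"shared-password"          # hypothetical shared secret
    ts = int(time.time())
    msg = b"example RAS message bytes"   # placeholder for an encoded RAS message
    token = make_token(secret, "endpoint-1", ts, msg)
    print(verify_token(secret, "endpoint-1", ts, msg, token))         # True
    print(verify_token(secret, "endpoint-1", ts, msg + b"!", token))  # False: altered message
```

A receiver holding a different secret, or receiving a modified message, computes a different token, so the same comparison provides both authentication and integrity.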

2.2.1.10 Protocol Profiles

As mentioned before, H.323 has its origin in the area of multimedia conferencing. As a result, a vast number of options are available that are not necessary for simply providing telephony services. The TIPHON project of the European Telecommunications Standards Institute (ETSI) has defined a telephony profile for H.323 that specifies which combination of options should be implemented.

Similarly, H.323 contains a security framework (H.235) that describes a collection of algorithms and protocol mechanisms but, because of international political constraints, lacks a precise specification of a mandatory baseline. The ETSI TIPHON security profile accounts for this: it fills in the gaps and provides the foundation for interoperable implementations.

In summary, the H.323 family of standards provides a mature basis for commercial products in the field of IP Telephony. While the details of the protocol are often dominated by its legacy from various earlier ITU protocols, there is an active effort to profile and simplify the protocol to reduce its complexity.

2.2.2 SIP

2.2.2.1 The purpose of SIP

SIP stands for Session Initiation Protocol. It is an application-layer control protocol that has been developed and designed within the IETF. The protocol has been designed with easy implementation, good scalability, and flexibility in mind.

The specification is available in the form of several RFCs. The most important one is RFC 3261, which contains the core protocol specification. The protocol is used for creating, modifying and terminating sessions with one or more participants. By a session, we mean a set of senders and receivers that communicate, together with the state kept in those senders and receivers during the communication. Examples of sessions include Internet telephone calls, distribution of multimedia, multimedia conferences, distributed computer games, etc.

SIP is not the only protocol that communicating devices need, and it is not meant to be a general-purpose protocol. The purpose of SIP is just to make the communication possible; the communication itself must be achieved by other means (and possibly another protocol). The two protocols most often used along with SIP are RTP and SDP. RTP (Real-time Transport Protocol) carries the real-time multimedia data (including audio, video and text): it makes it possible to split the encoded data into packets and transport these packets over the Internet.
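As a rough illustration of how RTP packetises media, the Python sketch below prepends the 12-byte RTP fixed header defined in RFC 3550 to each frame of already-encoded audio. It omits padding, header extensions and CSRC lists; `rtp_packet` and `packetize` are hypothetical helpers, and the example parameters (payload type 0, i.e. G.711 μ-law as assigned in RFC 3551, with 20 ms frames of 160 bytes at 8 kHz) are only one possible choice.

```python
import struct

RTP_VERSION = 2


def rtp_packet(payload: bytes, payload_type: int, seq: int, timestamp: int,
               ssrc: int, marker: bool = False) -> bytes:
    """Prepend a 12-byte RTP fixed header (RFC 3550) to one media payload.

    No padding, no header extension and no CSRC list are used.
    """
    byte0 = RTP_VERSION << 6                               # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)     # M bit + 7-bit payload type
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def packetize(media: bytes, frame_size: int, payload_type: int, ssrc: int,
              samples_per_frame: int, first_seq: int = 0, first_ts: int = 0) -> list[bytes]:
    """Split encoded media into fixed-size frames, one RTP packet per frame."""
    packets = []
    for i, start in enumerate(range(0, len(media), frame_size)):
        packets.append(rtp_packet(media[start:start + frame_size], payload_type,
                                  first_seq + i,                        # +1 per packet
                                  first_ts + i * samples_per_frame,     # counts samples
                                  ssrc))
    return packets


if __name__ == "__main__":
    one_second = bytes(8000)  # placeholder: 1 s of G.711 audio at 8000 bytes/s
    pkts = packetize(one_second, frame_size=160, payload_type=0,
                     ssrc=0x1234ABCD, samples_per_frame=160)
    print(len(pkts), len(pkts[0]))
```

Packetising one second of G.711 audio (8000 bytes) this way yields 50 packets of 172 bytes each (12 header bytes plus 160 payload bytes); the sequence number grows by one per packet, while the timestamp grows by 160 because it counts samples, not packets.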

Another important protocol is SDP, the Session Description Protocol, which is used to describe and encode the capabilities of session participants. Such a description is then used to negotiate the characteristics of the session so that all of the devices can participate, for example, the codecs used to encode the media (so that all participants will be able to decode it), the transport protocol used, and so on.
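The Python sketch below shows what such a description can look like for a single audio stream, and how a receiver might keep only the offered codecs it can decode. It is a strong simplification of the SDP offer/answer model (RFC 3264, building on the SDP syntax of RFC 4566): the helper names `sdp_offer` and `choose_codecs` are hypothetical, 192.0.2.10 is a documentation address, and only the `m=` and `a=rtpmap:` lines are inspected.

```python
def sdp_offer(username: str, session_id: int, address: str, port: int,
              codecs: dict[int, str]) -> str:
    """Build a minimal SDP body describing one audio stream."""
    payload_types = " ".join(str(pt) for pt in codecs)
    lines = [
        "v=0",                                            # protocol version
        f"o={username} {session_id} 1 IN IP4 {address}",  # origin of the description
        "s=-",                                            # session name (none given)
        f"c=IN IP4 {address}",                            # where to send media
        "t=0 0",                                          # session not bounded in time
        f"m=audio {port} RTP/AVP {payload_types}",        # codecs in preference order
    ]
    lines += [f"a=rtpmap:{pt} {name}" for pt, name in codecs.items()]
    return "\r\n".join(lines) + "\r\n"


def choose_codecs(offer: str, supported: set[str]) -> list[str]:
    """Keep the offered codecs the receiver can decode, in the offerer's order."""
    order: list[str] = []
    rtpmap: dict[str, str] = {}
    for line in offer.splitlines():
        if line.startswith("m=audio "):
            order = line.split()[3:]               # payload types after "RTP/AVP"
        elif line.startswith("a=rtpmap:"):
            pt, name = line[len("a=rtpmap:"):].split(" ", 1)
            rtpmap[pt] = name
    return [rtpmap[pt] for pt in order if rtpmap.get(pt) in supported]


if __name__ == "__main__":
    offer = sdp_offer("alice", 2890844526, "192.0.2.10", 49170,
                      {0: "PCMU/8000", 8: "PCMA/8000"})
    print(offer)
    print(choose_codecs(offer, {"PCMA/8000", "GSM/8000"}))
```

For an offer listing PCMU and PCMA, a receiver that supports only PCMA and GSM would select `['PCMA/8000']`, so both sides end up with a codec that every participant can decode.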

SIP has been designed in conformance with the Internet model. It is an end-to-end-oriented signalling protocol, which means that all the logic is stored in end-devices (except for the routing of SIP messages). State is also stored only in end-devices. There is no single point of failure, and networks designed this way scale well. The price we have to pay for this ‘distributiveness’ and scalability is a higher message overhead, caused by the messages being sent end-to-end.

It is worth mentioning that the end-to-end concept of SIP is a significant divergence from the regular PSTN (Public Switched Telephone Network), where all the state and logic are stored in the network and the end-devices (telephones) are very primitive. The aim of SIP is to provide the same functionality as the traditional PSTN, but the end-to-end design makes SIP networks much more powerful and open to the implementation of new services that can hardly be implemented in the traditional PSTN.

SIP is based on HTTP, which is probably the most successful and widely used protocol on the Internet.

SIP tries to combine the best of both. In fact, HTTP can be classified as a signalling protocol too, because user agents use it to tell an HTTP server which documents they are interested in. SIP is used to carry the description of session parameters; the description is encoded into a document using SDP. Both protocols (HTTP and SIP) have inherited the encoding of message headers from RFC 822. The encoding has proven to be robust and flexible over the years.
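The RFC 822-style header encoding shared by HTTP and SIP is simple enough to parse in a few lines. The Python sketch below splits a SIP message into its start line, its `Name: value` headers and its body (for example, an SDP document). It handles folded continuation lines and case-insensitive header names, but ignores compact header forms and comma-separated multiple values; the function names and the abbreviated sample message are hypothetical.

```python
def parse_sip_message(raw: str) -> tuple[str, list[tuple[str, str]], str]:
    """Split a SIP message into start line, (name, value) headers and body."""
    head, _, body = raw.partition("\r\n\r\n")   # the first empty line ends the headers
    lines = head.split("\r\n")
    start_line, headers = lines[0], []
    for line in lines[1:]:
        if line[:1] in (" ", "\t") and headers:
            # folded line (leading whitespace): continues the previous header value
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")   # split at the first colon only
            headers.append((name.strip(), value.strip()))
    return start_line, headers, body


def header_values(headers: list[tuple[str, str]], name: str) -> list[str]:
    """Return all values of a header; header names are case-insensitive."""
    return [value for hname, value in headers if hname.lower() == name.lower()]


if __name__ == "__main__":
    raw = ("INVITE sip:joe@company.com SIP/2.0\r\n"
           "To: Joe <sip:joe@company.com>\r\n"
           "Subject: project\r\n"
           " meeting\r\n"
           "Content-Length: 0\r\n"
           "\r\n")
    start, headers, body = parse_sip_message(raw)
    print(start)
    print(header_values(headers, "subject"))
```

The same loop would work almost unchanged for an HTTP message, which is the practical benefit of the shared header encoding.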

2.2.2.1.1 SIP URI

SIP entities are identified using SIP URIs (Uniform Resource Identifiers). A SIP URI has the form sip:username@domain, for example sip:joe@company.com. It consists of a username part and a domain name part, delimited by the @ (at) character. SIP URIs are similar to e-mail addresses; it is, for instance, possible to use the same identifier for e-mail and SIP communication. Such URIs are easy to remember.
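A minimal Python sketch of splitting a SIP URI into its parts follows. It only handles the simple sip:username@domain form, optionally followed by a port and `;`-separated parameters; IPv6 hosts, escaped characters, passwords, URI headers and the `sips:` scheme are deliberately not handled, and `SipUri`/`parse_sip_uri` are hypothetical names.

```python
from typing import NamedTuple, Optional


class SipUri(NamedTuple):
    user: Optional[str]   # None when the URI names only a domain
    host: str
    port: Optional[int]   # None when no explicit port is given


def parse_sip_uri(uri: str) -> SipUri:
    """Parse sip:user@domain[:port][;params] (simplified, see lead-in)."""
    if not uri.lower().startswith("sip:"):
        raise ValueError(f"not a SIP URI: {uri!r}")
    rest = uri[4:].split(";", 1)[0]            # drop parameters such as ;transport=udp
    user, at, hostport = rest.rpartition("@")  # the user part is delimited by '@'
    host, _, port = hostport.partition(":")
    return SipUri(user if at else None, host, int(port) if port else None)


if __name__ == "__main__":
    print(parse_sip_uri("sip:joe@company.com"))
    # SipUri(user='joe', host='company.com', port=None)
    print(parse_sip_uri("sip:joe@company.com:5060;transport=udp"))
    # SipUri(user='joe', host='company.com', port=5060)
```

The domain part is what a SIP element would use to locate the server responsible for the user, much as the domain of an e-mail address locates a mail server.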

2.2.2.2 SIP network elements

Although, in the simplest configuration, it is possible to use just two user agents that send SIP messages directly to each other, a typical SIP network will contain more than one type of SIP element. Basic SIP elements are user agents, proxies, registrars and redirect servers. They are described briefly in this section.

Note that the elements, as presented in this section, are often only logical entities. It is often beneficial to co-locate them, for instance, to increase the speed of processing, but that depends on the particular implementation and configuration.


2.2.2.2.1 User agents

Internet endpoints that use SIP to find each other and to negotiate a session’s characteristics are called user agents. User agents usually, but not necessarily, reside on a user's computer in the form of an application. This is currently the most widely used approach, but user agents can also be cellular phones, PSTN gateways, PDAs, automated IVR systems and so on.

User agents are often described in terms of a User Agent Server (UAS) and a User Agent Client (UAC). UAS and UAC are logical entities, and each user agent contains both a UAC and a UAS. The UAC is the part of the user agent that sends requests and receives responses; the UAS is the part that receives requests and sends responses.

Because a user agent contains both a UAC and a UAS, it behaves as either a UAC or a UAS, depending on its role in a given transaction. For instance, a calling party’s user agent behaves as a UAC when it sends an INVITE request and receives responses to the request. A called party’s user agent behaves as a UAS when it receives the INVITE and sends responses.

But this situation changes when the called party decides to send a BYE and terminate the session. In this case, the called party's user agent (sending the BYE) behaves as a UAC and the calling party's user agent behaves as a UAS.
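The point that UAC and UAS are roles taken per request, not fixed properties of a device, can be captured in a tiny Python sketch. The `Transaction` type and the `roles` function are hypothetical illustrations: whichever user agent sends a request acts as the UAC for that transaction, and the one that receives it acts as the UAS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    method: str    # SIP request method, e.g. "INVITE" or "BYE"
    sender: str    # user agent that sends the request
    receiver: str  # user agent that receives the request and sends responses


def roles(tx: Transaction) -> dict[str, str]:
    """Roles hold for this one transaction: sender is UAC, receiver is UAS."""
    return {tx.sender: "UAC", tx.receiver: "UAS"}


if __name__ == "__main__":
    call = [
        Transaction("INVITE", sender="calling party", receiver="called party"),
        Transaction("BYE", sender="called party", receiver="calling party"),
    ]
    for tx in call:
        print(tx.method, roles(tx))
```

Running it prints the INVITE with the calling party as UAC and then the BYE with the roles reversed, matching the scenario described above.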

[Figure: UAC and UAS roles of a calling party, a stateful forking proxy and two called parties for INVITE and BYE requests]