UNIT 2

Sound/Audio

Formatted on September 5, 2000

Objectives

- To understand how computers process sound
- To understand how computers synthesize sound
- To understand the differences between two major kinds of audio, namely digitised sound and MIDI music

Contents

1 The Nature of Sound
2 Computer Representation of Sound
3 Computer Music: MIDI
4 Summary: MIDI versus digital audio
5 Exercises

Department of Computer Science

COMP3600/SCI2600 Multimedia Systems 2. Sound/Audio (200009) Slide: 1

1 The Nature of Sound

Sound is a physical phenomenon produced by the vibration of matter and transmitted as waves. However, the perception of sound by human beings is a very complex process. It involves three systems:

- the source which emits sound;
- the medium through which the sound propagates;
- the detector which receives and interprets the sound.

The sounds we hear every day are very complex. Every sound is composed of waves of many different frequencies and shapes. The simplest sound we can hear is a sine wave.

[Figure: a sine wave plotted as amplitude against time, with one period marked]

Sound waves can be characterised by the following attributes: period, frequency, amplitude, bandwidth, pitch, loudness and dynamic.

1.1 Pitch and Frequency

Period is the interval at which a periodic signal repeats.

Pitch is a perception of sound by human beings. It measures how ‘high’ a sound is as perceived by a listener.

Frequency measures a physical property of a wave. It is the reciprocal of the period, $f = 1/P$. The unit is Hertz (Hz) or kiloHertz (kHz).

Range                Frequencies
Infra-sound          0 – 20 Hz
Human hearing range  20 Hz – 20 kHz
Ultrasound           20 kHz – 1 GHz
Hypersound           1 GHz – 10 THz

Musical instruments are tuned to produce a set of fixed pitches.

Note  Ratio  Frequency (Hz)
C     1:1    264
D     9:8    297
E     5:4    330
F     4:3    352
G     3:2    396
A     5:3    440
B     15:8   495
C     2:1    528
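The note table can be reproduced from the ratios alone. The following is a minimal sketch, assuming the lower C is fixed at 264 Hz as in the table; the function names and the label `C'` for the upper C are illustrative choices, not part of the course material.

```python
from fractions import Fraction

# Ratios from the note table, relative to the lower C.
# "C'" is a hypothetical label for the C one octave higher.
RATIOS = {
    "C": Fraction(1, 1), "D": Fraction(9, 8), "E": Fraction(5, 4),
    "F": Fraction(4, 3), "G": Fraction(3, 2), "A": Fraction(5, 3),
    "B": Fraction(15, 8), "C'": Fraction(2, 1),
}


def note_frequency(note, base_hz=264):
    """Frequency in Hz of a note, given the frequency of the lower C."""
    return float(RATIOS[note] * base_hz)


def period_seconds(frequency_hz):
    """Period P = 1/f, in seconds."""
    return 1.0 / frequency_hz


if __name__ == "__main__":
    for name in RATIOS:
        f = note_frequency(name)
        print(f"{name:2s} {f:6.1f} Hz  period {period_seconds(f) * 1000:.3f} ms")
```

Using exact fractions keeps the computed frequencies identical to the whole numbers in the table.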

1.2 Loudness and Amplitude

The other important perceptual quality is loudness or volume.

Amplitude is the measure of sound levels. For a digital sound, amplitude is the sample value.

Sounds have different loudness because they carry different amounts of power. The unit of power is the watt. The intensity of a sound is the amount of power transmitted through an area of 1 m² oriented perpendicular to the propagation direction of the sound.

If the intensity of a sound is 1 watt/m², we may start to feel the sound, and the ear may be damaged. This is known as the threshold of feeling. If the intensity is $10^{-12}$ watt/m², we may just be able to hear it. This is known as the threshold of hearing.

The relative intensity of two different sounds is measured using the unit Bel or, more commonly, deciBel (dB). It is defined by

$$\text{relative intensity in dB} = 10 \log_{10} \frac{I_2}{I_1}$$

Very often, we will compare a sound with the threshold of hearing.
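The decibel definition is easy to check numerically. This is a small sketch, assuming the threshold of hearing ($10^{-12}$ watt/m²) as the default reference intensity; the function name is illustrative.

```python
import math

THRESHOLD_OF_HEARING = 1e-12  # watt/m^2, the reference used in the text


def relative_intensity_db(i2, i1=THRESHOLD_OF_HEARING):
    """Relative intensity of i2 against i1 in dB: 10 * log10(i2 / i1)."""
    return 10.0 * math.log10(i2 / i1)


if __name__ == "__main__":
    # Threshold of feeling (1 watt/m^2) compared with the threshold of hearing.
    print(round(relative_intensity_db(1.0)))
```

Because the scale is logarithmic, each factor of 10 in intensity adds 10 dB.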

Typical sound levels generated by various sources

Sound level  Source
160 dB       Jet engine
130 dB       Large orchestra at fortissimo
100 dB       Car on highway
70 dB        Voice conversation
50 dB        Quiet residential areas
30 dB        Very soft whisper
20 dB        Sound studio

Typical sound levels in music

Intensity (watt/m²)  Sound level (dB)  Loudness
1                    120               Threshold of feeling
10^-3                90                fff
10^-4                80                ff
10^-5                70                f
10^-6                60                mf
10^-7                50                p
10^-8                40                pp
10^-9                30                ppp
10^-12               0                 Threshold of hearing

1.3 Dynamic and Bandwidth

- Dynamic range is the difference between the loudest and the softest sound levels. For example, a large orchestra can reach 130 dB at its climax and drop to as low as 30 dB at its softest, giving a range of 100 dB.
- Bandwidth is the range of frequencies a device can produce or a human can hear.

Device or listener        Bandwidth
FM radio                  50Hz – 15kHz
AM radio                  80Hz – 5kHz
CD player                 20Hz – 20kHz
Sound Blaster 16 sound card  30Hz – 20kHz
Inexpensive microphone    80Hz – 12kHz
Telephone                 300Hz – 3kHz
Children’s ears           20Hz – 20kHz
Older ears                50Hz – 10kHz
Male voice                120Hz – 7kHz
Female voice              200Hz – 9kHz

2 Computer Representation of Sound

- Sound waves are continuous, while computers are good at handling discrete numbers.
- In order to store a sound wave in a computer, samples of the wave are taken.
- Each sample is represented by a number, the ‘code’.
- This process is known as digitisation.
- This method of digitising sound is known as pulse code modulation (PCM). Refer to Unit 1 for more information on digitisation.
- According to the Nyquist sampling theorem, in order to capture all audible frequency components of a sound, i.e., up to 20 kHz, we need to set the sampling rate to at least twice this. This is why one of the most popular sampling rates for high quality sound is 44,100 Hz.
- Another aspect we need to consider is the resolution, i.e., the number of bits used to represent a sample. Often, 16 bits are used for each sample in high quality sound. This gives an SNR of 96 dB.
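Both figures quoted above can be checked with a few lines. This is a sketch under a stated assumption: the 96 dB value is taken to come from the common rule of thumb SNR = 20 log10(2^b), roughly 6 dB per bit, which the slides do not spell out. The function names are illustrative.

```python
import math


def nyquist_rate(max_frequency_hz):
    """Minimum sampling rate needed to capture components up to max_frequency_hz."""
    return 2 * max_frequency_hz


def quantisation_snr_db(bits):
    """Rule-of-thumb SNR for b-bit samples: 20 * log10(2 ** b), about 6 dB per bit."""
    return 20.0 * math.log10(2 ** bits)


if __name__ == "__main__":
    print(nyquist_rate(20_000))            # minimum rate for 20 kHz audio
    print(round(quantisation_snr_db(16)))  # 16-bit resolution
```

The common 44,100 Hz rate sits a little above the 40,000 Hz minimum that the Nyquist theorem gives for 20 kHz audio.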

2.1 Quality versus File Size

The size of a digital recording depends on the sampling rate, resolution and number of channels.

$$S = R \times (b/8) \times C \times D$$

Symbol  Meaning             Unit
S       file size           bytes
R       sampling rate       samples per second
b       resolution          bits
C       channels            1 (mono) or 2 (stereo)
D       recording duration  seconds

Higher sampling rate and higher resolution give higher quality but bigger file size.

For example, if we record 10 seconds of stereo music at 44.1 kHz, 16 bits, the size will be:

$$S = 44100 \times (16/8) \times 2 \times 10 = 1{,}764{,}000 \text{ bytes} = 1722.7 \text{ Kbytes} = 1.68 \text{ Mbytes}$$

Note: 1 Kbyte = 1024 bytes; 1 Mbyte = 1024 Kbytes.

High quality sound files are very big; however, the file size can be reduced by compression.
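The file size formula translates directly into code. The sketch below reuses the symbols S, R, b, C and D from the formula as parameter names; the function name and the unit constants are illustrative.

```python
KBYTE = 1024          # bytes, as defined in the note above
MBYTE = 1024 * KBYTE  # bytes


def recording_size_bytes(rate_hz, bits, channels, seconds):
    """S = R * (b / 8) * C * D, the size of an uncompressed recording in bytes."""
    return rate_hz * (bits / 8) * channels * seconds


if __name__ == "__main__":
    size = recording_size_bytes(44100, 16, 2, 10)
    print(f"{size:,.0f} bytes = {size / KBYTE:.1f} Kbytes = {size / MBYTE:.2f} Mbytes")
```

Halving any one of the sampling rate, resolution or channel count halves the file size.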

File size for some common sampling rates and resolutions

Sampling rate  Resolution  Stereo/Mono  Size for 1 min.  Comments
44.1 kHz       16-bit      Stereo       10.5MB           CD-quality recording
44.1 kHz       16-bit      Mono         5.25MB           A good trade-off for high-quality recordings of mono sources such as voice-overs
44.1 kHz       8-bit       Stereo       5.25MB           Achieves highest playback quality on low-end devices such as most of the sound cards
44.1 kHz       8-bit       Mono         2.6MB            An appropriate trade-off for recording a mono source
22.05 kHz      16-bit      Stereo       5.25MB           Darker sounding than CD-quality recording because of the lower sampling rate
22.05 kHz      16-bit      Mono         2.5MB            Not a bad choice for speech, but better to trade some fidelity for a lot of disk space by dropping down to 8-bit
22.05 kHz      8-bit       Stereo       2.6MB            A very popular choice for reasonable stereo recording where full bandwidth playback is not possible
22.05 kHz      8-bit       Mono         1.3MB            A thinner sound than the choice just above, but very usable
11 kHz         8-bit       Stereo       1.3MB            At this low a sampling rate, there are few advantages to using stereo
11 kHz         8-bit       Mono         650K             In practice, probably as low as you can go and still get usable results
5.5 kHz        8-bit       Stereo       650K             Stereo not effective
5.5 kHz        8-bit       Mono         325K             About as good as a bad telephone connection

2.2 Audio File Formats

The most commonly used digital sound format in Windows systems is the .wav file.

- Sound is stored in .wav files as digital samples, known as Pulse Code Modulation (PCM).
- Each .wav file has a header containing information about the file:
  - type of format, e.g., PCM or other modulations
  - size of the data
  - number of channels
  - samples per second
  - bytes per sample
- There is usually no compression in .wav files.

Other formats may use different compression techniques to reduce file size.

- .vox files use Adaptive Differential Pulse Code Modulation (ADPCM).
- .mp3 files use MPEG-1 layer 3 audio.
- RealAudio is a proprietary format.
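The header fields listed for .wav files can be inspected with Python's standard `wave` module. The sketch below writes a short mono sine tone into an in-memory buffer and reads its header back; the helper names, the 440 Hz tone and the 11,025 Hz rate are illustrative assumptions.

```python
import io
import math
import struct
import wave


def write_sine_wav(stream, freq_hz=440.0, n_frames=1000, rate_hz=11025):
    """Write n_frames of a 16-bit mono sine tone as PCM into a .wav stream."""
    frames = b"".join(
        struct.pack("<h", int(16000 * math.sin(2 * math.pi * freq_hz * i / rate_hz)))
        for i in range(n_frames)
    )
    with wave.open(stream, "wb") as w:
        w.setnchannels(1)         # mono
        w.setsampwidth(2)         # bytes per sample (16-bit)
        w.setframerate(rate_hz)   # samples per second
        w.writeframes(frames)


def read_header(stream):
    """Return the header fields of a .wav stream as a dictionary."""
    with wave.open(stream, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "samples_per_second": w.getframerate(),
            "bytes_per_sample": w.getsampwidth(),
            "frames": w.getnframes(),
            "compression": w.getcomptype(),
        }


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_sine_wav(buffer)
    buffer.seek(0)
    print(read_header(buffer))
```

The `wave` module handles uncompressed PCM only, which matches the usual case described above.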

Some common audio file formats

Extension  MIME Type             Platform   Use
aif        Audio/x-aiff          Mac, SGI   Audio
aifc       Audio/x-aiff          Mac, SGI   Audio (compressed)
AIFF       Audio/x-aiff          Mac, SGI   Audio
aiff       Audio/x-aiff          Mac, SGI   Audio
au         Audio/basic           Sun, NeXT  ULAW audio data
mov        Video/QuickTime       Mac, Win   QuickTime video
mpe        Video/mpeg            All        MPEG video
mpeg       Video/mpeg            All        MPEG video
mpg        Video/mpeg            All        MPEG video
mp3        Audio/x-mpeg          All        MPEG audio
qt         Video/QuickTime       Mac, Win   QuickTime video
ra, ram    Audio/x-pn-realaudio  All        RealAudio Sound
snd        Audio/basic           Sun, NeXT  ULAW Audio Data
vox        Audio/                All        VoxWare Voice
wav        Audio/x-wav           Win        WAV Audio

2.3 Audio Hardware

- Recording and digitising sound:
  - An analog-to-digital converter (ADC) converts the analog sound signal into digital samples.
  - A digital signal processor (DSP) processes the samples, e.g. filtering, modulation, compression, and so on.
- Playing back sound:
  - A digital signal processor processes the samples, e.g. decompression, demodulation, and so on.
  - A digital-to-analog converter (DAC) converts the digital samples into a sound signal.
- All these hardware devices are integrated into a few chips on a sound card.
- Different sound cards have different capabilities for processing digital sound. When buying a sound card, you should look at:
  - maximum sampling rate
  - stereo or mono
  - duplex or simplex

[Figure: signal path between sound, the ADC, the DSP (digital) and the DAC]

2.4 Audio Software

- Windows device driver: controls the hardware device. Many popular sound cards are Plug and Play. Windows has drivers for them and can recognise them automatically. For cards for which Windows does not have a driver, you need to get the driver from the manufacturer and install it with the card.
- If you do not hear sound, you should check the settings, such as interrupt, DMA channels, and so on.
- Device manager: the user interface to the hardware for configuring the devices.
  - You can choose which audio device you want to use.
  - You can set the audio volume.

Mixer: its functions are:

- to combine sound from different sources
- to adjust the playback volume of sound sources
- to adjust the recording volume of sound sources

Recording: Windows has a simple Sound Recorder.

Editing: The Windows Sound Recorder has limited editing functions, such as changing volume and speed, and deleting part of the sound.

There are many freeware and shareware programs for sound recording, editing and processing.

3 Computer Music: MIDI

Sound waves, whether natural or man-made, are often very complex, i.e., they consist of many frequencies. Recording complex sound digitally is relatively straightforward. However, it is quite difficult to generate (or synthesize) complex sound.

There is a better way to generate high quality music. This is known as MIDI, the Musical Instrument Digital Interface. It is a communication standard developed in the early 1980s for electronic instruments and computers. It specifies the hardware connection between pieces of equipment as well as the format in which data are transferred between them.

Common MIDI devices include electronic music synthesisers, modules, and MIDI devices in common sound cards.

General MIDI is a standard specified by the MIDI Manufacturers Association. To be GM compatible, a sound generating device must meet the General MIDI System Level 1 performance requirements:

- a minimum of 24 fully dynamically allocated voices
- 16 channels, with percussion on channel 10
- a minimum of 16 simultaneous and different timbre instruments
- a minimum of 128 preset instruments
- support for certain controllers

[Figure: General MIDI logo] This sign indicates that the device is a General MIDI device.

3.1 MIDI Hardware

An electronic musical instrument or a computer which has a MIDI interface should have one or more MIDI ports. The MIDI ports on musical instruments are usually labelled with:

- IN: for receiving MIDI data;
- OUT: for outputting MIDI data that are generated by the instrument;
- THRU: for passing MIDI data to the next instrument.

MIDI devices can be daisy-chained together.

[Figure: MIDI devices daisy-chained through their IN, OUT and THRU ports]

3.2 MIDI Data

Unlike digital sound, MIDI data do not encode individual samples. MIDI data encode musical events and commands to control instruments.

MIDI data are grouped into MIDI messages. Each MIDI message represents a musical event, e.g., pressing a key, setting a switch or adjusting foot pedals. Examples of MIDI messages: Key C4 ON; Key G3 OFF; Volume 105.

A sequence of MIDI messages is grouped into a track.

An instrument or a computer that satisfies both the hardware interface and the data format is known as a MIDI device.
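To make the idea of a MIDI message concrete, the three example events (Key C4 ON, Key G3 OFF, Volume 105) can be encoded as the three-byte channel messages defined by the MIDI 1.0 standard. This is a sketch with illustrative helper names. It assumes the convention that middle C (note number 60) is called C4, which varies between manufacturers, and it assumes the events are sent on channel 1.

```python
NOTE_OFF = 0x80         # status byte for Note Off, channel in the low 4 bits
NOTE_ON = 0x90          # status byte for Note On
CONTROL_CHANGE = 0xB0   # status byte for Control Change
CHANNEL_VOLUME = 7      # controller number for channel volume

SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_number(name, octave):
    """MIDI note number, assuming middle C (60) is written C4."""
    return 12 * (octave + 1) + SEMITONES[name]


def note_on(channel, note, velocity=64):
    """Note On message; channel is 1..16 as shown to users."""
    return bytes([NOTE_ON | (channel - 1), note, velocity])


def note_off(channel, note, velocity=0):
    """Note Off message for the given channel and note."""
    return bytes([NOTE_OFF | (channel - 1), note, velocity])


def volume(channel, level):
    """Control Change message setting the channel volume (0..127)."""
    return bytes([CONTROL_CHANGE | (channel - 1), CHANNEL_VOLUME, level])


if __name__ == "__main__":
    for message in (note_on(1, note_number("C", 4)),
                    note_off(1, note_number("G", 3)),
                    volume(1, 105)):
        print(message.hex(" "))
```

Each event costs three bytes, which is why MIDI data are so much smaller than sampled audio of the same music.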

3.3 MIDI Channels and Modes

MIDI devices communicate with each other through channels. The MIDI standard specifies that each MIDI connection has 16 channels.

Each instrument can be mapped to a single channel (Omni Off), or it can use all 16 channels (Omni On).

Some instruments are capable of playing more than one note at the same time, e.g., organs and pianos. This is known as polyphony. Other instruments, such as the flute, are monophonic since they can only play one note at a time.

Each MIDI device must be set to one of the following modes for receiving MIDI data:

- Omni On/Poly
- Omni On/Mono
- Omni Off/Poly
- Omni Off/Mono

3.4 Instrument Patch

Each MIDI device is usually capable of producing sound resembling several real instruments and/or noise effects (e.g., telephone, aircraft). Each instrument or noise effect is known as a patch, or preset.

The General MIDI standard specifies 128 patches (numbered from 0 to 127).

Instrument patches

ID   Sound
0    Acoustic grand piano
7    Clavinet
10   Music box
19   Church organ
40   Violin
48   String ensemble I
124  Telephone ring
125  Helicopter
126  Applause

Percussion sounds

ID   Sound
35   Acoustic bass drum
45   Low tom
76   High wood block
80   Mute triangle

3.5 MIDI files

When using computers to play MIDI music, the MIDI data are often stored in MIDI files. Each MIDI file contains a number of chunks. There are two types of chunks:

- Header chunk: contains information about the entire file: the type of MIDI file, the number of tracks and the timing.
- Track chunk: the actual data of a MIDI track.

[Figure: a MIDI file as a sequence of chunks (Chunk 1 to Chunk 6): a header chunk followed by track chunks, each track chunk made up of MIDI events]

There are three types of MIDI file:

0  a single multichannel track
1  one or more simultaneous tracks of a sequence
2  one or more sequentially independent single-track patterns
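The chunk structure can be read with a few lines of code. The sketch below relies on the Standard MIDI File layout, in which every chunk starts with a 4-byte type (`MThd` for the header, `MTrk` for a track) and a 4-byte big-endian length, and the header body holds three 16-bit fields: file type, number of tracks and timing division. The function names and the sample data are illustrative.

```python
import struct


def iter_chunks(data):
    """Yield (chunk_type, body) for each chunk in a MIDI file's bytes."""
    pos = 0
    while pos + 8 <= len(data):
        chunk_type, length = struct.unpack(">4sI", data[pos:pos + 8])
        yield chunk_type, data[pos + 8:pos + 8 + length]
        pos += 8 + length


def parse_header_chunk(data):
    """Return the file type, track count and timing division from the header."""
    chunk_type, length = struct.unpack(">4sI", data[:8])
    if chunk_type != b"MThd" or length < 6:
        raise ValueError("not a MIDI header chunk")
    file_type, tracks, division = struct.unpack(">HHH", data[8:14])
    return {"type": file_type, "tracks": tracks, "division": division}


if __name__ == "__main__":
    # A hand-built type 1 file header claiming 1 track, followed by an
    # (almost empty) track chunk holding only an end-of-track event.
    sample = (b"MThd" + struct.pack(">IHHH", 6, 1, 1, 480)
              + b"MTrk" + struct.pack(">I", 4) + b"\x00\xff\x2f\x00")
    print(parse_header_chunk(sample))
    print([chunk_type for chunk_type, _ in iter_chunks(sample)])
```

Because every chunk carries its own length, a reader can skip chunk types it does not understand.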

Tracks, channels and patches

- Multiple tracks can be played at the same time.
- Each track can be assigned to a different channel.
- Each channel can accept more than one track.
- Each channel is assigned a patch, and therefore generates the sound of a particular instrument.

[Figure: Tracks 1 to 4 are sent on channels 2, 3, 4 and 10. Device 1 plays channel 2 as piano; Device 2 plays channel 10; Device 3 (Omni On) plays channel 2 as violin, channel 3 as guitar and channel 4 as string ensemble.]

3.6 How MIDI Sounds Are Synthesized

A simplistic view is that:

- the MIDI device stores the characteristics of sounds produced by different sound sources;
- the MIDI messages tell the device which kind of sound is to be generated, at which pitch, for how long, and with which other attributes.

There are two ways of synthesizing sounds:

- FM synthesis (frequency modulation): one sine wave is used to modulate another sine wave, generating a new wave which is rich in timbre. It consists of the two original waves, their sum and difference, and harmonics. The drawbacks of FM synthesis are that the generated sound is not realistic and that there is no exact formula for generating a particular sound.
- Wave-table synthesis: the device stores representative digital sound samples. It manipulates these samples, e.g., by changing the pitch, to create the complete range of notes.
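FM synthesis can be sketched in a few lines. The example below uses the common textbook form in which a modulating sine wave is added to the phase of a carrier sine wave, sin(2π·fc·t + I·sin(2π·fm·t)); this particular formula, the parameter names and the default sampling rate are assumptions for illustration, not something the slides specify.

```python
import math


def fm_samples(carrier_hz, modulator_hz, index, seconds, rate_hz=44100):
    """Samples of a carrier sine wave modulated by a second sine wave.

    index controls the modulation depth; index = 0 gives a plain sine wave.
    """
    n = round(seconds * rate_hz)
    samples = []
    for i in range(n):
        t = i / rate_hz
        modulation = index * math.sin(2 * math.pi * modulator_hz * t)
        samples.append(math.sin(2 * math.pi * carrier_hz * t + modulation))
    return samples


if __name__ == "__main__":
    tone = fm_samples(carrier_hz=440, modulator_hz=110, index=2.0, seconds=0.01)
    print(len(tone), min(tone), max(tone))
```

Raising the index spreads energy into more frequency components around the carrier, which is what makes the resulting timbre richer.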

MIDI Sound Attributes

The shape of the amplitude envelope has great influence on the resulting character of sound. There are two different types of envelope:

- Diminishing sound: gradually dies out;
- Continuing sound: sustains until turned off.

[Figure: amplitude against time for a diminishing sound and for a continuing sound, with the key-on and key-off points marked]

The Amplitude Envelope

- Delay: the time between when a key is played and when the attack phase begins
- Attack: the time from no sound to maximum amplitude level before starting the decay phase
- Hold: the time the envelope will stay at the peak
- Decay: the time it takes the envelope to go from the peak level to the sustain level
- Sustain: the level at which the envelope remains as long as a key is held down
- Release: the time it takes for the sound to fade to nothing

[Figure: amplitude against time, showing the delay, attack, hold, decay, sustain and release phases in order]
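The six envelope parameters can be turned into a function of time. The sketch below assumes straight-line segments between the phases and a peak level of 1.0; real synthesisers often use curved segments, and all names here are illustrative.

```python
def envelope(t, key_off, delay, attack, hold, decay, sustain_level, release):
    """Amplitude (0..1) at t seconds after the key is pressed.

    The key is released at key_off seconds; all durations are in seconds.
    Segments are assumed to be linear.
    """
    def while_held(time):
        if time < delay:
            return 0.0
        time -= delay
        if time < attack:
            return time / attack                      # rise to the peak
        time -= attack
        if time < hold:
            return 1.0                                # stay at the peak
        time -= hold
        if time < decay:
            return 1.0 - (1.0 - sustain_level) * time / decay
        return sustain_level                          # until the key is released

    if t < key_off:
        return while_held(t)
    if release <= 0:
        return 0.0
    start_level = while_held(key_off)
    return max(0.0, start_level * (1.0 - (t - key_off) / release))


if __name__ == "__main__":
    for step in range(0, 16):
        t = step / 10
        level = envelope(t, key_off=1.0, delay=0.1, attack=0.2, hold=0.1,
                         decay=0.2, sustain_level=0.5, release=0.4)
        print(f"{t:.1f}s {level:.2f}")
```

Multiplying each output sample of a synthesised tone by this value shapes its loudness over time; a sustain level of 0 gives a diminishing sound, while a non-zero sustain level gives a continuing sound.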

3.7 MIDI software

MIDI players, for playing MIDI music. These include:

- Windows Media Player, which can play MIDI files
- Players that come with sound cards, e.g. Creative Midi player
- Freeware and shareware players and plug-ins: Midigate, Yamaha Midplug, etc.

MIDI sequencers, for recording, editing and playing MIDI:

- Cakewalk Express, Home Studio, Professional
- Cubasis
- Encore
- Voyetra MIDI Orchestrator Plus

Configuration: Like audio devices, MIDI devices require a driver. Select and configure MIDI devices from the control panel.

4 Summary: MIDI versus digital audio

Digital Audio

- Digital representation of physical sound waves
- File size is large without compression
- Quality is in proportion to file size
- More software available
- Playback quality is less dependent on the sound sources
- Can record and play back any sound, including speech

MIDI

- Abstract representation of musical sounds and sound effects
- MIDI files are much more compact
- File size is independent of the quality
- Much better sound if the sound source is of high quality
- Needs some music theory
- Cannot generate speech

5 Exercises

1. Your hard disk has 256 Mbytes of free space. You are going to record a speech with a sampling rate of 11 kHz, 8-bit resolution and a single channel. What is the length of the recording that can be stored on the hard disk? (Answer in seconds)

2. A multimedia presentation has 30 minutes of CD-quality digital audio in .wav files. What is the storage required for these files?

3. In a teleconferencing system, a network connection with a bandwidth of 100 Kbytes/sec is allocated to a duplex audio link. What are the maximum sampling rate and resolution at which uncompressed audio data can be transmitted in real time?

4. You are developing a network voice communication program. It uses the Internet to connect two remote users and allows them to talk to each other in real time. What is the most appropriate sampling rate for recording their voice?

5. In the configuration shown below, the music keyboard is a General MIDI device which is set to ‘Omni On’ mode, Synthesiser 1 is a drum machine which accepts signals on the drum channel only, and Synthesiser 2 is a guitar accepting signals from all channels but can only synthesise guitar sound. A MIDI sequencer is playing a MIDI file on the computer. The table below lists tracks with their patch and channel settings. What kind of sound will be heard from each device?

Track  Patch  Channel
1      2      3
2      8      4
3      28     5
4      49     6
5      57     7
6      73     2

[Figure: Computer, Music Keyboard, Synthesiser 1 and Synthesiser 2 daisy-chained through their MIDI IN, OUT and THRU ports]

5.1 Solution to Exercises

1. The size of one second of recording is 11,025 bytes. The length of recording that can be stored in 256 Mbytes is

$$(256 \times 1024 \times 1024) / 11{,}025 = 24{,}347 \text{ seconds}$$

2. CD-quality digital audio means a 44,100 Hz sampling rate, 16-bit resolution and stereo.

$$44100 \times 2 \times 2 \times (30 \times 60) = 317{,}520{,}000 \text{ bytes} = 302.8 \text{ Mbytes}$$

3. Since the link is duplex, we allocate half the bandwidth to each direction, i.e., 50 Kbytes/sec. Using the formula

$$S = R \times (b/8) \times C \times D$$

and assuming we record in mono, we have $50{,}000 = R \times (b/8)$ for each second of audio. Since it is a teleconferencing application, the sound will mostly be speech. We can use a lower sampling rate and a higher resolution. Therefore, we can use a 22.05 kHz sampling rate and 16-bit resolution, which needs 44,100 bytes/sec. The most important limit is that the data rate cannot be larger than the bandwidth; otherwise, we will not be able to transmit the data in real time.

4. Because the data we want to handle are voice, a sampling rate of 11 kHz in mono will be enough. At 8-bit resolution this creates a data rate of about 11 Kbytes/sec. If the users are connected to the Internet via a LAN, the system should handle this data rate adequately. However, if they are connected via modems, the data rate will be too high and a compression technique has to be used.

5. Since the music keyboard can accept signals from all channels and follows the General MIDI patch map, we will hear the following sounds from it:

Track  Patch  Channel  Sound
1      2      3        Bright Acoustic Piano
2      8      4        Clavinet
3      28     5        Electric Guitar
4      49     6        String Ensemble 1
5      57     7        Trumpet
6      73     2        Piccolo

Synthesiser 1 will produce no sound because no signal is sent to Channel 10. Synthesiser 2 will produce sound for signals from all channels, but they will all sound like a guitar.
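The arithmetic in solutions 1 to 3 can be checked with a short script. This is a sketch with illustrative function names; it assumes that the 11 kHz rate means 11,025 Hz, as the solution to Exercise 1 does, and that the 50 Kbytes/sec per direction in Exercise 3 is taken as 50,000 bytes/sec.

```python
def data_rate_bytes_per_second(rate_hz, bits, channels):
    """Bytes produced by one second of uncompressed audio."""
    return rate_hz * bits * channels / 8


def max_duration_seconds(free_bytes, rate_hz, bits, channels):
    """Whole seconds of audio that fit in the given free space."""
    return int(free_bytes // data_rate_bytes_per_second(rate_hz, bits, channels))


def fits_link(rate_hz, bits, channels, link_bytes_per_second):
    """True if the audio can be sent in real time over the link."""
    return data_rate_bytes_per_second(rate_hz, bits, channels) <= link_bytes_per_second


if __name__ == "__main__":
    # Exercise 1: 256 Mbytes of free space, 11,025 Hz, 8-bit, mono.
    print(max_duration_seconds(256 * 1024 * 1024, 11025, 8, 1))
    # Exercise 2: 30 minutes of CD-quality audio, in bytes.
    print(data_rate_bytes_per_second(44100, 16, 2) * 30 * 60)
    # Exercise 3: does 22.05 kHz, 16-bit mono fit in 50,000 bytes/sec?
    print(fits_link(22050, 16, 1, 50_000))
```

The same check shows that 44.1 kHz, 16-bit mono (88,200 bytes/sec) would not fit in one direction of the link.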
