eScience Lectures Notes : The role of Digitisation
Slide 1 : 1/28 : The role of Digitisation (index.en.html)
COMP1710 Tools for New Media and Web
The role of Digitisation
Slide 2 : ToC : The role of Digitisation (tableOfContent.en.html)
Table of Contents (28 slides) for the presentation :
The role of Digitisation
Slide 3 : 3/28 : New Media and Web (intro.en.html)
In this lecture: The role of Digitisation
Principles of Digitisation
The need for compression : the bandwidth issue
The ways to compress
Standards and file formats
To read more about this subject :
"The Web Wizard's Guide to Multimedia" by James G. Lengel
Slide 4 : 4/28 : Communication by multimedia (communication.en.html)
Communication by multimedia
A computer connected to the internet allows us to
Play and record human voices
Display High quality full colour images
Display text so that it is easily searchable and readable
Engage viewers in interactive operation
Provide data to any of the human senses (though not all are equally available)
Multimedia is the processing and presentation of information in a
structured and understandable manner using more than one medium, such as
text, graphics, animation, audio and video. A multimedia product can
therefore be an academic presentation, a game, a corporate presentation,
an information kiosk, a fashion design and so on. Multimedia systems are
the computer platforms and software tools that support the interactive
use of text, graphics, animation, audio or motion video. In other words, a
computer capable of handling text, graphics, audio, animation and
video is called a multimedia computer. If the user can control the
sequence and timing of these media elements, it is called
Interactive Multimedia.
Slide 5 : 5/28 : 5, 6, 7 ... senses (senses.en.html)
Five senses (but there are 7 or more)
Hearing : The ability to hear; the auditory faculty; SYN. audition, auditory sense,
sense of hearing, auditory modality.
Sight : The ability to see; the faculty of vision; SYN. vision, visual sense,
Touch (surface / temperature) : The faculty of touch; SYN. sense of touch, skin senses, touch modality,
Smell : The faculty of smell; SYN. sense of smell, olfaction, olfactory modality.
Taste : The faculty of taste; SYN. gustation, sense of taste, gustatory modality.
Kinaesthesia (position, movement, muscular tensions) : The perception of body position
and movement and muscular tension etc; SYN: kinaesthesia, feeling of movement
Balance (balance, acceleration, position, location, orientation, movement of the body) :
The ability to sense the position and location and orientation and movement
of the body and its parts.
Slide 6 : 6/28 : Multimedia before Digitisation (beforeDigitization.en.html)
Multimedia before Digitisation
Each form of human communication had its own technology and its own channel
They evolved separately due to
into different industries:
Film (actors, directors, writers, distributors, theaters)
Television (as above + news, shows etc)
Music recording and distribution industry (singers, musicians, sound engineers,
Book writing and publishing
Newspaper writing and publishing
Perfumes, Deodorants, Air fresheners
Slide 7 : 7/28 : Specialized Systems (specialazedDevices.en.html)
E.g. at home : a long list of different media appliances
Radio, telephone, tape player, television, VCR, CD player, slide projector,
newspapers, books ...
and outside : cinema, theater, concert halls, restaurants
Slide 8 : 8/28 : Multimedia before Digitisation (2) (communication2.en.html)
Multimedia before Digitisation (2)
Old media cannot offer the same range of forms as the internet
Other mass media are essentially one-way
Main diffusion principle : broadcasting (one to many)
Older forms of human communication are more interactive than
most modern media.
One issue : the scale
Slide 9 : 9/28 : Multimedia after Digitisation (2) (communication3.en.html)
Multimedia after Digitisation
Slide 10 : 10/28 : Digital Revolution (digitalRevolution.en.html)
Once they are digitised, all the different types of media can be saved in
a digital computer file
A single medium stores text, voice, video, images and music and the computer
can play all of them back, with high quality, at the same time
N.B. : same medium, but still different usages !
Slide 11 : 11/28 : Two steps : From the physical world (twoSteps.en.html)
From the physical world ...
Light Intensity and Wavelength
How do we represent an analog signal in a computer ?
Basic problem is that we need to represent a function, which mathematically
can represent an infinite amount of information, with a finite number of symbols.
Slide 12 : 12/28 : Specialized Systems (twoSteps2.en.html)
Two steps : ... to the digital world
Digitisation (Sampling) : Discretisation in space or time
Sampling pattern : image space is tessellated into discrete, local, compact,
regions (regular rectangular planar grid)
Sampling process : Point in neighbourhood, Average over neighbourhood
Nyquist Criterion : Sample at a rate of at least twice the highest frequency contained
in the signal of interest
Sound : Rate (44.1 kHz) and Size (8 or 16 bits)
Image : Resolution (300 dpi) and number of colours
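The Nyquist criterion above can be checked numerically. The following is a minimal sketch and not part of the original slides: `alias_frequency` and `sample_sine` are hypothetical helper names, and the 3 Hz tone with 4 Hz and 8 Hz sampling rates are arbitrary example values.

```python
import math

def alias_frequency(signal_hz, sample_rate_hz):
    """Frequency that a pure tone appears to have once it has been sampled."""
    # Fold the tone frequency back into the range [0, sample_rate / 2].
    nearest_multiple = round(signal_hz / sample_rate_hz) * sample_rate_hz
    return abs(signal_hz - nearest_multiple)

def sample_sine(signal_hz, sample_rate_hz, count):
    """Take `count` samples of sin(2*pi*f*t) at the given sampling rate."""
    return [math.sin(2 * math.pi * signal_hz * n / sample_rate_hz)
            for n in range(count)]

print(alias_frequency(3, 8))   # sampled fast enough: the tone stays at 3 Hz
print(alias_frequency(3, 4))   # sampled too slowly: it shows up as 1 Hz
```

At 4 Hz, the samples of the 3 Hz tone coincide (up to sign) with those of a 1 Hz tone, so the two signals can no longer be told apart after digitisation.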
Quantisation : Discretisation in Value
The function can take on only finitely many values
E.g. for images 3 general domains :
Large: 32-64 bits per channel : a billion or more intensity values, "real
number" representation : computational purposes
Full Web colors : 16*16 * 16*16 * 16*16 = 2^8 * 2^8 * 2^8 = 2^24
Typical JPEG images provide 24 bits per pixel, but that's 8-bits per channel
(red, green, and blue)
Moderate: 8, 12 or 16 bits per pixel : 256, 4096 or 65536 intensity values, "integer"
representation : good enough for almost all display purposes
(8 bits : 256 colours ; 12 bits = 3 * 4 bits : 16*16*16 = 4096 colours)
Small: 1 or 2 bits : black and white
for more on Bit Depth
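The bit depths listed above translate directly into numbers of available values. A minimal sketch, assuming intensities normalised to the range 0.0 to 1.0 (the helper names `levels` and `quantise` are illustrative, not taken from the slides):

```python
def levels(bits):
    """Number of distinct values that `bits` bits can encode."""
    return 2 ** bits

def quantise(value, bits):
    """Map a value in [0.0, 1.0] to the nearest of 2**bits evenly spaced levels."""
    top = levels(bits) - 1
    return round(value * top) / top

for bits in (1, 2, 8, 12, 16, 24):
    print(bits, "bits ->", levels(bits), "values")

print(quantise(0.4, 1))  # 1 bit: only black (0.0) or white (1.0) is possible
print(quantise(0.4, 2))  # 2 bits: one of 0, 1/3, 2/3 or 1
```

The fewer the bits, the further the stored value can be from the original intensity.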
Slide 13 : 13/28 : Illustration of the Digitisation (digitisationEx.en.html)
Illustration of the Digitisation :
From a 600x400, 32-bit image (72 dpi : dots per inch)
(240 000 pixels)
To a 120x80, 32-bit image
filling the same space (14.4 dpi):
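The arithmetic behind the pixel count and the 14.4 dpi figure can be sketched as follows. This is an illustration only: the function names are made up, and it assumes the smaller image is stretched over exactly the same physical width as the original. Since 14.4 dpi is one fifth of 72 dpi, it corresponds to an image one fifth as wide and as high (120 x 80 pixels).

```python
def pixel_count(width, height):
    """Total number of samples in a width x height image."""
    return width * height

def effective_dpi(original_dpi, original_width, new_width):
    """Resolution when fewer pixels are stretched over the same physical width."""
    return original_dpi * new_width / original_width

print(pixel_count(600, 400))         # pixels in the original image
print(pixel_count(120, 80))          # pixels left after resampling
print(effective_dpi(72, 600, 120))   # dpi of the 120-pixel-wide version
```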
Slide 14 : 14/28 : Illustration of the Quantization : (quantisationEx.en.html)
Illustration of the Quantisation :
From 600x400, 32 bits image (72 dpi : dot per inch)
To 600x400, 4 bits image (16 colors)
To 600x400, 2 bits image (4 colors)
Slide 15 : 15/28 : Network Bandwidth (bandwidth.en.html)
The rate at which the network can deliver data to the destination point
The amount of data that can be transmitted over a network
in a fixed amount of time. Bandwidth is the fundamental networking parameter,
and is usually measured in kilobits, megabits or gigabits per second (Kbps,
Mbps, or Gbps).
Rate of transfer
Available bandwidth determined by wire and hardware
You may have High-Bandwidth and bad (high) latency (eg. Satellite)
Slide 16 : 16/28 : The Role of Bandwidth (bandwidthRole.en.html)
The Role of Bandwidth
The size of this included image is 84kB
84 kB = 84 kilobytes = 84 * 1024 * 8 = 688 128 bits
(1 kB = 1024 bytes ; 1 byte = 8 bits)
The time to transfer the image = size / bandwidth
by a modem at 56 kbps = 56 000 bits per second :
time = 688 128 / 56 000 = 12.29 seconds
on TransACT "broadband" (512 kbps) : 688 128 / 512 000 = 1.3 s
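The same calculation as a small sketch. It gives the ideal time only, ignoring latency and protocol overheads; `transfer_time_seconds` is an illustrative name, not something defined in the lecture.

```python
def transfer_time_seconds(size_bytes, bandwidth_bps):
    """Ideal transfer time: size in bits divided by bandwidth in bits per second."""
    return size_bytes * 8 / bandwidth_bps

image_bytes = 84 * 1024                               # 84 kB with 1 kB = 1024 bytes
print(transfer_time_seconds(image_bytes, 56_000))     # 56 kbps modem
print(transfer_time_seconds(image_bytes, 512_000))    # 512 kbps broadband
```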
Slide 17 : 17/28 : Bandwidth Levels (bandwidthLevels.en.html)
Type of connection | What you get in 1 second | Or live streaming
 | small email ~ 1.2 kB | IRC / text / telnet
 | web graphic ~ 7 kB |
ISDN (Integrated Services Digital Network) | 2 web graphics ~ 15 kB | one-to-one videoconference
DSL / Cable Modem | 1 JPEG image 600x400 ~ 62 kB | 300 kbps = very useful video (cable, ADSL)
near-future DSL / Cable Modem | document ~ 125 kB | 1500 kbps, 2.2 Mbps = VHS video
 | 1 floppy disk ~ 1.25 MB | 6 Mbps = PAL video
 | 2 MP3 songs ~ 12.25 MB | 20 Mbps = compressed HDTV
 | 10 min of CD audio ~ 125 MB | 270 Mbps = raw PAL video
 | 2 CDs ~ 1.25 GB | 1.5 Gbps = raw HDTV
 | 2 DVDs ~ 12.5 GB | 1 Tbps = 50,000 channels of compressed HDTV
NB. : Mbps = 1000 x 1000 bits per second, kbps = 1000 bps, Gbps = 1000 Mbps
-- minus overheads !
MB/s (Megabytes/s) : 1024x1024 bytes per second
The standard for carriers and networks is that Mbps means
1000x1000 bits per second (and Gbps means 1000x1000x1000). That is also the
transport rate, not the payload rate, so you need to allow for the overheads of
whatever protocols you are using (e.g. TCP/IP/ATM/SDH : you lose a lot of payload
bandwidth that way).
Conversely, if somebody quotes MB/s (megabytes/s) they usually mean 1024x1024
bytes per second.
Back in the bad old days, a 1 Megabyte floppy was 1024x1000 bytes
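The two unit conventions above are easy to mix up, so here is a minimal conversion sketch. The `overhead_fraction` parameter and the 10 % value used below are assumptions for illustration; the real overhead depends on the protocol stack.

```python
BITS_PER_BYTE = 8
MEGABIT = 1000 * 1000     # network convention: decimal prefix, counted in bits
MEGABYTE = 1024 * 1024    # convention used in these notes for MB: binary prefix, bytes

def mbps_to_megabytes_per_second(mbps, overhead_fraction=0.0):
    """Convert a line rate in Mbps to a payload rate in MB/s.

    overhead_fraction is the share of the line rate used up by protocol
    headers (hypothetical parameter, 0.0 means no overhead).
    """
    payload_bits = mbps * MEGABIT * (1.0 - overhead_fraction)
    return payload_bits / BITS_PER_BYTE / MEGABYTE

print(mbps_to_megabytes_per_second(8))                          # a little under 1 MB/s
print(mbps_to_megabytes_per_second(8, overhead_fraction=0.1))   # with an assumed 10 % overhead
```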
Slide 18 : 18/28 : Network Bandwidth (2) (bandwidth2.en.html)
Network Bandwidth (2)
The current record is 6.4Tb/s
(when you are really close) and about
10 Gbps in real situation
for more on record bandwidth : http://lsr.internet2.edu/
ACT 300000 * 256 kbps (TransACT) = 76.8 Gbps
ACT 300000 * 2.2 Mbps = 660 Gbps
AARNET Backbone Canberra-Sydney, or Canberra-Melbourne : 18 Mbps
Bandwidth is often asymmetric at the end of the network
No bandwidth is high or low : it's just
different to what you are used to.
Which depends on where you are
It's not the sustained bandwidth you are catering for,
it's the peaks!
Internet traffic is fractal, and the same issue appears on the server
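The aggregate figures for the ACT can be reproduced with the decimal convention given on the previous slide (1 Gbps = 1000 Mbps = 1 000 000 kbps). This is a rough sketch that assumes every one of the 300 000 connections runs at its full rate at the same moment, which is the worst-case peak; `aggregate_gbps` is an illustrative name.

```python
def aggregate_gbps(subscribers, per_subscriber_kbps):
    """Total demand in Gbps if every subscriber used the full rate at once."""
    total_bps = subscribers * per_subscriber_kbps * 1000
    return total_bps / 1e9

print(aggregate_gbps(300_000, 256))     # everyone on 256 kbps
print(aggregate_gbps(300_000, 2200))    # everyone on 2.2 Mbps
```

Comparing these totals with the capacity of a single backbone link shows why peaks, rather than average load, drive network design.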
Slide 19 : 19/28 : Data Compression : Why ? (compressionWhy.en.html)
Data Compression : Why ?
Let's take an image ..."The size of this included image is 84 kB"
But 600 x 400 x 32 bits (the 24 colour bits give 2^24, about 16.7 million colours) = 7 680 000 bits
= almost 1 MB (937.5 kB)
No longer 12 s but more than 2 minutes on a 56 kbps modem !
84 kB is the compressed image
Compare your lab directory size and the zipped equivalent ... another type
Different types of compression use different codecs
: compressor / decompressor software routines
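The figures on this slide can be recomputed in a few lines. A minimal sketch; `raw_image_bytes` is an illustrative helper, and 1 kB is taken as 1024 bytes as elsewhere in these notes.

```python
def raw_image_bytes(width, height, bits_per_pixel):
    """Size of an uncompressed bitmap, ignoring any file header."""
    return width * height * bits_per_pixel // 8

raw = raw_image_bytes(600, 400, 32)
compressed = 84 * 1024

print(raw / 1024)          # raw size in kB
print(raw / compressed)    # compression ratio achieved by the 84 kB file
print(raw * 8 / 56_000)    # seconds needed on a 56 kbps modem without compression
```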
Slide 20 : 20/28 : Finding Redundancy (compressionRedundancy.en.html)
Data Compression : Finding Redundancy
Most types of computer files are fairly redundant -- they have the same information
listed over and over again.
file-compression programs list information once and then refer back to it
whenever it appears in the original program.
In John F. Kennedy's 1961 inaugural address :
"Ask not what your country can do for you -- ask what you can do for
your country."
17 words, made up of 61 letters, 16 spaces, one dash and one period : total
file size of 79 units.
"ask", "what", "your", "country",
"can", "do", "for", "you" appear twice
Slide 21 : 21/28 : Looking it Up (compressionLooItUp.en.html)
Data Compression : Looking it Up
Most compression programs use a variation of the LZ adaptive dictionary-based
algorithm to shrink files.
"LZ" refers to Lempel and Ziv, the algorithm's creators
"dictionary" refers to the method of cataloging pieces of data.
Our Dictionary : 1 = ask, 2 = what, 3 = your, 4 = country, 5 = can, 6 = do, 7 = for, 8 = you
The compressed sentence : from "Ask not what your country can
do for you; ask what you can do for your country" (79
char) to ...
"1 not 2 3 4 5 6 7 8; 1 2 8 5 6 7 3 4"
36 + 36 (dictionary 29+7) = 72
"Ask not what your country can do for you; 1 3 9 6 7
8 4 5" = 57
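The word-level dictionary scheme above can be sketched in a few lines. This is a toy illustration, not how a real LZ implementation works: it numbers the repeated words in order of first appearance, folds upper and lower case together, and assumes the text contains no digits. All function names are made up for the example.

```python
import re

SENTENCE = "Ask not what your country can do for you; ask what you can do for your country"

def build_dictionary(text):
    """Number every word that occurs more than once, in order of first appearance."""
    words = re.findall(r"[a-z]+", text.lower())
    repeated = []
    for word in words:
        if words.count(word) > 1 and word not in repeated:
            repeated.append(word)
    return {word: index + 1 for index, word in enumerate(repeated)}

def compress(text, dictionary):
    """Replace each dictionary word by its number; leave everything else alone."""
    def encode(match):
        word = match.group()
        return str(dictionary.get(word.lower(), word))
    return re.sub(r"[A-Za-z]+", encode, text)

def decompress(packed, dictionary):
    """Replace each number by the word it stands for."""
    by_index = {str(index): word for word, index in dictionary.items()}
    return re.sub(r"\d+", lambda match: by_index[match.group()], packed)

dictionary = build_dictionary(SENTENCE)
packed = compress(SENTENCE, dictionary)
print(dictionary)
print(packed, len(packed))
print(decompress(packed, dictionary))   # original sentence, but in lower case
```

Because the dictionary has to travel with the compressed sentence, the saving on such a short text is small; the technique pays off on longer files.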
Slide 22 : 22/28 : Searching for Patterns (compressionPatterns.en.html)
Data Compression : Searching for Patterns
A compression program doesn't have any concept of separate words : it only
looks for patterns.
Pattern : a combination of characters that is repeated in the sentence
From a simple pattern ( "ou" in "your" and "country"
) to more than one word ( "can do for you" )
The ability to rewrite the dictionary is the "adaptive" part of
LZ adaptive dictionary-based algorithm.
The way a program actually does this is fairly complicated,
as you can see by the discussions on Data-Compression.com.
No matter what specific method you use, this in-depth searching system lets
you compress the file much more efficiently than you could by just picking out
Using the patterns we picked out above, and adding "_" for spaces,
we come up with this larger dictionary:
Sentence 16 units + dictionary 40 units = 56 units!
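To see pattern-based compression at work without writing the search by hand, Python's standard `zlib` module can be used. It implements DEFLATE, which combines an LZ77-style search for repeated byte patterns with Huffman coding, so it is a relative of the algorithm described here rather than the exact variant. Repeating the sentence 20 times is an arbitrary choice that simply gives the pattern search more to find.

```python
import zlib

sentence = b"Ask not what your country can do for you; ask what you can do for your country. "
text = sentence * 20

packed = zlib.compress(text)
restored = zlib.decompress(packed)

print(len(text), "->", len(packed), "bytes")
print(restored == text)   # the round trip returns exactly the original bytes
```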
Slide 23 : 23/28 : Data Compression : Why ? (compressionLose.en.html)
Data Compression : Lossy and Lossless
Lossless compression lets you recreate the original file exactly
LZ adaptive dictionary-based algorithm is a well known example
Breaking a file into a "smaller" form for transmission or storage
and then putting it back together on the other end so it can be used again.
works well ( good "file-reduction ratio" ) with text file, and programming
far less efficient with complex data like sound or bitmap pictures
Lossy compression eliminates "unnecessary" bits of information
E.g. : the sky in a picture is blue, but most of the pixels are with a different
The compression codec would choose an average blue and apply it to the pixels
not too far from that average value.
No way to get the lost information back after such an alteration
You are not supposed to notice the change
This sort of compression can't be used for anything that needs to be reproduced
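The "average blue" idea can be sketched on a single row of pixel values. This is a deliberately simplified illustration, not the method of any particular codec; the pixel values and the tolerance of 20 are made up.

```python
def flatten_near_average(pixels, tolerance):
    """Replace every value within `tolerance` of the mean by the rounded mean."""
    average = round(sum(pixels) / len(pixels))
    return [average if abs(p - average) <= tolerance else p for p in pixels]

sky = [200, 202, 199, 201, 198, 120]   # made-up blue values; 120 is a bird in the sky
flattened = flatten_near_average(sky, tolerance=20)
print(flattened)
```

The run of identical values compresses very well afterwards, but the original shades 198 to 202 cannot be recovered from the result: that is the loss.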
Slide 24 : 24/28 : principles for Lossy compression (compressionLosy.en.html)
Different principles for Lossy compression
What is "unnecessary" ?
Averaging ( + repetition/pattern ) (clever forced digitisation)
Range Reduction : from 32 bits to 16 bits, to 8 bits (clever forced quantisation)
issues : classical music ( Orff - Carmina Burana : o Fortuna , Ravel's Boléro ) / subtleties in high contrasted pictures
Variation in Quantisation
Use the way human perception works : more receptive to high frequencies
E.g. : we see moving objects better, and we notice the edges of things more
than their centres
Frame difference Compression
In video, sending information only about what has changed since a key frame
(head talking in front of a fixed background)
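Frame difference compression can be sketched with frames represented as flat lists of pixel values. Real video codecs work on blocks and motion estimates and combine this step with lossy techniques; the version below is a minimal, exact illustration with made-up data and hypothetical function names.

```python
def frame_delta(key_frame, frame):
    """List of (index, new_value) for the pixels that differ from the key frame."""
    return [(i, new) for i, (old, new) in enumerate(zip(key_frame, frame)) if old != new]

def apply_delta(key_frame, delta):
    """Rebuild a frame from the key frame and its list of changes."""
    frame = list(key_frame)
    for index, value in delta:
        frame[index] = value
    return frame

key = [0, 0, 0, 5, 5, 0, 0, 0]    # fixed background with a "head" in the middle
nxt = [0, 0, 0, 5, 6, 7, 0, 0]    # the head has moved slightly
delta = frame_delta(key, nxt)

print(delta)                      # only the changed pixels need to be sent
print(apply_delta(key, delta) == nxt)
```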
Slide 25 : 25/28 : File Format (fileFormat.en.html)
Here is some music -- or is it an image ? or some text ? or some video ?
Only Matrix surfers are able to recognise it, or the computer
A format and a header for each file
The suffix or the MIME type expresses the type of content (the format); then
the file starts with a header that gives more information on the way to read the
If a file was a map, the format would be the legend and the header the scale
and the direction of North
HTML, Gif, JPEG, MOV etc are file formats (for data, encapsulation of data)
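Many formats begin their header with a fixed signature, which is how a program can recognise the content even when the suffix is missing or wrong. A minimal sketch covering three image formats; `sniff_format` and the `SIGNATURES` table are illustrative names, and the table is far from complete.

```python
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"\xff\xd8\xff": "JPEG image",
}

def sniff_format(data):
    """Guess a file format from the first bytes of its header."""
    for signature, name in SIGNATURES.items():
        if data.startswith(signature):
            return name
    return "unknown"

print(sniff_format(b"GIF89a" + bytes(10)))   # header of a GIF file
print(sniff_format(b"plain text"))           # no known signature
```

For a real file, the bytes would come from something like `open(path, "rb").read(16)`.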
Slide 26 : 26/28 : When a file format becomes a standard (standards.en.html)
When a file format becomes a Standard
Anybody can define a new format -- but not all formats become Standards
A need for it
A description of the structure of the files
Some tools to produce it
Some code to read/use it (library, plugin etc)
Some users and developers to adopt it
Some standards organization(s) to recognise it
ISO (International Organization for Standardization), IEEE (Institute of Electrical
and Electronics Engineers), IETF (Internet Engineering Task Force), W3C (World Wide Web
Consortium), ECMA (European Computer Manufacturers Association, now Ecma International).
Slide 27 : 27/28 : Open vs proprietary (OpenStandards.en.html)
Open vs proprietary
Real standards are open standards : documentation and basic code should be
royalty-free, with no patent in the way.
Proprietary formats may sometimes be considered "de facto standards",
but they are not real standards
Proprietary formats and patents are a threat to the free access to your own
Gif vs PNG
: patents (Unisys and IBM) cover the LZW compression algorithm which is used
in making GIF
The Unisys patent expired on 20 June 2003 in the USA, on 18 June 2004 in Europe,
on 20 June 2004 in Japan and on 7 July 2004 in Canada.
The U.S. IBM patent expires on 11 August 2006.
GIF Image Format (Unisys), Hyperlinking/Hypertext (British Telecom), JPEG
(Forgent Networks), MPEG-4 (ISO/IEC JTC 1/SC 29/WG 11), W3C P3P (Intermind),
RDF (Unified Data Technologies, Ltd.), Rights Expression Language (ContentGuard's
XrML), Stylesheets: CSS, XSL (Microsoft), XPointer (Sun Microsystems)
A vicious way to enter open standards : RAND "reasonable and non-discriminatory"
- RAND will hurt open source developers
- RAND discriminates against the poor (developers or developing countries)
- The RAND / RF split will fragment the Web. Some examples were given where
encumbered technologies may be worked around: Token ring v. Ethernet, PNG
v. GIF, SVG v. Flash, MP3 v. Ogg Vorbis
- RAND is not in the best interest of the Web
- RAND will stifle innovation and reduce choice
- RAND harms end users, not just developers
- Patents and standards are incompatible
- Universal access and RAND are incompatible
This week's reading :
Slide 28 : 28/28 : Open vs proprietary (chosingFormat.en.html)
Choosing a format : parameters to take into account
Don't use a new medium because it is new : choose it reluctantly because you
really need it, and make sure that your web site is still usable without
access to that latest trendy next-internet-revolution...
Did I already tell you that all the computers on the net are different ?
Display Size : From 640x480 to 1600x1200 pixels
Processor speed (important to decompress video or java animation)
Video System (integration of multiple video, 3D)
Browser type and settings
Network Configurations and Firewalls