First play

<?php

//note that this script file is UTF-8

//tell the browser to treat this page as ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1");

$utf8_sentence = 'That will be £500 please';

//gives [That will be Â£500 please] as UTF-8 is being displayed
//as ISO-8859-1
echo $utf8_sentence . '<br>';

$iso_sentence = iconv('UTF-8', 'ISO-8859-1', $utf8_sentence);

//gives [That will be £500 please] as no mismatch
//between actual character set of string
//and browser
echo $iso_sentence . '<br>';

//YOU TRY IT! When viewing this in your browser,
//set the page's encoding to UTF-8 and you will
//see the mojibake reverse!

?>
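
The same mismatch can be seen at the byte level. This is a minimal sketch, assuming the iconv extension is loaded; the variable names are illustrative and not part of the original script. The pound sign is written as explicit bytes so the result does not depend on how this snippet itself is saved.

```php
<?php

//the pound sign as explicit UTF-8 bytes
$utf8_pound = "\xC2\xA3";

//UTF-8 stores the pound sign as two bytes
$utf8_hex = bin2hex($utf8_pound);   //c2a3

//ISO-8859-1 stores it as the single byte A3
$iso_pound = iconv('UTF-8', 'ISO-8859-1', $utf8_pound);
$iso_hex = bin2hex($iso_pound);     //a3

//reading the two UTF-8 bytes as if each were one ISO-8859-1
//character is what the browser does for the first echo above:
//C2 is shown as "Â" and A3 is shown as "£"
$mojibake = iconv('ISO-8859-1', 'UTF-8', $utf8_pound);

echo $utf8_hex . PHP_EOL;
echo $iso_hex . PHP_EOL;
echo bin2hex($mojibake) . PHP_EOL;  //c382c2a3, which is "Â£" in UTF-8
```

The last line makes the mojibake explicit: one character misread as two.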

Within reason

<?php

//note that this script file is UTF-8

//tell the browser to treat this page as ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1");

//some Korean (contains two space characters)
$utf8_sentence = '연예가 뒷 이야기';

//gives mojibake (e.g. 뒷 is shown as [ë’·]) as UTF-8 is being displayed
//as ISO-8859-1
echo $utf8_sentence . '<br>';

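//gives [Notice: iconv(): Detected an illegal character in input string]
//because Korean characters do not exist in ISO-8859-1
$iso_sentence = iconv('UTF-8', 'ISO-8859-1', $utf8_sentence);

//gives an empty string here (the PHP manual documents false as the
//return value on failure, so check for both)
var_dump($iso_sentence);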

?>
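
Since a failed conversion only raises a notice, the return value is what needs checking. This is a minimal sketch, not part of the original script: `convert_or_null()` is a hypothetical helper name, and the behaviour for unconvertible characters is assumed to match the glibc-based result shown above.

```php
<?php

//hypothetical helper (not part of the iconv extension): returns the
//converted string, or null when iconv() could not convert the input
function convert_or_null(string $text, string $from, string $to): ?string
{
    //@ hides the notice; the return value is checked instead
    $converted = @iconv($from, $to, $text);

    //treat false as failure, and also an empty result for non-empty
    //input, since the script above reports an empty string
    if ($converted === false || ($converted === '' && $text !== '')) {
        return null;
    }

    return $converted;
}

//Korean has no representation in ISO-8859-1, so with glibc this is
//expected to be null (some iconv implementations substitute a
//placeholder character instead of failing)
$korean = convert_or_null('연예가 뒷 이야기', 'UTF-8', 'ISO-8859-1');
var_dump($korean);

//the pound sign does exist in ISO-8859-1, so this goes through
$pound = convert_or_null("\xC2\xA3500", 'UTF-8', 'ISO-8859-1');
var_dump(bin2hex($pound));
```

Returning null gives the caller one value to test for, whichever way the installed PHP version reports the failure.
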
First transliteration

<?php

//note that this script file is UTF-8

//tell the browser to treat this page as ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1");

//some Korean (contains two space characters)
$utf8_sentence = '연예가 뒷 이야기';

//gives mojibake (e.g. 뒷 is shown as [ë’·]) as UTF-8 is being displayed
//as ISO-8859-1
echo $utf8_sentence . '<br>';

//approximate characters that aren't in target character set
$iso_sentence = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8_sentence);

//gives [??? ? ???]
echo $iso_sentence . '<br>';

?>

More realistic transliteration (extended)

<?php

//note that this script file is UTF-8

//tell the browser to treat this page as UTF-8
header("Content-Type: text/html; charset=UTF-8");

//some German
$utf8_sentence = 'Weiß, Goldmann, Göbel, Weiss, Göthe, Goethe und Götz';

//fine as UTF-8 is being displayed as UTF-8
echo $utf8_sentence . '<br>';

$trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

//gives [Weiss, Goldmann, G?bel, Weiss, G?the, Goethe und G?tz]
//which is not quite what we expected (only 'ß' has been flattened)
echo $trans_sentence . '<br>';

//BUT iconv interacts with system locale setting so let's have a play:

$current_locale = setlocale(LC_ALL, '0');
//gives, for me, "C" which is a kind of nondescript default
echo $current_locale . '<br>';
//we set the locale of the *target* character set
setlocale(LC_ALL, 'en_GB');

//try again...
$trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

//gives [Weiss, Goldmann, Gobel, Weiss, Gothe, Goethe und Gotz]
//which is our original string flattened into 7-bit ASCII!
echo $trans_sentence . '<br>';

//out of curiosity...
setlocale(LC_ALL, 'de_DE');

$trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

//gives [Weiss, Goldmann, Goebel, Weiss, Goethe, Goethe und Goetz]
//which is exactly how a German would transliterate those
//umlauted characters if forced to use 7-bit ASCII!
//(because really ä = ae, ö = oe and ü = ue)
echo $trans_sentence . '<br>';

?>
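
The locale names above only take effect if they are installed on the machine, and `setlocale()` changes state for the rest of the script. This is a minimal sketch of one way to guard both, assuming (as on glibc) that transliteration follows the LC_CTYPE category; `translit_ascii()` and the candidate locale names are illustrative, not from the original script.

```php
<?php

//hypothetical helper: transliterate to ASCII under the first locale
//from $candidates that is installed, then restore the old locale
function translit_ascii(string $text, array $candidates): ?string
{
    //'0' asks for the current setting without changing it
    $previous = setlocale(LC_CTYPE, '0');

    //setlocale() accepts an array of names and returns false
    //if none of them is available
    if (setlocale(LC_CTYPE, $candidates) === false) {
        return null;
    }

    $result = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    //put the previous locale back
    setlocale(LC_CTYPE, $previous);

    return $result === false ? null : $result;
}

$german = translit_ascii('Göbel', ['de_DE.UTF-8', 'de_DE.utf8', 'de_DE']);

//null here means no German locale is installed on this machine
var_dump($german);
```

Several spellings of the same locale are passed because the installed name differs between systems.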

Ignore example

<?php

//note that this script file is UTF-8

//tell the browser to treat this page as ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1");

//some Korean (contains two space characters)
$utf8_sentence = '연예가 뒷 이야기';

//gives mojibake (e.g. 뒷 is shown as [ë’·]) as UTF-8 is being displayed
//as ISO-8859-1
echo $utf8_sentence . '<br>';

//discard characters that aren't in target character set
//STILL gives [Notice: iconv(): Detected an illegal character in input string]
$iso_sentence = iconv('UTF-8', 'ISO-8859-1//IGNORE', $utf8_sentence);

//gives "  " (two space characters)
var_dump($iso_sentence);

?>
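
Because `//IGNORE` still raises a notice, one alternative is to remove the unconvertible characters before calling `iconv()`. This is a sketch of a different technique from the one in the script above, and it only works here because ISO-8859-1 covers exactly the code points U+0000 to U+00FF; `to_latin1_dropping_rest()` is a hypothetical helper name.

```php
<?php

//hypothetical helper: drop every character above U+00FF, which is
//the full range ISO-8859-1 can represent, then convert what is left
function to_latin1_dropping_rest(string $utf8): ?string
{
    //the u modifier makes PCRE treat pattern and subject as UTF-8;
    //preg_replace() returns null if the subject is not valid UTF-8
    $kept = preg_replace('/[^\x{00}-\x{FF}]/u', '', $utf8);

    if ($kept === null) {
        return null;
    }

    $converted = iconv('UTF-8', 'ISO-8859-1', $kept);

    return $converted === false ? null : $converted;
}

//only the two space characters survive
var_dump(to_latin1_dropping_rest('연예가 뒷 이야기'));
```

For other target character sets the allowed range is not a single block of code points, so this shortcut does not carry over directly.
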
ob_iconv_handler

<?php

//note that this script file is UTF-8

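//character set of PHP scripts etc
//(the iconv.* encoding settings are deprecated as of PHP 5.6 in favour
//of default_charset, so expect a deprecation notice on newer versions)
iconv_set_encoding('internal_encoding', 'UTF-8');

//character set of browser output
//(sends HTTP header of "Content-Type: text/html; charset=ISO-8859-1;")
iconv_set_encoding('output_encoding', 'ISO-8859-1//TRANSLIT');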

ob_start('ob_iconv_handler');   //start output buffering

//Unicode string
$utf8_sentence = 'The Japanese title is "指輪物語"';

//when buffer is flushed, outputs [The Japanese title is "????"]
echo $utf8_sentence;

?>
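
The same effect can be had without the iconv encoding settings by giving `ob_start()` a callback that calls `iconv()` directly. This is a minimal sketch, not the original script's approach: `make_iconv_callback()` is a hypothetical name, and unlike `ob_iconv_handler` it sends no Content-Type header, so a real page would need to send a matching one itself. The outer buffer exists only so the converted bytes can be inspected as hex.

```php
<?php

//hypothetical factory: builds an output-buffer callback that converts
//the buffered text from $from to $to when the buffer is flushed
function make_iconv_callback(string $from, string $to): callable
{
    return function (string $buffer) use ($from, $to): string {
        $converted = @iconv($from, $to, $buffer);

        //fall back to the unconverted buffer if iconv() fails
        return $converted === false ? $buffer : $converted;
    };
}

//outer buffer, only here so the result can be inspected
ob_start();

//inner buffer does the conversion
ob_start(make_iconv_callback('UTF-8', 'ISO-8859-1//TRANSLIT'));

echo "\xC2\xA3500";

//runs the callback and passes its result to the outer buffer
ob_end_flush();

$bytes = ob_get_clean();

//a3353030 when the conversion succeeds
echo bin2hex($bytes) . PHP_EOL;
```
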

iconv_strlen()

<?php

//note that this script file is UTF-8

//tell the browser to treat this page as UTF-8
header("Content-Type: text/html; charset=UTF-8");

//some Russian (13 characters)
$utf8_sentence = 'Правительство';

//gives 13 which is correct
echo iconv_strlen($utf8_sentence, 'UTF-8') . '<br>';

//let's try core PHP
//gives 26 (the *byte* count). Oops!
echo strlen($utf8_sentence) . '<br>';

?>
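
The same byte-versus-character difference applies to cutting and searching strings. This is a minimal sketch using `iconv_substr()` and `iconv_strpos()`, assuming this snippet is saved as UTF-8; the variable names are illustrative.

```php
<?php

$utf8_sentence = 'Правительство';

//first five *characters*: gives [Прави]
$chars = iconv_substr($utf8_sentence, 0, 5, 'UTF-8');

//first five *bytes*: two and a half characters, so the result
//ends in the middle of a character and is no longer valid UTF-8
$bytes = substr($utf8_sentence, 0, 5);

//an empty pattern with the u modifier only matches valid UTF-8
$bytes_valid = preg_match('//u', $bytes) === 1;

echo $chars . PHP_EOL;
var_dump($bytes_valid);   //bool(false)

//position of [тель] counted in characters: gives 5
$position = iconv_strpos($utf8_sentence, 'тель', 0, 'UTF-8');
echo $position . PHP_EOL;
```
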
Inter-Japanese conversion (not on presentation)

<?php

//note that this script file is UTF-8

//tell the browser to treat this page as EUC-JP (a Japanese character set)
header("Content-Type: text/html; charset=EUC-JP");

//some Japanese
$utf8_sentence = '一斗缶に詰められた遺体は、藤森容疑者の妻と息子と判明。';

//gives mojibake as UTF-8 is being displayed as EUC-JP
echo $utf8_sentence . '<br>';

$euc_sentence = iconv('UTF-8', 'EUC-JP', $utf8_sentence);

//gives intact Japanese string
echo $euc_sentence . '<br>';

?>
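
A quick way to check that nothing was lost in such a conversion is to convert back and compare. This is a minimal sketch, assuming this snippet is saved as UTF-8 and that the installed iconv supports EUC-JP; the sample word is illustrative and not from the original script.

```php
<?php

$utf8_word = '日本語';

$euc_word = iconv('UTF-8', 'EUC-JP', $utf8_word);

//these characters take three bytes each in UTF-8 and two in EUC-JP
echo strlen($utf8_word) . PHP_EOL;   //9
echo strlen($euc_word) . PHP_EOL;    //6

//converting back should reproduce the original bytes
$round_trip = iconv('EUC-JP', 'UTF-8', $euc_word);
var_dump($round_trip === $utf8_word);
```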

"Character sets and iconv" PHP source code
