Portable ASCII

A library to convert strings to ASCII.


Description

It is written in PHP (PHP 7+) and works without “mbstring”, “iconv”, or any other extra encoding PHP extension on your server.

The benefit of Portable ASCII is that it is easy to use and easy to bundle.

The project is based on …

Alternative

If you prefer a more object-oriented way to edit strings, take a look at voku/Stringy. It is a fork of “danielstjules/Stringy” that uses the “Portable ASCII” class and adds some extra methods.

// Portable ASCII
use voku\helper\ASCII;
ASCII::to_transliterate('déjà σσς iıii'); // 'deja sss iiii'

// voku/Stringy
use Stringy\Stringy as S;
$stringy = S::create('déjà σσς iıii');
$stringy->toTransliterate();              // 'deja sss iiii'

Install “Portable ASCII” via “composer require”

composer require voku/portable-ascii

Why Portable ASCII?

I need ASCII character handling in several classes. Previously I added these functions to “Portable UTF-8”, but this repository is more modular and portable because it has no dependencies.

Portable ASCII | API

The API of the “ASCII” class consists of small static methods.

charsArray(bool $replace_extra_symbols): array

Returns a replacement array for ASCII methods.

EXAMPLE: $array = ASCII::charsArray(); var_dump($array['ru']['б']); // 'b'

Parameters:

  • bool $replace_extra_symbols [optional]: Add some more replacements, e.g. "£" with " pound ".

Return:

  • array

charsArrayWithMultiLanguageValues(bool $replace_extra_symbols): array

Returns a replacement array for ASCII methods with a mix of multiple languages.

EXAMPLE: $array = ASCII::charsArrayWithMultiLanguageValues(); var_dump($array['b']); // ['β', 'б', 'ဗ', 'ბ', 'ب']

Parameters:

  • bool $replace_extra_symbols [optional]: Add some more replacements, e.g. "£" with " pound ".

Return:

  • array: An array of replacements.

charsArrayWithOneLanguage(string $language, bool $replace_extra_symbols, bool $asOrigReplaceArray): array

Returns a replacement array for ASCII methods with one language.

For example, German will map ‘ä’ to ‘ae’, while other languages will simply return e.g. ‘a’.

EXAMPLE: $array = ASCII::charsArrayWithOneLanguage('ru'); $tmpKey = \array_search('yo', $array['replace']); echo $array['orig'][$tmpKey]; // 'ё'

Parameters:

  • ASCII::* $language [optional]: Language of the source string, e.g. en, de_at, or de-ch (default is 'en') | ASCII::*_LANGUAGE_CODE
  • bool $replace_extra_symbols [optional]: Add some more replacements, e.g. "£" with " pound ".
  • bool $asOrigReplaceArray [optional]: TRUE === return {orig: string[], replace: string[]} array

Return:

  • array: An array of replacements.
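The {orig, replace} shape pairs naturally with PHP's str_replace(), which accepts parallel search and replacement arrays. The following is a minimal sketch of that idea, assuming two tiny hand-written tables; the tables and the helper name apply_table() are hypothetical stand-ins, not data or functions from the library.

```php
<?php
declare(strict_types=1);

// Hypothetical two-entry tables in the {orig, replace} shape.
// The real tables would come from ASCII::charsArrayWithOneLanguage(..., true).
$tables = [
    'de' => ['orig' => ['ä', 'ö'], 'replace' => ['ae', 'oe']],
    'en' => ['orig' => ['ä', 'ö'], 'replace' => ['a', 'o']],
];

/**
 * Hypothetical helper: apply one language table to a string.
 *
 * @param array{orig: string[], replace: string[]} $table
 */
function apply_table(string $str, array $table): string
{
    // str_replace() substitutes orig[i] with replace[i] for every i
    return str_replace($table['orig'], $table['replace'], $str);
}

echo apply_table('Jäger', $tables['de']), "\n"; // Jaeger
echo apply_table('Jäger', $tables['en']), "\n"; // Jager
```

The same input yields different ASCII output depending on which language table is selected, which is the behaviour the German example above describes.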

charsArrayWithSingleLanguageValues(bool $replace_extra_symbols, bool $asOrigReplaceArray): array

Returns a replacement array for ASCII methods with multiple languages.

EXAMPLE: $array = ASCII::charsArrayWithSingleLanguageValues(); $tmpKey = \array_search('hnaik', $array['replace']); echo $array['orig'][$tmpKey]; // '၌'

Parameters:

  • bool $replace_extra_symbols [optional]: Add some more replacements, e.g. "£" with " pound ".
  • bool $asOrigReplaceArray [optional]: TRUE === return {orig: string[], replace: string[]} array

Return:

  • array: An array of replacements.

clean(string $str, bool $normalize_whitespace, bool $keep_non_breaking_space, bool $normalize_msword, bool $remove_invisible_characters): string

Accepts a string and removes all non-UTF-8 characters from it, plus extras if needed.

Parameters:

  • string $str: The string to be sanitized.
  • bool $normalize_whitespace [optional]: Set to true if you need to normalize the whitespace.
  • bool $keep_non_breaking_space [optional]: Set to true to keep non-breaking spaces, in combination with $normalize_whitespace.
  • bool $normalize_msword [optional]: Set to true if you need to normalize MS Word chars, e.g. "…" => "..."
  • bool $remove_invisible_characters [optional]: Set to false if you do not want to remove invisible characters, e.g. "\0"

Return:

  • string: A clean UTF-8 string.

getAllLanguages(): string[]

Get all languages from the constants “ASCII::*_LANGUAGE_CODE”.

Parameters: none

Return:

  • string[]

is_ascii(string $str): bool

Checks if a string is 7-bit ASCII.

EXAMPLE: ASCII::is_ascii('白'); // false

Parameters:

  • string $str: The string to check.

Return:

  • bool: true if it is ASCII, false otherwise
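A 7-bit ASCII check can be expressed as "no byte is outside 0x00-0x7F". The sketch below shows one way to do that with a byte-wise regular expression; the function name is_seven_bit_ascii() is hypothetical, and this is not necessarily how the library implements is_ascii().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true when every byte is in the range 0x00-0x7F.
 */
function is_seven_bit_ascii(string $str): bool
{
    // Without the /u modifier the pattern is matched byte by byte,
    // so any byte of a multi-byte UTF-8 character triggers a match.
    return preg_match('/[^\x00-\x7F]/', $str) === 0;
}

var_dump(is_seven_bit_ascii('abc')); // bool(true)
var_dump(is_seven_bit_ascii('白'));  // bool(false)
```

In this sketch an empty string counts as ASCII, because it contains no byte outside the allowed range; that is an assumption of the sketch.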

normalize_msword(string $str): string

Returns a string with smart quotes, ellipsis characters, and dashes from Windows-1252 (commonly used in Word documents) replaced by their ASCII equivalents.

EXAMPLE: ASCII::normalize_msword('„Abcdef…”'); // '"Abcdef..."'

Parameters:

  • string $str: The string to be normalized.

Return:

  • string: A string with normalized characters for commonly used chars in Word documents.

normalize_whitespace(string $str, bool $keepNonBreakingSpace, bool $keepBidiUnicodeControls, bool $normalize_control_characters): string

Normalize the whitespace.

EXAMPLE: ASCII::normalize_whitespace("abc-\xc2\xa0-öäü-\xe2\x80\xaf-\xE2\x80\xAC", true); // "abc-\xc2\xa0-öäü- -"

Parameters:

  • string $str: The string to be normalized.
  • bool $keepNonBreakingSpace [optional]: Set to true to keep non-breaking spaces.
  • bool $keepBidiUnicodeControls [optional]: Set to true to keep non-printable (for the web) bidirectional text chars.
  • bool $normalize_control_characters [optional]: Set to true to convert e.g. LINE-, PARAGRAPH-SEPARATOR with "\n" and LINE TABULATION with "\t".

Return:

  • string: A string with normalized whitespace.
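The effect of the $keepNonBreakingSpace flag can be illustrated with a small stand-alone sketch. It assumes a short, hand-picked list of Unicode space characters; the list and the function name sketch_normalize_whitespace() are hypothetical, and the library's own list is not reproduced here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: replace a few Unicode space characters with a plain space.
 */
function sketch_normalize_whitespace(string $str, bool $keepNonBreakingSpace = false): string
{
    // UTF-8 byte sequences of some non-ASCII spaces (illustrative subset)
    $spaces = [
        "\xe2\x80\xaf", // U+202F NARROW NO-BREAK SPACE
        "\xe2\x80\x89", // U+2009 THIN SPACE
        "\xe3\x80\x80", // U+3000 IDEOGRAPHIC SPACE
    ];

    if (!$keepNonBreakingSpace) {
        $spaces[] = "\xc2\xa0"; // U+00A0 NO-BREAK SPACE
    }

    return str_replace($spaces, ' ', $str);
}

var_dump(sketch_normalize_whitespace("a\xc2\xa0b") === 'a b');              // bool(true)
var_dump(sketch_normalize_whitespace("a\xc2\xa0b", true) === "a\xc2\xa0b"); // bool(true)
```

With the flag set to true, the no-break space survives while the other listed spaces are still replaced.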

remove_invisible_characters(string $str, bool $url_encoded, string $replacement, bool $keep_basic_control_characters): string

Remove invisible characters from a string.

For example, this prevents sandwiching null characters between ASCII characters, as in Java\0script.

Copied from https://github.com/bcit-ci/CodeIgniter/blob/develop/system/core/Common.php

Parameters:

  • string $str
  • bool $url_encoded
  • string $replacement
  • bool $keep_basic_control_characters

Return:

  • string
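The idea behind this method can be sketched in a few lines: strip ASCII control bytes and, optionally, their URL-encoded forms, repeating until nothing more is removed. The function strip_invisible() below is a hypothetical, simplified illustration that covers only the $url_encoded switch; it is not the library's code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: remove control characters such as "\0" from a string.
 */
function strip_invisible(string $str, bool $urlEncoded = true): string
{
    $patterns = [];

    if ($urlEncoded) {
        $patterns[] = '/%0[0-8bcef]/i'; // %00-%08, %0b, %0c, %0e, %0f
        $patterns[] = '/%1[0-9a-f]/i';  // %10-%1f
        $patterns[] = '/%7f/i';         // %7f (DEL)
    }

    // Raw control bytes, keeping tab (0x09), line feed (0x0A) and carriage return (0x0D)
    $patterns[] = '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/';

    // Loop until stable: removing one sequence can join the pieces of another
    do {
        $str = (string) preg_replace($patterns, '', $str, -1, $count);
    } while ($count > 0);

    return $str;
}

echo strip_invisible("Java\0script"), "\n"; // Javascript
```

The loop matters for inputs such as '%0%000', where removing the inner '%00' leaves a new '%00' behind.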

to_ascii_remap(string $str1, string $str2): string[]

WARNING: This method will return broken characters and is only for special cases.

Converts two UTF-8 encoded strings to single-byte strings suitable for functions that need the same string length after the conversion.

The function simply uses (and updates) a tailored dynamic encoding (in/out map parameter) where non-ascii characters are remapped to the range [128-255] in order of appearance.

Parameters:

  • string $str1
  • string $str2

Return:

  • string[]
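The remapping described above can be sketched as follows: every distinct non-ASCII character gets the next free byte value starting at 128, and the map is shared between calls so both strings use the same encoding. The function remap_to_single_bytes() is a hypothetical illustration under those assumptions; it ignores the case of more than 128 distinct characters.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: remap non-ASCII characters to single bytes (128-255)
 * in order of first appearance. $map is an in/out parameter shared across calls.
 *
 * @param array<string, int> $map
 */
function remap_to_single_bytes(string $str, array &$map): string
{
    $out = '';

    // Split into UTF-8 characters; invalid UTF-8 yields no characters here
    foreach (preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [] as $char) {
        if (strlen($char) === 1) {
            $out .= $char; // plain ASCII stays as it is
            continue;
        }

        if (!isset($map[$char])) {
            $map[$char] = 128 + count($map); // next free code (no overflow check in this sketch)
        }

        $out .= chr($map[$char]);
    }

    return $out;
}

$map = [];
$a = remap_to_single_bytes('déjà', $map);
$b = remap_to_single_bytes('dejà', $map);

var_dump(strlen($a), strlen($b)); // int(4) int(4)
```

Both results have one byte per original character, which is what byte-oriented comparison functions need. The output is no longer valid UTF-8, which matches the warning above.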

to_filename(string $str, bool $use_transliterate, string $fallback_char): string

Converts the given string to a safe filename (keeping the string case).

EXAMPLE: ASCII::to_filename('שדגשדג.png', true); // 'shdgshdg.png'

Parameters:

  • string $str
  • bool $use_transliterate: ASCII::to_transliterate() is used by default; otherwise unsafe characters are simply replaced with a hyphen.
  • string $fallback_char

Return:

  • string: A string that contains only safe characters for a filename.

to_slugify(string $str, string $separator, string $language, string[] $replacements, bool $replace_extra_symbols, bool $use_str_to_lower, bool $use_transliterate): string

Converts the string into a URL slug. This includes replacing non-ASCII characters with their closest ASCII equivalents, removing remaining non-ASCII and non-alphanumeric characters, and replacing whitespace with $separator. The separator defaults to a single dash, and the string is also converted to lowercase. The language of the source string can also be supplied for language-specific transliteration.

Parameters:

  • string $str
  • string $separator [optional]: The string used to replace whitespace.
  • ASCII::* $language [optional]: Language of the source string (default is 'en') | ASCII::*_LANGUAGE_CODE
  • array<string, string> $replacements [optional]: A map of replaceable strings.
  • bool $replace_extra_symbols [optional]: Add some more replacements, e.g. "£" with " pound ".
  • bool $use_str_to_lower [optional]: Use "string to lower" for the input.
  • bool $use_transliterate [optional]: Use ASCII::to_transliterate() for unknown chars.

Return:

  • string: A string that has been converted to a URL slug.
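The slug steps described above can be walked through with a small stand-alone sketch. It assumes a tiny caller-supplied replacement map in place of the library's language tables; the function sketch_slugify() and its $map parameter are hypothetical, and the real method handles many more cases.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the slug steps.
 *
 * @param array<string, string> $map stand-in for the library's replacement tables
 */
function sketch_slugify(string $str, string $separator = '-', array $map = []): string
{
    $sep = preg_quote($separator, '/');

    // 1. Replace non-ASCII characters with their closest ASCII equivalents
    $str = strtr($str, $map);

    // 2. Convert to lowercase
    $str = strtolower($str);

    // 3. Remove everything that is not alphanumeric, whitespace or the separator
    $str = (string) preg_replace('/[^a-z0-9 \t\r\n' . $sep . ']+/', '', $str);

    // 4. Collapse runs of whitespace and separators into a single separator
    $str = (string) preg_replace('/[ \t\r\n' . $sep . ']+/', $separator, $str);

    // 5. Drop separators at both ends
    return trim($str, $separator);
}

echo sketch_slugify('Déjà Vu!', '-', ['é' => 'e', 'à' => 'a']), "\n"; // deja-vu
```

Characters that are missing from the map are dropped in step 3, which mirrors the "removing remaining non-ASCII characters" part of the description.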

to_transliterate(string $str, string|null $unknown, bool $strict): string

Returns an ASCII version of the string. A set of non-ASCII characters are replaced with their closest ASCII counterparts, and the rest are removed unless instructed otherwise.

EXAMPLE: ASCII::to_transliterate('déjà σσς iıii'); // 'deja sss iiii'

Parameters:

  • string $str: The input string.
  • string|null $unknown [optional]: Character to use if a character is unknown (default is '?'). Use NULL to keep the unknown chars.
  • bool $strict [optional]: Use "transliterator_transliterate()" from PHP-Intl

Return:

  • string: A string that contains only ASCII characters.
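The three possible treatments of unknown characters (replace with a placeholder, remove, or keep) can be illustrated with a short sketch. It assumes a tiny hand-written map; sketch_transliterate() and its $map parameter are hypothetical and do not reproduce the library's tables or its $strict mode.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: map known characters, then apply the $unknown policy.
 *
 * @param array<string, string> $map stand-in for the library's replacement tables
 */
function sketch_transliterate(string $str, array $map, ?string $unknown = '?'): string
{
    $out = '';

    // Split into UTF-8 characters; invalid UTF-8 yields no characters here
    foreach (preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [] as $char) {
        if (strlen($char) === 1) {
            $out .= $char;             // already ASCII
        } elseif (isset($map[$char])) {
            $out .= $map[$char];       // known replacement
        } else {
            $out .= $unknown ?? $char; // NULL keeps the original character
        }
    }

    return $out;
}

$map = ['é' => 'e', 'à' => 'a'];

echo sketch_transliterate('déjà 白', $map), "\n";       // deja ?
echo sketch_transliterate('déjà 白', $map, ''), "\n";   // deja (followed by a space)
echo sketch_transliterate('déjà 白', $map, null), "\n"; // deja 白
```

Passing an empty string removes unknown characters, while NULL leaves them in place, so only the NULL case can return non-ASCII output.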

Unit Test

  1. Composer is a prerequisite for running the tests.
composer install
  2. The tests can be executed by running this command from the root directory:
./vendor/bin/phpunit

Articles

  • coming soon...