is_string() returns false for an integer or a float, so it may be useful to add an is_numeric() check when testing whether a value is "stringy" (usable as a string). Objects that define __toString() can be treated the same way:
<?php
function is_stringy($val) {
    return (is_string($val) || is_numeric($val)
        || (is_object($val) && method_exists($val, '__toString')));
}
?>
Test code (which should print "vector N OK" for each test vector):
<?php
foreach ([[NULL, false], [false, false], [true, false],
          [0, true], [[], false], [0.1, true], ["x", true],
          ["", true], [new Exception("x"), true]] as $idx => $vector) {
    list($val, $expected) = $vector;
    if (is_stringy($val) != $expected) {
        print("mismatch at $idx\n");
        var_dump($val);
    } else {
        print("vector $idx OK\n");
    }
}
?>
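On PHP 8.0 and later, a possibly simpler variant is sketched below. It assumes that every class declaring __toString() automatically implements the built-in Stringable interface (true as of PHP 8.0), and it replaces is_numeric() with is_int()/is_float(), which is equivalent here because numeric strings are already caught by is_string(). The name is_stringy8 is hypothetical, chosen only to avoid clashing with the function above.

```php
<?php
// Sketch for PHP 8.0+: Stringable is implemented automatically by any class
// that declares __toString(). is_stringy8 is a hypothetical name.
function is_stringy8(mixed $val): bool
{
    return is_string($val)
        || is_int($val) || is_float($val)  // numbers convert cleanly to strings
        || $val instanceof Stringable;     // objects with __toString()
}

// Same test vectors as above, destructured directly in the foreach.
foreach ([[null, false], [false, false], [true, false], [0, true],
          [[], false], [0.1, true], ["x", true], ["", true],
          [new Exception("x"), true]] as $idx => [$val, $expected]) {
    echo is_stringy8($val) === $expected ? "vector $idx OK\n" : "mismatch at $idx\n";
}
```

Using a strict `===` comparison in the loop means a vector only passes when the function returns an actual boolean of the expected value.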
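The following example tests whether a variable is set even when its value is NULL (isset() returns false in that case). It relies on the fact that reading an unset variable raises an E_NOTICE, while reading one initialized to NULL does not. (As of PHP 8.0, reading an undefined variable raises E_WARNING instead, so the handler would have to be registered for E_WARNING.)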
<?php
function var_exists($var) {
    if (empty($GLOBALS['var_exists_err'])) {
        return true;
    } else {
        unset($GLOBALS['var_exists_err']);
        return false;
    }
}
function var_existsHandler($errno, $errstr, $errfile, $errline) {
    $GLOBALS['var_exists_err'] = true;
}
$l = NULL;
set_error_handler("var_existsHandler", E_NOTICE);
echo (var_exists($l)) ? "True " : "False ";
echo (var_exists($k)) ? "True " : "False ";
restore_error_handler();
?>
Outputs:
True False
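If the variable lives in the current scope, an alternative that needs no error handler at all is sketched below. It is a different technique from the one above: get_defined_vars() returns every variable defined in the current scope, including those holding NULL, so array_key_exists() on its result can tell "NULL" apart from "never set", which isset() cannot.

```php
<?php
// Alternative sketch (not the handler-based approach above): check the
// current scope's variable table directly.
$l = NULL;

$vars = get_defined_vars();                              // snapshot of this scope
echo array_key_exists('l', $vars) ? "True " : "False ";  // $l exists, value NULL
echo array_key_exists('k', $vars) ? "True " : "False ";  // $k was never set
echo isset($l) ? "True" : "False", "\n";                 // isset() sees NULL as unset
```

This should print `True False False`. Note that the snapshot only covers the scope where get_defined_vars() is called, so inside a function it will not see globals.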
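The problem is that the set_error_handler() and restore_error_handler() calls cannot be placed inside var_exists() itself (the notice is raised at the call site, while the argument is evaluated, before the function body runs), so every test needs two extra lines of code. Also, any E_NOTICE raised by other code between set_error_handler() and restore_error_handler() will not be handled properly. One solution is to let the handler claim only notices raised on a line that calls var_exists(), and return false for everything else so that PHP's standard handler takes over: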
<?php
function var_exists($var) {
    if (empty($GLOBALS['var_exists_err'])) {
        return true;
    } else {
        unset($GLOBALS['var_exists_err']);
        return false;
    }
}
function var_existsHandler($errno, $errstr, $errfile, $errline) {
    // Only claim notices raised on a source line that calls var_exists()
    $filearr = file($errfile);
    if (strpos($filearr[$errline - 1], 'var_exists') !== false) {
        $GLOBALS['var_exists_err'] = true;
        return true;  // handled: suppress the notice
    } else {
        return false; // not ours: let PHP's standard handler report it
    }
}
$l = NULL;
set_error_handler("var_existsHandler", E_NOTICE);
echo (var_exists($l)) ? "True " : "False ";
echo (var_exists($k)) ? "True " : "False ";
is_null($j); // unrelated notice: passed on to the standard handler
restore_error_handler();
?>
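The second version depends on how a handler's return value is treated: returning true marks the error as handled, while returning false passes it on to PHP's standard handler. The sketch below isolates that mechanism. To keep it self-contained it matches on the error message instead of re-reading the source line, and uses trigger_error() with E_USER_NOTICE to raise a predictable error; both choices are assumptions made for illustration.

```php
<?php
// Sketch of the "claim only what you own" error-handler pattern.
$handled = [];

set_error_handler(function (int $errno, string $errstr) use (&$handled): bool {
    if (strpos($errstr, 'var_exists') !== false) {
        $handled[] = $errstr;  // an error we own: record it and swallow it
        return true;
    }
    return false;              // anything else: fall through to PHP's handler
}, E_USER_NOTICE);

trigger_error('var_exists probe', E_USER_NOTICE);  // swallowed by the handler
restore_error_handler();                           // always undo the override

echo count($handled), "\n";    // prints 1
```

Returning false (rather than nothing) is what lets unrelated notices, like the `is_null($j)` one above, still be reported normally.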
In summary, strcoll() (unlike strcmp(), which always compares raw bytes) does not necessarily follow the code order of each character as in the 'C' locale. Instead, it parses each string into language-specific collating elements (such as 'ch' in Spanish or 'dz' in Hungarian) and compares their collation order. When two elements have the same collation order (such as 'ss' and 'ß' in German), they are then compared by their character codes.
This behaviour is controlled by the LC_COLLATE locale category: strcoll() compares strings by character code only when the collation locale is "C" (set through LC_COLLATE=C or LC_ALL=C).
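A small sketch of the difference: in the "C" locale both strcmp() and strcoll() compare character codes, so lowercase 'a' (0x61) sorts after uppercase 'B' (0x42). Which other locales exist depends on the system, so the 'de_DE.UTF-8' / 'de_DE' names below are only examples and that part is skipped when setlocale() fails. The `<=> 0` normalizes the return values, which may be any negative or positive integer.

```php
<?php
// strcmp(): raw bytes. strcoll(): follows LC_COLLATE.
setlocale(LC_COLLATE, 'C');
$byteOrder = strcmp('a', 'B') <=> 0;   // 1: 'a' (0x61) > 'B' (0x42)
$cOrder    = strcoll('a', 'B') <=> 0;  // 1: the C locale also compares codes

// Example locale names only; they may not be installed.
if (setlocale(LC_COLLATE, 'de_DE.UTF-8', 'de_DE') !== false) {
    // A language locale usually compares letters alphabetically first,
    // while strcmp() is not affected by the locale at all.
    var_dump(strcoll('a', 'B') <=> 0, strcmp('a', 'B') <=> 0);
}
setlocale(LC_COLLATE, 'C');            // restore a predictable default
```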
Most locales define roughly the following order:
control, space, punctuation and underscore, digit, alpha (lower then upper with Latin scripts; or final, middle, then isolated, initial with Arabic script), symbols, others...
strcasecmp() ignores letter case (it treats uppercase and lowercase ASCII letters as equal), but like strcmp() it compares byte by byte and does not apply the locale's collation rules.
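To see the byte order versus a locale's order on real data, the sketch below sorts the same strings twice: with SORT_STRING (byte comparison) and with SORT_LOCALE_STRING (which compares according to the current locale). The locale names are assumptions and the second sort only runs if one of them is installed; its result is not shown because it varies between systems and C libraries.

```php
<?php
// Same input, two orderings: raw bytes vs the current collation locale.
$items = ['b', 'B', 'a', 'A', '_', '1', ' '];

$bytes = $items;
sort($bytes, SORT_STRING);                 // ' ', '1', 'A', 'B', '_', 'a', 'b'
echo implode('|', $bytes), "\n";

// Example locale names only; skipped if unavailable.
if (setlocale(LC_COLLATE, 'en_US.UTF-8', 'en_US') !== false) {
    $localized = $items;
    sort($localized, SORT_LOCALE_STRING);  // order defined by the locale
    echo implode('|', $localized), "\n";
    setlocale(LC_COLLATE, 'C');
}
```

In the byte order, '_' (0x5F) lands between the uppercase and lowercase letters, which is exactly the kind of result a language-aware collation avoids.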
Note also that locales differ in how they treat accented characters: some consider an accented letter to be the same letter as its unaccented form, differing only at a lower collation level (e.g. French, Italian, Spanish); others consider it a distinct letter with its own independent collation order (e.g. the C locale, or Nordic languages).
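The intl extension's Collator class (based on ICU) makes these levels explicit: at PRIMARY strength only base letters are compared, so accents and case are ignored. The sketch below assumes intl may be missing and returns null in that case; primary_compare is a hypothetical helper name. The Swedish and German calls illustrate the "distinct letter" versus "same letter" behaviour described above, assuming the ICU build includes those locale tailorings, so their results are not asserted.

```php
<?php
// Compare two strings at ICU's PRIMARY strength (base letters only).
// Hypothetical helper; returns -1, 0, 1, or null if intl is unavailable.
function primary_compare(string $a, string $b, string $locale): ?int
{
    if (!class_exists('Collator')) {
        return null;
    }
    $collator = new Collator($locale);
    $collator->setStrength(Collator::PRIMARY);
    $result = $collator->compare($a, $b);   // int, or false on failure
    return $result === false ? null : $result;
}

var_dump(primary_compare("\u{e9}t\u{e9}", 'ETE', 'fr_FR')); // "été" vs "ETE"
var_dump(primary_compare("\u{e4}", 'z', 'sv_SE')); // Swedish: ä is its own letter after z
var_dump(primary_compare("\u{e4}", 'z', 'de_DE')); // German: ä is a variant of a
```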
Finally, collation does not operate on individual characters but on groups of characters that form a single letter:
- for example "ch" or "CH" in traditional Spanish collation, which sorts after all other strings beginning with 'c' or 'C', including "cz", but before 'd' or 'D';
- 'ss' and 'ß' in German;
- 'dz', 'DZ' and 'Dz' in some Central European languages written in the Latin script;
- with a UTF-8, UTF-16 (Unicode), Shift-JIS, Big5 or ISO-2022 encoding (the suffix of the locale name, e.g. "de_DE.UTF-8"), the bytes are first decoded into UCS-4/ISO 10646 code points before the rules of the language named by the main locale are applied.
So be extremely careful about what you consider a "character": it may be just an encoding byte with no significance to the collation algorithm. Under traditional Spanish collation, the first letter of the string "cholera" is "ch", not "c"!
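The byte-versus-character gap is easy to demonstrate. In the sketch below, strlen() and substr() work on bytes, while preg_split() with the /u modifier splits a UTF-8 string into code points (PCRE ships with PHP, unlike mbstring). The optional last part assumes the intl extension and an ICU build that includes the traditional Spanish tailoring ("es@collation=traditional"); its output is deliberately not stated.

```php
<?php
// "été": 3 code points, but 5 bytes in UTF-8 (é = C3 A9).
$word = "\u{e9}t\u{e9}";

$codePoints = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
echo strlen($word), "\n";                 // 5 bytes
echo count($codePoints), "\n";            // 3 code points
echo bin2hex(substr($word, 0, 1)), "\n";  // c3: half of "é", not a character

// Even code points are not collating elements: "ch" is two code points
// but one letter in traditional Spanish. Assumes intl with that tailoring.
if (class_exists('Collator')) {
    $trad   = new Collator('es@collation=traditional');
    $modern = new Collator('es');
    var_dump($trad->compare('cholera', 'czar'), $modern->compare('cholera', 'czar'));
}
```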