How to use PHP preg_replace with Unicode

I'm still not really familiar with regex and how it works, but I have an issue with my code. I have ¤ in my text and it gets replaced with the Unicode replacement character � (see https://en.wikipedia.org/wiki/Specials_(Unicode_block)), and I don't understand why.

Here is the code I'm using.




<meta charset="utf-8">
<?php
/* Test preg_replace function */
$preg_pattern = "/[^A-Za-z0-9!\"#%£&()=@\s]/";

if (isset($_GET['preg'])) {
    echo "<br>";
    echo "Original string: " . $_GET['preg'];
    echo "<br>";
    echo "Preg pattern: " . $preg_pattern;
    echo "<br>";
    echo "Result: " . preg_replace($preg_pattern, "", $_GET['preg']);
    echo "<br>";
}
?>
<form method="get" action="preg_test.php">
<input type="text" name="preg"<?php if (isset($_GET['preg'])) { echo " value='" . $_GET['preg'] . "'"; } ?>>
<input type="submit">
</form>

And here is the string I'm trying to use.

Test 0-9, Specialcharacters: !"#¤%&/()=?

Result image below

Any guides / help appreciated.

Ingolme

Posted August 17, 2019

Your code editor most likely has not saved the file as UTF-8. How to change this depends on which editor you're using. In Windows Notepad, the Save As dialog has an "Encoding" dropdown, which you should set to "UTF-8". Other text editors have an encoding menu or an encoding option in the document properties.
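
One quick way to check what the editor actually saved is to dump the bytes of a non-ASCII literal. This is only a minimal sketch, assuming it is pasted into the same file as the pattern; the variable names are made up for illustration.

```php
<?php
// Hypothetical check: how many bytes did the editor store for "£"?
$literal = "£";

// UTF-8 stores £ (U+00A3) as the two bytes C2 A3;
// ISO-8859-1 and Windows-1252 store it as the single byte A3.
$hex = bin2hex($literal);

if ($hex === "c2a3") {
    $encoding = "UTF-8";
} elseif ($hex === "a3") {
    $encoding = "single-byte (ISO-8859-1 or Windows-1252)";
} else {
    $encoding = "unknown";
}

echo "Bytes: $hex, file looks like: $encoding\n";
```

If the output reports a single byte, the pattern in the file is not UTF-8 even though the page declares charset="utf-8".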

Mudsaf

Posted August 18, 2019

I changed the file to UTF-8 (not sure whether it already was), but the issue still persists. The odd part is that this works when £ is not included in the preg pattern.

 

Edited August 18, 2019 by Mudsaf
had wrong image

dsonesuk

Posted August 18, 2019

Yes, but the value comes from the $_GET query string, so it may be URL-encoded: ¤ might be %C2%A4, the same way a space would be %20.
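
As a rough illustration of that encoding, the sketch below percent-encodes ¤ and decodes it again. The character is written as an escape so the example does not depend on how the file is saved; the variable names are only for illustration.

```php
<?php
// "\u{00A4}" is the currency sign ¤ (the escape needs PHP 7.0 or later).
$char = "\u{00A4}";

$encoded = rawurlencode($char);      // percent-encodes each UTF-8 byte
$decoded = rawurldecode($encoded);   // back to the original two bytes

echo $encoded, "\n";                 // %C2%A4
echo rawurlencode("a b"), "\n";      // a%20b
var_dump($decoded === $char);        // bool(true)
```

PHP decodes the query string before it fills $_GET, so as long as the browser submits the form as UTF-8, the script should receive the same two bytes whether the address bar shows ¤ or %C2%A4.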

Mudsaf

Posted August 18, 2019

Apparently my browser (Chrome) forces ¤ to appear in the URL, even when I replace it with %C2%A4 (if this is what you meant). The rest of the string is encoded properly. In Edge, however, it is encoded.

(Image from Edge browser)

I also tried calling urldecode() on the GET parameter.

Edited August 18, 2019 by Mudsaf

dsonesuk

Posted August 18, 2019

What about forcing it to be encoded with PHP's rawurlencode(), so that every browser's input is read the same way?

Mudsaf

Posted August 18, 2019

How would that work with preg_replace()?

dsonesuk

Posted August 18, 2019

Probably not, but since

"URLs can only be sent over the Internet using the ASCII character-set.

Since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format."

Ideally you should be filtering a URL that always gives you the same result, not sometimes one form and sometimes the other.

justsomeguy

Posted August 19, 2019

I would use ord or mb_ord to loop through the characters in the string and print the value of each byte to see what's actually there.

https://www.php.net/manual/en/function.ord.php

https://www.php.net/manual/en/function.mb-ord.php
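
A minimal sketch of such a byte dump follows. The helper name dump_bytes() and the sample string are made up for illustration, and the mb_ord() call is guarded because it needs the mbstring extension and PHP 7.2 or later.

```php
<?php
// Hypothetical helper: returns each byte of $s as a decimal number.
function dump_bytes(string $s): string
{
    $out = [];
    for ($i = 0; $i < strlen($s); $i++) {
        $out[] = ord($s[$i]);   // ord() looks at one byte at a time
    }
    return implode(" ", $out);
}

$sample = "A\u{00A4}";   // "A" followed by ¤, written as an escape

echo dump_bytes($sample), "\n";   // 65 194 164

// mb_ord() reads a whole code point instead of a single byte.
if (function_exists('mb_ord')) {
    echo mb_ord("\u{00A4}", "UTF-8"), "\n";   // 164
}
```

Two numbers for one visible character, as with 194 164 here, indicate a multi-byte UTF-8 sequence.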

Mudsaf

Posted August 20, 2019

On 8/19/2019 at 8:57 PM, justsomeguy said:

I would use ord or mb_ord to loop through the characters in the string and print the value of each byte to see what's actually there.

https://www.php.net/manual/en/function.ord.php

https://www.php.net/manual/en/function.mb-ord.php

The Unicode replacement symbol returns a value of 194 via the ord() function. Any idea what might be producing it?

Source code of tester below, string too.

 




<meta charset="utf-8">
<?php
/* Test preg_replace function */

// #%&()=@£\$€\[\]_\-,.:?
$basic = "A-Za-z0-9";
$preg_pattern = "/[^A-Za-z0-9!\"#%£&()=@\s]/";
$func_preg_replace = preg_replace($preg_pattern, "", $_GET['preg']);
if (isset($_GET['preg'])) {
    echo "<br>";
    echo "Original string: " . $_GET['preg'];
    echo "<br>";
    echo "Preg pattern: " . $preg_pattern;
    echo "<br>";
    echo "Result: " . $func_preg_replace;
    echo "<br>";
    echo "Rawurl: " . rawurlencode($_GET['preg']);
    echo "<br>";
    echo "Ord: ";
    // Print every byte of the result next to its ord() value
    for ($i = 0; $i < strlen($func_preg_replace); $i++) {
        echo $func_preg_replace[$i] . "(" . ord($func_preg_replace[$i]) . ") ";
    }
    echo "<br>";
}
?>
<div class="mid">
<form method="get" action="preg_test.php">
<input type="text" name="preg"<?php if (isset($_GET['preg'])) { echo " value='" . $_GET['preg'] . "'"; } ?>>
<input type="submit">
</form>
</div>

And the string without

What is love 0-9, Specialcharacters: !"#¤%&/()=?

Does it reproduce for you guys?

----

I also tried to UTF-8 encode the preg_replace result via PHP. Two new replacement symbols appeared: �(195) �(130).

$func_preg_replace = utf8_encode ($func_preg_replace);
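
The 195 and 130 are consistent with how utf8_encode() works: it assumes the input is ISO-8859-1 and re-encodes every byte above 127 as two bytes. The sketch below reproduces that per-byte step by hand; latin1_byte_to_utf8() is a hypothetical helper written for this illustration, not a PHP built-in, and utf8_encode() itself is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: encode ONE ISO-8859-1 byte as UTF-8, which is the
// per-byte transformation utf8_encode() applies.
function latin1_byte_to_utf8(int $byte): string
{
    if ($byte < 0x80) {
        return chr($byte);   // ASCII is unchanged
    }
    // U+0080..U+00FF need two bytes: 110xxxxx 10xxxxxx
    return chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
}

$encoded = latin1_byte_to_utf8(194);   // the stray byte left in the result

echo ord($encoded[0]), " ", ord($encoded[1]), "\n";   // 195 130
```

So the call did not repair anything: it turned the single leftover byte 194 into a valid two-byte sequence for a different character (Â followed by a control code).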

 

Edited August 20, 2019 by Mudsaf

justsomeguy

Posted August 21, 2019

What about the original string?  Is the 194 part of a code point that only gets partially replaced?

Mudsaf

Posted August 21, 2019

The original string (straight from $_GET) returns two bytes where ¤ is: �(194) �(164).
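
Those two bytes fit the partial-replacement theory. The sketch below is an attempt to reproduce it in isolation, with £ and ¤ written as byte escapes so the file encoding does not matter; the variable names and the short subject string are assumptions made for the example.

```php
<?php
$pound    = "\xC2\xA3";   // £ in UTF-8
$currency = "\xC2\xA4";   // ¤ in UTF-8

// Same character class as in the thread, WITHOUT the u modifier.
$pattern = "/[^A-Za-z0-9!\"#%" . $pound . "&()=@\\s]/";
$subject = "a" . $currency . "b";

$result = preg_replace($pattern, "", $subject);

// In byte mode the class contains the single bytes C2 and A3, not "£".
// C2 is therefore allowed to stay, while A4 is removed.
echo bin2hex($subject), "\n";   // 61c2a462
echo bin2hex($result), "\n";    // 61c262
```

The lone C2 (194) left behind is not valid UTF-8 on its own, which is why the browser shows a replacement character. It would also explain why the problem disappears when £ is taken out of the pattern: then both bytes of ¤ are removed.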

justsomeguy

Posted August 21, 2019

Add the u modifier to your pattern:

/[^A-Za-z0-9!\"#%£&()=@\s]/u
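
A short sketch of the pattern with the modifier applied to the test string from the first post. The two non-ASCII characters are written as escapes (PHP 7.0 or later) so the example does not depend on the file encoding; the variable names are only for illustration.

```php
<?php
$pound    = "\u{00A3}";   // £
$currency = "\u{00A4}";   // ¤

// Same character class as above, now with the u modifier.
$pattern = "/[^A-Za-z0-9!\"#%" . $pound . "&()=@\\s]/u";
$subject = "Test 0-9, Specialcharacters: !\"#" . $currency . "%&/()=?";

$result = preg_replace($pattern, "", $subject);

echo $result, "\n";   // Test 09 Specialcharacters !"#%&()=
```

With u, the pattern and the subject are read as sequences of code points, so ¤ is removed as one whole character instead of byte by byte.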

Mudsaf

Posted August 21, 2019

7 minutes ago, justsomeguy said:

Add the u modifier to your pattern:

 

/[^A-Za-z0-9!\"#%£&()=@\s]/u

 

Not sure what kind of sorcery this is, but it works now. Thank you!

justsomeguy

Posted August 21, 2019

From the manual:

u (PCRE_UTF8): This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. An invalid subject will cause the preg_* function to match nothing; an invalid pattern will trigger an error of level E_WARNING. Five and six octet UTF-8 sequences are regarded as invalid since PHP 5.3.4 (resp. PCRE 7.3 2007-08-28); formerly those have been regarded as valid UTF-8.
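
The "invalid subject" case is worth checking for once the u modifier is in place, because user input is not guaranteed to be valid UTF-8. A minimal sketch, assuming a deliberately broken sample string:

```php
<?php
$invalid = "abc\xC2";   // ends with a lone lead byte: not valid UTF-8

$result = preg_replace('/b/u', '', $invalid);
$error  = preg_last_error();

var_dump($result);                          // NULL
var_dump($error === PREG_BAD_UTF8_ERROR);   // bool(true)

// A pattern that matches the empty string gives a cheap validity check:
var_dump(preg_match('//u', $invalid));      // bool(false)
var_dump(preg_match('//u', "abc"));         // int(1)
```

Since preg_replace() returns null here instead of the filtered string, it may be worth testing the return value before echoing it.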