Hole in notepad

Fatal1ty · August 12, 2006

Over at WinCustomize, someone thought they'd found an Easter Egg in the Windows Notepad application. If you:

1. Open Notepad

2. Type the text "this app can break" (without quotes)

3. Save the file

4. Re-open the file in Notepad

Notepad displays seemingly-random Chinese characters, or boxes if your default Notepad font doesn't support those characters.

It's not an Easter egg (even though it seems like a funny one), and as it turns out, Notepad writes the file correctly. It's only when Notepad reads the file back in that it seems to lose its mind.

But we can't even blame Notepad: it's a limitation of Windows itself, specifically the Windows function that Notepad uses to figure out if a text file is Unicode or not.

You see, text files containing Unicode (more correctly, UTF-16-encoded Unicode) are supposed to start with a "Byte-Order Mark" (BOM), which is a two-byte flag that tells a reader how the following UTF-16 data is encoded. Given that these two bytes are exceedingly unlikely to occur at the beginning of an ASCII text file, they're commonly used to tell whether a text file is encoded in UTF-16.
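To make the BOM check concrete, here is a minimal Python sketch. The function name sniff_utf16_bom is made up for illustration; it is not what Notepad or Windows actually calls, and it only looks at the two UTF-16 markers.

```python
import codecs

def sniff_utf16_bom(data: bytes):
    """Return 'utf-16-le', 'utf-16-be', or None when no UTF-16 BOM is present.

    Hypothetical helper for illustration only; not a Windows API.
    """
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "utf-16-be"
    return None

# A UTF-16 file that was written with a marker, and a plain ASCII file.
with_bom = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
plain_ascii = b"this app can break"

print(sniff_utf16_bom(with_bom))
print(sniff_utf16_bom(plain_ascii))
```

When the marker is missing, a check like this has nothing to go on, which is where guessing comes in.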

But plenty of applications don't bother writing this marker at the beginning of a UTF-16-encoded file. So what's an app like Notepad to do?

Windows helpfully provides a function called IsTextUnicode()--you pass it some data, and it tells you whether it's UTF-16-encoded or not.

Sorta.

It actually runs a couple of heuristics over the first 256 bytes of the data and provides its best guess. As it turns out, these tests aren't terribly reliable for very short ASCII strings that contain an even number of lower-case letters, like "this app can break", or more appropriately, "this api can break".
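You can reproduce the misreading without Notepad. The sketch below assumes the file was saved as plain single-byte text (18 bytes) and then decoded as little-endian UTF-16, which is the wrong guess described above. It does not call IsTextUnicode() itself; it only shows what happens once that guess has been made.

```python
text = "this app can break"
raw = text.encode("ascii")          # 18 bytes, one per character (assumed ANSI save)
misread = raw.decode("utf-16-le")   # the same bytes, paired into 16-bit code units

# Each pair of ASCII letters becomes one code unit, low byte first:
# 't' (0x74) + 'h' (0x68) -> U+6874, and so on.
code_units = [hex(ord(ch)) for ch in misread]

# Check whether every resulting character lands in the CJK Unified Ideographs block.
all_cjk = all(0x4E00 <= ord(ch) <= 0x9FFF for ch in misread)

print(len(raw), len(misread))
print(code_units)
print(all_cjk)
```

Two lower-case letters, or a space followed by a lower-case letter, always pair up into a code unit somewhere between U+6100 and U+7A7A, which is why the result looks like Chinese text rather than random symbols.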

The documentation for IsTextUnicode says:

These tests are not foolproof. The statistical tests assume certain amounts of variation between low and high bytes in a string, and some ASCII strings can slip through. For example, if lpBuffer points to the ASCII string 0x41, 0x0A, 0x0D, 0x1D (A\n\r^Z), the string passes the IS_TEXT_UNICODE_STATISTICS test, though failure would be preferable.

Sad_Dreamer · August 12, 2006

I didn't bother reading all of it... Shocker said the same thing on the forum: it happens if you type 4 letters in Notepad, then 3, then 3, then 5.

Example: "ascd asd dsa sdsfg" without the quotes, and the same thing shows up.

eddie47 · August 12, 2006

Same as the trick with "bush hid the facts"... It's no big deal, but it's pretty interesting.

PS: After a few tries the trick stops working.

azsx · August 12, 2006

hheheh..

zexe · August 13, 2006

Interesting... I also read somewhere that you can't name a folder CON

... etc

Sad_Dreamer · August 13, 2006

Yes, that's right, I just tried it :))

zexe · August 13, 2006

Well, I know it's true, otherwise I wouldn't have said it.

BSA · August 13, 2006

Yes, congratulations. But did you ever wonder why renaming doesn't work? Or did that seem too hard? Here you go:

http://msdn.microsoft.com/library/default....ming_a_file.asp

And when you find more tricks like these, try to understand them; don't be robots.
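For the CON case, the short version is that Windows reserves a handful of legacy device names, so they can't be used for files or folders. A rough Python sketch of that check follows; the set below is the commonly documented list, and the helper name is_reserved_device_name is made up for illustration.

```python
# Legacy DOS device names that Windows reserves (commonly documented list).
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)

def is_reserved_device_name(name: str) -> bool:
    """Hypothetical helper: True if `name` collides with a reserved device name.

    The comparison ignores case and anything after the first dot,
    so "con.txt" is treated the same as "CON".
    """
    stem = name.split(".")[0].rstrip(" ").upper()
    return stem in RESERVED

for candidate in ["CON", "con.txt", "Console", "LPT1", "notes"]:
    print(candidate, is_reserved_device_name(candidate))
```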

nicu1991 · January 4, 2008

cool stuff

Here are a few more popular texts for Notepad:

Bill fed the goats

bush hid the facts

John has the parts

bill can not dance

this app can break

feel the new power
