Masquerade of symbols: Unicode-oriented security aspects. Unintelligible symbols (“hieroglyphs”) instead of file names on a flash drive and in various Windows applications

Krakozyabry - what is this interesting word? Russian-speaking users usually use it to describe text that is displayed incorrectly (in the wrong encoding) in programs or in the operating system itself.
Why does this happen? There is no single answer. It may be the work of our “favorite” viruses, a Windows malfunction (for example, the power went out and the computer shut down abruptly), or a program that conflicted with another program or with the OS so that everything went haywire. In general, there can be many reasons, but the most interesting one is “It just broke.”
Read on to find out how to fix encoding problems in programs and in Windows once they have occurred.
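
To make the effect concrete, here is a minimal sketch of how krakozyabry arise. It assumes a program saved Russian text in the Windows-1251 code page and another program read the same bytes as a Western code page; the sample word and variable names are only illustrative.

```python
# Illustrative only: the same bytes, read under two different code pages.
text = "Привет"                    # Russian for "Hello"
raw = text.encode("cp1251")        # bytes as a Windows-1251 program would store them

wrong = raw.decode("latin-1")      # reader assumes a Western code page
right = raw.decode("cp1251")       # reader assumes the Cyrillic code page

print(wrong)   # garbled text (krakozyabry)
print(right)   # the original Russian word
```

Nothing is damaged in the bytes themselves; only the code page used to interpret them differs, which is why fixing the language settings can bring the text back.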

For those who still don’t understand what I mean, here are a few examples:

By the way, I also found myself in this situation once and I still have a file on my desktop that helped me cope with it. That's why I decided to write this article.

Several “things” are responsible for how encodings (and fonts) are displayed in Windows: the language settings, the registry, and the files of the OS itself. We will now check them one by one.

How to fix krakozyabry shown instead of Russian letters in a program or in Windows.

1. Check the language set for programs that do not support Unicode. It may have been reset.

So, let's follow the path: Control Panel - Regional and Language Options - Advanced tab
There we make sure that the language is Russian.

In Windows XP, in addition to this, there is a “Code page conversion tables” list at the bottom, and it contains a line with the number 20880. That one needs to be Russian as well.

6. The last step: here I give you the file that once helped me fix everything, which is why I kept it as a keepsake. Here is the archive:

There are two files inside: krakozbroff.cmd and krakozbroff.reg

Both work on the same principle: they fix hieroglyphs, squares, question marks or exclamation marks in programs and in Windows (in common parlance, krakozyabry). I used the first one and it helped me.

And finally, a couple of tips:
1) If you work with the registry, do not forget to make a backup copy in case something goes wrong.
2) It is advisable to recheck step 1 after each of the other steps.

That's all. Now you know how to fix krakozyabry (squares, hieroglyphs, exclamation and question marks) in a program or in Windows.

Attention!!! Get ready, this article will be long. You can get tired and fall asleep, so sit back, grab a cup of coffee and let's get started.

Learning Chinese characters is an important part of learning the language itself. There are many ways, tools and ideas for studying them, and this article covers some of them. Different people learn them differently, depending on their goals.
For example, some just want to know a certain number of characters. Others want to read text written in characters. Others want not only to read, but also to be able to write characters. And then there are those who plan to take notes or write texts in Chinese, by hand that is, because typing on a computer is much easier.
It is worth noting that modern busyness rarely lets a person immerse themselves fully in learning without distractions. It is especially hard for those who learn the language on their own and “whenever they can.” The way to study characters should be chosen individually. Those who are short of time probably want a convenient app for their device so that they can “gnaw on the granite of science” in a free minute. For those who study the language as their specialty any method should do, but who would not want to cut the time it takes to acquire the skills?
I will add that learning a character can mean different things to different people. In the full sense, to learn a character means to know its pronunciation, writing and meaning. So, what are the ways to develop all these skills and master Chinese characters? Let's start with paper methods, then move on to electronic ones.

1. Writing out characters. The traditional way of learning characters, tested by millions of Chinese. Keep in mind that they write characters out throughout their entire school course, which is far more than a couple of years. So, the advantages of the method:
- visual and muscle memory are involved;
- writing skills and handwriting are developed;
- characters can be studied in any order;
- you can go back to what you have written right away;
- other.

The disadvantages include:
- paper and writing implements are required;
- it takes a lot of time to write one hieroglyph;
- you need to store a lot of paper;
- you need space and time for a quality approach to exercises;
- other.

You can write characters in a regular notebook with a regular pen. Those who approach this more thoroughly write them in special copybooks. An earlier post covered how to write characters and gave examples of elementary copybooks. A more advanced way of writing out characters is to use templates. They also come in different kinds.

1. Template. It may look different, but the essence is the same. Tracing paper is placed on top of the text and the characters are written on it. The problem is that such a template does not give the pronunciation of the characters, so it only trains recognition of familiar characters and calligraphy.

2. Template. Characters are written out following the given stroke order. The meaning of the characters is also given. The pronunciation is again left out.

3. There are other copybooks that would take a long time to describe. Here are links to ones you can download and print.

2. Associative method.

The essence of the method is simple. Think of what the character looks like and somehow connect this image with the character's meaning and pronunciation. You can write down all the associations in a notebook and come back to them for review.
This also includes mastering characters through radicals (keys). The difference is that the associations become concrete rather than abstract. But first you need to master the radicals. I have written about this in earlier articles. You can combine associations with writing out characters. That also takes a lot of time, but what you learn is remembered for a long time.
A different angle on this was covered in another article.

3. Cards.

That is, flash cards. The idea is that characters are written or simply printed on cards, with their meaning, pronunciation, or both on the back. It does not help everyone, takes up space, requires a lot of time for sorting, and preferably a good visual memory. Here are some of my old collections:

By the way, some people find it helpful to follow a course from textbooks that give the stroke order for writing characters. These could be the textbooks by Zadoenko, Kondrashevsky, etc.

Perhaps an experienced student of Chinese could suggest other “paper” ways of mastering and memorizing characters, but for now I have decided to stop at the ones described above. Let's move on to the electronic ones.

1. Flash cards.

People realized that several thousand characters means a large volume of cards. A whole box! The cards can be made in electronic form instead, and all sorts of programs have been created that display these cards on different platforms.

Anyone interested in this method should get acquainted with the program. It is also not for everybody. Spaced repetition of pictures relies on visual memory too, and not everyone is equally good at it. But it can be used to learn more than just Japanese and Chinese. In addition, the application is available for different platforms.

There are other applications of the same kind. For example, one such application was featured on the Magazeta website: a link to the article.

2. Hieroglyph processors.

I once tried to get acquainted with characters using the NJStar program. It didn't really help me, but someone may find a useful application for it on their computer. In this program you can enter characters with the mouse.

3. Online translators.

Google Translate has a touchscreen input feature. There you can write characters with your finger directly on your mobile device. An Internet connection is required. There is no clear memorization program, just the ability to write somewhere other than on paper. The same applies to entering characters with the mouse in online dictionaries, such as www.bkrs.info. Next to the search bar there is a manual input button; it is sometimes hard to see because of the theme around the field, but it is definitely there on the right. You can draw a character with the mouse and see its meaning, and sometimes listen to the pronunciation. This eliminates the need to write on paper.

4. Other programs.

You can find other software on the Internet. I haven't tested everything, so I can't describe much. But I want to say a few words about the MAO system. I didn’t like its approach to memorizing characters, but I still decided to include it in this article, since there is an “MAOcard” application, and someone may rate this system higher than I do. Link...

Let's continue...

Much more could be written about this, but to save at least your time, I will give a link to a page on Magazeta where the author lists a whole bunch of useful software for different platforms. Among other things, there are applications for studying and reviewing characters. But I would still like to stress that it is one thing to review or recall characters, and quite another to memorize them in the first place. That makes sense when you either know the words but cannot read, or are learning both the words and the characters at once. Special applications are suitable for these purposes.

I would like to specifically mention the Android application "Chineseskill". It is being actively developed and, in my opinion, combines many advantages. Characters are studied in parallel with vocabulary and grammar. You will have to write and pronounce words, sometimes by hand, with your finger. Maybe this is what you need?..

Another app I recommend to students of Chinese, and of characters in particular, is "Chinese Writer". I have already written a short description of this application. I will just say that even with a few inconveniences, such as the scrolling line at the bottom of the screen with information about the character, the application is excellent. You can look at characters, learn to write them, test yourself in a game, and more. In my opinion, you should have this on your device... There are paid and free versions.

Conclusion.

In conclusion, I will say that I could not list everything I have had on my smartphone and tablet. I have tried different programs but, alas, none is ideal. Or maybe I just haven't found the ideal one yet. But what I listed above is worth a try. One way or another, all of these are just means of getting characters into your memory. How your memory will take them in, and whether it will be willing to give them back later, is another question. Therefore, in addition to studying the characters themselves, I recommend getting a good night’s sleep and training your memory. Thank you for reading to the end; your understanding of the issue is now probably broader.

I think you have come across exploits classified as Unicode more than once, hunted for the right encoding to display a page, and been “delighted” by yet more gimmicks here and there. And who knows what else! If you want to find out who started this whole mess and is still cleaning it up to this day, fasten your seat belts and read on.

As they say, “initiative is punishable” and, as always, the Americans are to blame for everything.

And it was like this. At the dawn of the computer industry and the spread of the Internet, a need arose for a universal system for representing symbols. In the 1960s ASCII appeared - the “American Standard Code for Information Interchange”, the familiar 7-bit character encoding. The unused eighth bit was left as a control bit so that the ASCII table could be customized for each computer customer in a particular region. This bit made it possible to extend the ASCII table with each language's own characters. Computers were shipped to many countries, where they used their own modified tables. Later this feature turned into a headache, because exchanging data between computers became quite problematic. The new 8-bit code pages were incompatible with each other: the same code could mean several different characters. To resolve this problem, ISO (the International Organization for Standardization) proposed a new table, namely “ISO 8859”.

ISO's work on a single universal table later took shape as UCS (“Universal Character Set”, ISO/IEC 10646). However, by the time UCS was first released, Unicode had already appeared. Since the goals and objectives of the two standards coincided, it was decided to join forces. So Unicode took on the difficult task of giving each character a unique designation. At the time of writing, the latest version of Unicode is 5.2.

I want to warn you - in fact, the story with encodings is very murky. Different sources provide different facts, so you shouldn’t focus on one thing, just be aware of how everything was formed and follow modern standards. We are, I hope, not historians.

Unicode crash course

Before delving into the topic, I would like to explain what Unicode is in technical terms. We already know the goals of this standard; all that remains is to brush up on the technical side.

So what is Unicode? Simply put, it is a way to represent any character of any of the world's languages as a specific code. The latest version of the standard provides about 1,100,000 codes (1,114,112, to be exact), occupying the space from U+0000 to U+10FFFF. But be careful here! Unicode strictly separates what the code of a character is from how that code is represented in memory. Character codes (for example, 0041 for the character “A”) say nothing by themselves about byte layout; turning these codes into bytes is the job of encodings. The Unicode Consortium offers the following encodings, called UTF (Unicode Transformation Formats). Here they are:

UTF-7: This encoding is not recommended for security and compatibility reasons. Described in RFC 2152. Not part of the Unicode Standard, although it was introduced by people from the consortium.
UTF-8: The most common encoding on the web. Variable width, from 1 to 4 bytes. Backwards compatible with protocols and programs that use ASCII: characters in the range U+0000 to U+007F are encoded as the same single bytes as in ASCII.
UTF-16: Variable width, 2 or 4 bytes; most characters in common use take 2 bytes. UCS-2 is the same encoding, only with a fixed width of 2 bytes and limited to the BMP.
UTF-32: Fixed width of 4 bytes, i.e. 32 bits. Only 21 bits are actually used; the remaining 11 are zero. Although this encoding is wasteful in terms of space, it is considered the most efficient in terms of performance, since one character always fits in one 32-bit machine word.

The closest analogue of UTF-32 is the UCS-4 encoding, but today it is used less frequently.
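
The difference between a code point and its byte representation can be seen in a few lines of Python. This is only an illustrative sketch: the sample characters and the helper name byte_widths are my own choice, not something from the standard.

```python
# One code point, three byte representations.
samples = ["A", "Я", "€", "😀"]   # U+0041, U+042F, U+20AC, U+1F600

def byte_widths(ch: str) -> tuple:
    """Return the number of bytes ch takes in UTF-8, UTF-16 and UTF-32."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),   # "-be" variants add no BOM
        len(ch.encode("utf-32-be")),
    )

for ch in samples:
    code_point = f"U+{ord(ch):04X}"
    print(code_point, ch.encode("utf-8").hex(" "), byte_widths(ch))
```

The code point stays the same in every row; only the number and values of the bytes change with the encoding. The last sample lies outside the BMP, so UTF-16 needs 4 bytes (a surrogate pair) for it.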

Although UTF-8 and UTF-32 could in principle represent a little more than two billion characters, it was decided to stop at a little over a million for the sake of compatibility with UTF-16. The entire code space is divided into 17 planes of 65,536 code points each. The most frequently used characters are located in plane zero, the base plane, referred to as the BMP - Basic Multilingual Plane.
A data stream in UTF-16 and UTF-32 encodings can be represented in two ways - little endian and big endian, called UTF-16LE/UTF-32LE and UTF-16BE/UTF-32BE respectively. As you might have guessed, LE is little-endian, and BE is big-endian. But we must somehow be able to distinguish between these orders. To do this, the byte order mark U+FEFF is used, in English BOM, “Byte Order Mark”. The BOM may also appear in UTF-8, but there it says nothing about byte order.
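
As a rough illustration of byte order and the BOM, here is a small sketch built on the BOM constants from Python's standard codecs module. The function name sniff_bom and the returned labels are my own; real detection code usually needs more care, for example a UTF-16LE stream that starts with U+0000 is indistinguishable from a UTF-32LE BOM.

```python
import codecs

# Order matters: the UTF-32-LE BOM begins with the same bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes) -> str:
    """Guess the encoding from a leading BOM, or return "unknown"."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return "unknown"

le = "A".encode("utf-16-le")   # low byte first
be = "A".encode("utf-16-be")   # high byte first
print(le, be)
print(sniff_bom(codecs.BOM_UTF16_BE + be))
```

Without a BOM the two UTF-16 byte sequences for “A” are simply mirror images of each other, which is exactly why the mark is needed.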

For the sake of backward compatibility, Unicode had to accommodate characters from existing encodings. But here another problem arises - there are many variants of identical characters that need to be processed somehow. Therefore, so-called “normalization” is needed, after which it is already possible to compare two strings. There are 4 forms of normalization:

Normalization Form D (NFD): canonical decomposition.
Normalization Form C (NFC): canonical decomposition + canonical composition.
Normalization Form KD (NFKD): compatibility decomposition.
Normalization Form KC (NFKC): compatibility decomposition + canonical composition.

Now let's talk more about these strange words.

Unicode defines two types of character equivalence - canonical and compatibility.

Canonical equivalence involves decomposing a complex character into several individual parts which together form the original character. Compatibility equivalence maps a character to its closest matching counterpart. Composition is the assembly of a character from separate parts, and decomposition is the reverse. In general, look at the figure and everything will fall into place.
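
The four forms can be tried out with Python's standard unicodedata module. The sample characters below (an accented “é” and the “ﬁ” ligature) are my own choice for illustration.

```python
import unicodedata

composed = "\u00e9"      # é as a single code point
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT
ligature = "\ufb01"      # LATIN SMALL LIGATURE FI

# The two spellings of é look identical but compare unequal as raw strings.
print(composed == decomposed)

nfc = unicodedata.normalize("NFC", decomposed)    # canonical composition
nfd = unicodedata.normalize("NFD", composed)      # canonical decomposition
nfkc = unicodedata.normalize("NFKC", ligature)    # compatibility mapping
nfc_lig = unicodedata.normalize("NFC", ligature)  # canonical forms keep the ligature

print(nfc == composed, nfd == decomposed, nfkc, nfc_lig == ligature)
```

Canonical forms only change how the same character is spelled in code points, while the compatibility forms may replace a character with a different, similar one, as happens with the ligature here.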

For security reasons, normalization should be done before the string is submitted to any filters for verification. After this operation, the text size may change, which may have negative consequences, but more on that later.
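
A minimal sketch of why the order matters, assuming a deliberately naive filter that only looks for angle brackets. The names is_clean and safe_check are hypothetical and stand in for whatever filter an application really uses.

```python
import unicodedata

def is_clean(s: str) -> bool:
    """Hypothetical naive filter: reject strings containing angle brackets."""
    return "<" not in s and ">" not in s

def safe_check(s: str) -> bool:
    """Normalize first, then filter, as recommended above."""
    return is_clean(unicodedata.normalize("NFKC", s))

# Fullwidth look-alikes of "<" and ">" (U+FF1C, U+FF1E).
payload = "\uff1cscript\uff1e"

print(is_clean(payload))                        # the raw string passes the filter
print(unicodedata.normalize("NFKC", payload))   # but normalizes to real brackets
print(safe_check(payload))                      # normalizing first catches it

# Normalization can also change the length of a string.
print(len("\ufb03"), len(unicodedata.normalize("NFKC", "\ufb03")))
```

If normalization happens after the filter, the filter has checked a different string from the one that is eventually used. The length change in the last line is the kind of side effect that buffer-size assumptions can trip over.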

That’s all as far as theory goes. I haven’t said much yet, but I hope I haven’t missed anything important. Unicode is incredibly vast and complex, thick books are published about it, and it is very difficult to explain the basics of such a cumbersome standard concisely, accessibly and fully. In any case, for a deeper understanding you should check out the sidebar links. So, now that the picture with Unicode has become more or less clear, we can move on.

Visual illusion

You've probably heard about IP/ARP/DNS spoofing and have a good idea of what it is. But there is also so-called “visual spoofing” - the same old method that phishers actively use to deceive victims. It relies on similar-looking characters, such as “o” and “0” or “5” and “s”. This is the most common and simplest variant, and it is also the easiest to notice. An example is the 2000 phishing attack on PayPal, which is even mentioned on www.unicode.org. However, this has little to do with our Unicode topic.

For more advanced guys, Unicode has appeared on the horizon, or more precisely IDN, which stands for “Internationalized Domain Names”. IDN allows national alphabet characters to be used in domain names. Domain name registrars present this as a convenience: type the domain name in your native language! However, this convenience is very questionable. Well, okay, marketing is not our topic. But imagine what a haven this is for phishers, SEO specialists, cybersquatters and other evil spirits. I'm talking about an effect called IDN spoofing. This attack belongs to the category of visual spoofing; in English-language literature it is also called a “homograph attack”, that is, an attack using homographs (words that are spelled identically).

True, no one typing an address by hand will make this mistake and enter a deliberately fake domain. But most of the time users click on links. If you want to see how effective and simple the attack is, look at the picture.

IDNA2003 was invented as a kind of panacea, but this year, 2010, IDNA2008 came into force. The new protocol was supposed to solve many of the problems of the young IDNA2003, but it introduced new opportunities for spoofing attacks. Compatibility issues arise again: in some cases, the same address can lead to different servers in different browsers. The point is that the conversion to Punycode may be done differently by different browsers - everything depends on which version of the standard is supported.
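
To see what a homograph domain looks like under the hood, here is a small sketch using Python's built-in "idna" codec, which follows the older IDNA2003 rules. The fake domain and the helper names to_ascii and scripts_in are made up for illustration, and the script check is only a crude heuristic based on Unicode character names.

```python
import unicodedata

real = "paypal.com"
fake = "p\u0430yp\u0430l.com"   # U+0430 CYRILLIC SMALL LETTER A instead of Latin "a"

def to_ascii(domain: str) -> bytes:
    """Convert a domain to its ASCII (Punycode) form with the IDNA2003 codec."""
    return domain.encode("idna")

def scripts_in(label: str) -> set:
    """Rough script guess: the first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

print(real == fake)                     # the strings differ although they look alike
print(to_ascii(real))                   # plain ASCII stays as it is
print(to_ascii(fake))                   # the look-alike becomes an xn-- label
print(scripts_in(fake.split(".")[0]))   # a label mixing two scripts is suspicious
```

The two names look the same on screen, but the wire form of the fake one starts with xn--, and a mixed-script label is one of the signals browsers can use to decide whether to show the Unicode or the Punycode form.
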
The problem of visual deception does not end there. Unicode also serves spammers. We are talking about spam filters: spammers pass the original letters through a Unicode obfuscator, which looks up similar characters from different national alphabets using the so-called UC-Simlist (“Unicode Similarity List”, a list of similar Unicode characters). That's all! The antispam filter gives up and can no longer recognize anything meaningful in such a mess of characters, while the user is still quite capable of reading the text. I don’t deny that a solution to this problem has been found, but spammers still have the upper hand. Well, and one more thing from the same series of attacks. Are you sure that you are opening a text file and not dealing with a binary?

In the figure, as you can see, we have a file called evilexe.txt. But this is a lie! The file is actually called eviltxt.exe. What is that thing in parentheses, you ask? It is U+202E, RIGHT-TO-LEFT OVERRIDE, one of the control characters of the so-called Bidi (bidirectional) algorithm, which Unicode uses to support languages such as Arabic, Hebrew and others that are written from right to left. After an RLO character is inserted, everything that follows it is displayed in reverse order. As a real-life example of this method I can cite a spoofing attack in Mozilla Firefox - cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2009-3376.
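
The trick is easy to reproduce and to detect. The sketch below is illustrative only: naive_display merely imitates what a Bidi-aware renderer shows for this simple case, and the set of control characters checked is my own short list of the directional formatting characters.

```python
RLO = "\u202e"   # RIGHT-TO-LEFT OVERRIDE

# Directional formatting characters that rarely belong in a file name.
BIDI_CONTROLS = set("\u200e\u200f\u202a\u202b\u202c\u202d\u202e"
                    "\u2066\u2067\u2068\u2069")

def naive_display(name: str) -> str:
    """Imitate the rendering: everything after the RLO is shown reversed."""
    head, sep, tail = name.partition(RLO)
    return head + tail[::-1] if sep else name

def has_bidi_controls(name: str) -> bool:
    """Flag names containing directional control characters."""
    return any(ch in BIDI_CONTROLS for ch in name)

real_name = "evil" + RLO + "txt.exe"

print(real_name.endswith(".exe"))    # what the OS sees: an executable
print(naive_display(real_name))      # what the user sees
print(has_bidi_controls(real_name))  # a simple check catches it
```

The real extension is decided by the stored code points, not by what is drawn on screen, so checking the raw string for these control characters is a cheap first line of defence.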

Bypassing filters - stage No. 1

Today it is well known that long forms (non-shortest forms) of UTF-8 must not be processed, since they are a potential vulnerability. However, the PHP developers are not convinced. Let's figure out what this bug is. Perhaps you remember incorrect filtering and utf8_decode(). This is the case we will look at in more detail. So we have this PHP code: