
/jp/ - Otaku Culture

>> No.8450209,145 [INTERNAL]  [View]

Files added:
Files changed:
118/160 433/668 64.8%

I hope this doesn't get stuck at 2/3 again.

>> No.8598688,161 [INTERNAL]  [View]

>>8598688,160
There are different language versions of Windows.

http://www.neowin.net/forum/topic/809590-windows-7-localized-iso-images-are-now-on-connect/

>> No.8735414,1 [INTERNAL]  [View]

The SYN flood got to him.

>> No.8598688,159 [INTERNAL]  [View]

>And no, dealing with additional bytes in UTF-8 doesn't bring about any more algorithmic complexity, it brings about a few more iterations in a loop.
"algorithmic complexity" isn't really important in practice, because the constants do matter. I think I mentioned this before in one of the previous posts here. Real efficiency isn't measured in terms of big O and all that. It's measured in how much time and space something takes.

>If internationalization offends you so deeply
It's not "internationalization", it's adding useless bloat a monolingual or bilingual user will never need. Providing separate-language versions (like what Microsoft does with their different language versions of Windows) is.

>Gutting and rewriting a working program for every possible local encoding it might be used in
There is not much that needs to be changed. The majority of the Latin-based languages have very similar encodings that aren't much more than "different glyphs for the upper 128 characters". Asian stuff is a little more complex, but not really difficult. Also, you are assuming text processing makes up the majority of code in most programs, which isn't the case.

>but nobody has time for this kind of thing.
Then what the bloody hell are you doing with your time? Waiting for slow, inefficient software to finish?

>>8598688,158
You're just frustrated you can't come up with any better arguments so you try to retort with insults. I know what I'm talking about, and I'm weighing the benefits against the drawbacks, which is apparently something you are unable to do --- all you can do is be brainwashed by the biased promoters of whatever it is, accepting their claims of "it's better" without question.

>> No.8450209,144 [INTERNAL]  [View]

Files added:
Files changed:
118/160 433/668 64.8%

I don't want to look at the ETA estimate.

>> No.8450209,143 [INTERNAL]  [View]

Files added:
Files changed:
118/160 433/668 64.8%

>>8450209,142
What? What "head" joke?

>> No.8598688,156 [INTERNAL]  [View]

>>8598688,155
The future of waste, hedonism, and idiocy.

>> No.8450209,141 [INTERNAL]  [View]

Files added:
Files changed:
118/160 433/668 64.8%

>>8450209,139
Don't think. Translate and you'll be TRANASINN.

>>8450209,140
...lol wut.

>> No.8598688,154 [INTERNAL]  [View]

>UTF-16 has invalid ranges as well.
But they are very small -- just lone or mismatched surrogates. In comparison...
>The probability of a random string of bytes which is not pure ASCII being valid UTF-8 is 3.9% for a two-byte sequence

>I don't know what you mean by this. 4 potential codepoint lengths? It wouldn't matter if there were 50, the logic would still be the same; read the length upfront and aggregate the rest. Having more segments doesn't imply the programming logic becomes anymore complicated.
Of course it becomes more complicated. There are a bunch more error cases to handle, and potentially more bitshifting too. Write some code to extract the next codepoint from a UTF-8 string given a pointer, and do the same for UTF-16. Compare. If you're still not convinced, compare the number of machine instructions and benchmark them.
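
To save you the trouble, here's roughly what the UTF-8 side looks like -- a sketch of mine, not taken from any particular library, so the name utf8_next and the exact error handling are just my own choices:

#include <stdint.h>

/* Sketch: decode the next codepoint from a NUL-terminated UTF-8 string.
 * Advances *p and returns the codepoint, or returns -1 on any malformed
 * sequence (bad lead byte, bad continuation, overlong form, surrogate,
 * or a value above U+10FFFF). */
static int32_t utf8_next(const unsigned char **p)
{
    static const uint32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = *p;
    uint32_t cp;
    int len;

    if (s[0] < 0x80) { *p = s + 1; return s[0]; }             /* ASCII fast path */
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; len = 4; }
    else return -1;                                           /* invalid lead byte */

    for (int i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return -1;                 /* invalid continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min_cp[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return -1;                                            /* overlong, surrogate, or out of range */

    *p = s + len;
    return (int32_t)cp;
}

Compare that against the two-case UTF-16 sketch in my earlier post and count the branches and error paths.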

>Windows MUI packs aren't forced upon anyone.
You seem to have not noticed my other example, Linux distros which do this. I'd rather have more software in my language than a bunch of other languages I will never use.
>Doesn't have much to do with Unicode, though.
No, you're not thinking enough. This is an example of how trying to do everything for everyone, by including every single language, is not optimal for anyone.

>with several hundred codepages to accomodate by hand and no overarching encoding standard, we'll have a lot of broken software, for all the reasons given above
Bad programmers are going to be bad programmers no matter what. The same applies to good programmers.

>Those sentences you quoted don't contradict each other.
I'm not saying they contradict. I'm saying that you seem to believe English should be the standard, despite this being a discussion about the goal of Unicode, which you mentioned earlier is to accommodate all languages.

>> No.8598688,152 [INTERNAL]  [View]

>>8598688,149
>>8598688,150
PDF is actually not that bad and more efficient than PS and maybe even plain text because it has compression by default. (Of course, it's been getting more "features" added over time, like everything else seems to be these days.)

>Moreover where do you encounter novels in plaintext outside of shady warez sites? Publishers and authors generally use some kind of typesetting system. That's a contrived example and it's far from the general use.
From the beginning I was talking about the text and only the text, which should make up the majority of a contentful site. You are confounding the point by bringing markup into the question.

>I think we can agree it applies most as they're going over the wire
Yes. However, that is where compression is used, and not in memory.

>UTF-8 involves reading the length out of the leading byte and then appending the remaining codepoint data from the continuation bytes.
...and checking each one to ensure it is valid.
>UTF-16 has its ugly system of surrogate pairs and variable byte-order to pay attention to.
"ugly" is subjective. I think it is much simpler because there is only two cases: either you're in the BMP or you're not. With UTF-8 there are a bunch of ranges and total of (at least) 4 cases.

>A handful of gettext templates isn't a "huge waste" and most people won't be cognizant a few KB of optional install files is being "forced upon" them
That's a "best case" situation. I had in mind the Windows language packs (http://www.pcdiy.com/22/windows-7-language-packs-rtm-mui-download - look at the sizes) and certain Linux distros which assume you are very multilingual.

>It isn't about users. It's about programmers.
What do you mean, it isn't about the users? Who are the programmers writing code for? Who are there more of? Fuck the programmers.

>different byte widths
Since when did we decide a byte may not be 8 bits again?

>English is the Lingua Franca of technology, most people are past this.
>but in practice the one-size-fits-all solution (better known as a "standard") works a great deal better.
You did say "It isn't about the users. It's about the programmers." Then why not just standardise on English? That would certainly make things a lot easier for the programmers! [If anything I think Chinese would be the language to standardise on due to its extreme information density as well as the large userbase.]

>> No.8450209,138 [INTERNAL]  [View]

Files added:
mh05_341.asb
Files changed:
118/160 433/668 64.8%

You're going to have to do that huge one sooner or later...

>> No.8598688,147 [INTERNAL]  [View]

>>8598688,144
>I was discussing the likes of HTML, IETF email protocol, ODF, OOXML, Tex, DocBook and so forth, not human language writing systems
...which were all designed by an English-speaking community.

>The common wisdom that UTF-8 is 50% larger than UTF-16 for Asiatic text doesn't hold, and the difference is embellished.
That is because, technically speaking, not 100% of that text was Asiatic. If you take something like a Japanese novel's text -- and ONLY the text -- you will likely find the 50% increase, because there will be almost NO characters other than Japanese ones. (Kana and nearly all kanji sit above U+0800, so each one is 3 bytes in UTF-8 versus 2 in UTF-16.)

>in practical use, semantic or presentational markup takes up a fair amount of space
Unfortunately true. However a page with more emphasis on content will naturally have a lower markup:content ratio.

>Hence why you had to strip the HTML from the page to make UTF-16 not look positively terrible by comparison.
A good browser design does not keep the markup, because it extracts the content into its own data structures when it parses. The last time I checked, even a byte was enough to uniquely identify every element in HTML5 (and possibly the attributes too, although I seem to remember that count being only slightly over 256.)
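
To illustrate (my own sketch, not how any real browser lays it out -- the names tag_names and intern_tag are made up), the parser can intern each tag name into a one-byte ID and throw the markup text away entirely:

#include <stdint.h>
#include <string.h>

/* Sketch: the full HTML5 element list (~110 names) fits comfortably in
 * one byte, so the DOM can store a uint8_t per element instead of the
 * tag string. Only a few names are listed here for brevity. */
static const char *const tag_names[] = {
    "a", "body", "div", "head", "html", "p", "span", "table", "td", "tr",
};
enum { TAG_COUNT = sizeof tag_names / sizeof tag_names[0], TAG_UNKNOWN = 0xFF };

static uint8_t intern_tag(const char *name)
{
    for (uint8_t i = 0; i < TAG_COUNT; i++)
        if (strcmp(tag_names[i], name) == 0)
            return i;               /* one-byte ID replaces the tag text */
    return TAG_UNKNOWN;             /* unknown/custom element */
}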

>An encoding can't be fixed-width "most of the time", it's a yes or no proposition.
I didn't say it's fixed-width. Either 1 or 2 shorts (2 or 4 bytes). Thus, retrieving a character from a buffer is going to be simpler in UTF-16 than UTF-8. (I can almost write the code for it just from memory --- if the first short is a lead surrogate, read the second; otherwise the first one is enough.) In contrast the UTF-8 algorithm is more complex. Since this consumption of characters is going to be one of the most commonly performed operations in processing text, it makes sense for it to be as simple as possible. On the other hand it needs to be balanced with size too --- you can think of real compression algorithms at one end of the scale and plain UTF-32 at the other.
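
In fact, here it is, roughly -- my own sketch, assuming a native-endian uint16_t buffer; the name utf16_next and the exact error value are just illustrative:

#include <stdint.h>

/* Sketch: decode the next codepoint from a native-endian UTF-16 buffer.
 * Advances *p and returns the codepoint, or -1 on a lone/mismatched
 * surrogate. Exactly two cases: a BMP unit, or a surrogate pair. */
static int32_t utf16_next(const uint16_t **p)
{
    const uint16_t *s = *p;
    uint16_t w = s[0];

    if (w < 0xD800 || w > 0xDFFF) { *p = s + 1; return w; }        /* BMP */
    if (w <= 0xDBFF && s[1] >= 0xDC00 && s[1] <= 0xDFFF) {         /* lead + trail surrogate */
        *p = s + 2;
        return 0x10000 + (((uint32_t)(w - 0xD800) << 10) | (uint32_t)(s[1] - 0xDC00));
    }
    return -1;                                                     /* lone or out-of-order surrogate */
}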

>Even UTF-32 can't be treated that way; it doesn't account for the codepoints' combining characters.
...let's not get into that.

>every worthwhile language provides libraries or built-in support for all the encodings and most of the people who feel a need to "implement" any of the encodings despite this are precisely the sort that shouldn't be.
Yes, I'm well aware of the huge amount of code that has been written for this sort of stuff, and that is exactly the complexity most people don't see although they should, because hidden complexity has real implications.

>Often two or three, sometimes more.
Then you must be very multilingual. However, although I am not able to find any studies on this, I would hypothesise that the majority of webpages contain only one language.

>Unicode is wonderful for data interchange, but it's central domain is application portability and obviating the need to rewrite programs to work for speakers of other languages, and no, it is not as simple as simply pointing to a different code page.
A one-size-fits-all solution is never optimal, and in this case of multiple languages there is going to be very little overlap between e.g. users of African languages and those of Asian ones. In addition, with the exception of areas like translation, almost everyone uses applications in only one language --- thus it is a huge waste to force someone who will only ever use the English version of a program to download all the other languages as well.

>>8598688,146
While sacrificing many other things which are also important, as I have pointed out.

>> No.8450209,137 [INTERNAL]  [View]

Files added:
Files changed:
117/160 433/668 64.8%

>>8450209,136
>Sakura
>him
Ok.

>> No.8598688,140 [INTERNAL]  [View]

>>8598688,136
>Basically all major human-readable text formats center around the ASCII range
That's only true for English and the other Latin-based languages. You are neglecting to consider the very sizeable Asian population.

Also, a single webpage makes a very poor example of "proof". I took the Japanese Wikipedia page for Tokyo, cut out everything except the Japanese text (i.e. removed the sidebars, the footers, and the tables of numbers), and encoded it each way, arriving at the following sizes:

UTF-8: 68295
UTF-16: 51298
Shift-JIS: 46955

The UTF-16 is about 9% larger than Shift-JIS, but UTF-8 is 45% larger, which is what I expected.

As another example, let's pick a random Japanese site... why not 2ch, and pick a random board from it. I chose http://toro.2ch.net/tax/

UTF-16: 64334
UTF-8: 60754
Shift-JIS: 46316

Here UTF-8 and UTF-16 trade places because the page contained a ton of whitespace and URLs, but both are still much larger than "native" Shift-JIS: 31% and 39%, respectively. Note that in something like a browser the whitespace would never be stored explicitly; only the actual text strings would be. The same goes for all the markup.
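
If anyone wants to reproduce numbers like these, something along the following lines works -- my own sketch using iconv(3); encoded_size is a made-up name, and "SHIFT_JIS"/"UTF-16LE" have to be names your iconv installation recognises (I'd use UTF-16LE so a BOM doesn't skew the count):

#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch: convert a UTF-8 text buffer to another encoding and return how
 * many bytes the result takes, e.g. encoded_size("SHIFT_JIS", text, len)
 * or encoded_size("UTF-16LE", text, len). */
static size_t encoded_size(const char *to_enc, const char *utf8, size_t inleft)
{
    iconv_t cd = iconv_open(to_enc, "UTF-8");
    if (cd == (iconv_t)-1) { perror("iconv_open"); exit(1); }

    size_t outcap = inleft * 4 + 4, outleft = outcap;   /* generous worst case */
    char *outbuf = malloc(outcap), *out = outbuf;
    char *in = (char *)utf8;
    if (!outbuf) { perror("malloc"); exit(1); }

    if (iconv(cd, &in, &inleft, &out, &outleft) == (size_t)-1)
        perror("iconv");   /* e.g. a character the target encoding can't represent */

    size_t used = outcap - outleft;
    iconv_close(cd);
    free(outbuf);
    return used;
}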

>UTF-8 isn't any more complex than UTF-16, either.
UTF-8 can be 1, 2, 3, or 4 bytes per character. UTF-16 is either 2 or 4 (and most of the time will be 2 only.)
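
For reference, straight from the definitions of the two encodings:

U+0000  - U+007F:   UTF-8 1 byte,  UTF-16 2 bytes
U+0080  - U+07FF:   UTF-8 2 bytes, UTF-16 2 bytes
U+0800  - U+FFFF:   UTF-8 3 bytes, UTF-16 2 bytes
U+10000 - U+10FFFF: UTF-8 4 bytes, UTF-16 4 bytes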

>Representing all the world's languages under a single encoding is a necessarily complicated task, so it's sort of pointless to angst over how convoluted it is.
The deeper question is whether it really is necessary to do that. How many languages do you usually see at once? I mean in normal documents, not extreme cases like Unicode test pages. Yes, Unicode is "good" in the sense that you can use every language that has ever existed and may exist in the future simultaneously, but has there really ever been a huge need to do that?

>>8598688,139
We have somehow arrived at the convention of an 8-bit byte ("octet"), and it is convenient/useful in some ways, but the waste is small in comparison to encodings like UTF-16/24/32. Some pre-ASCII systems used 6-bit codes, which were enough for a-z, A-Z, 0-9, and two more characters -- clearly not enough for even English. 7 bits is officially ASCII and is sufficient for English, but I think they settled on 8 because it is a power of 2. The upper 128 characters can be used for other Latin-based languages or graphics. Thus I do not think 8 bits is all that inefficient.

>> No.8450209,135 [INTERNAL]  [View]

Files added:
Files changed:
117/160 433/668 64.8%

>>8450209,124
How did you know.

>> No.8450209,134 [INTERNAL]  [View]

Files added:
mh05_321.asb
Files changed:
117/160 433/668 64.8%

>>8450209,133
Ok. Due to its popularity I thought you were just someone else with the same name.

>> No.8598688,134 [INTERNAL]  [View]

>>8598688,132
I left UTF-8 out because it's so prevalent that I'm sure you all know the advantages and disadvantages already. In case you don't: it's good for English but 50% bigger for most of CJKV compared to e.g. Shift-JIS, and also even more complex to handle correctly than UTF-16.

I'll ignore your obvious pathetic attempt at an insult, but you should think about where your food comes from; and by that, I don't mean the supermarket.

>>8598688,133
I do not care what "the point of Unicode" is just as I do not care what "the point of web applications" is. The only matter of practical importance is how it is actually implemented and used.

>Playing captain obvious and dropping random facts that make no sense
No, I am encouraging you to think deeper, because that is clearly something you have great mental difficulty with. All you can see is the "surface", the propaganda that makes you think everything is great, and you cannot see the huge costs and tradeoffs hidden beneath.
>"Blue is better than red because the sky is blue!" is something you would say.
Wrong. We were not talking about properties of the atmosphere.

>> No.8705515,2 [INTERNAL]  [View]

>>8705382
>Please save this post as an image and use it whenever someone tries to spambomb something off the board.
Now that's proactive.

>> No.8598688,131 [INTERNAL]  [View]

>>8598688,125
I don't care for the suckless guys since they're against dynamic linking (which is A GOOD THING -if- implemented properly. Win32 was heading in that direction before they totally screwed it up with the SxS shit.) cat-v also has that view but at least they've got 9front (aka CirnOS).

>>8598688,126
RWI-AJAX was driven more by user demand. The original RWI would've stayed in use otherwise.

>>8598688,129
>>8598688,130
I know you're being facetious, but Unicode has its problems in implementation. Either you use 16-bit UTF-16 and waste 50% of it on English text (not so bad for CJKV) while needing special processing for the codepoints above 64K; or 24 bits (UTF-24? sufficient for the whole range) and waste 66% of it on English text and 33% on the majority of CJKV, plus some complexity from handling the odd unit size; or 32 bits and waste 75% of it on English text, 50% on the majority of CJKV, and over 25% on everything else, despite it being the simplest to process! None of these choices is particularly good. Unicode is efficient only if you use nearly all of it, while the majority of English sites are going to use 7 bits out of 21 --- 0.006% of the entire Unicode space, with the other 99.994% just wasted.
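
To spell out that last figure: an English site uses at most 2^7 = 128 codepoints out of the 2^21 = 2,097,152 addressable by 21 bits, and 128 / 2,097,152 ≈ 0.006%, leaving roughly 99.994% of the space unused.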

>> No.8450209,132 [INTERNAL]  [View]

Files added:
Files changed:
116/160 432/668 64.7%

>>8450209,131
I retract my previous statement, having received notice from him that he has reached >70%.

>> No.8694591,1 [INTERNAL]  [View]

FYI I'm mixed.

>> No.8598688,124 [INTERNAL]  [View]

>>8598688,123
>Then check how pages are rendered. Parsing CSS rules might be a one time thing, but rendering heavy shadows/transforms/transitions is a heavy operation which can potentially be done with every repaint event.
I can see this happening for animations, but whatever fancy effects there are, they shouldn't need to be rendered more than once if they're static. [This is why JS+CSS can be very resource-intensive --- combine JS modifying the DOM many times with complex CSS rules, and every time a change is made to an element it has to propagate through everything and trigger another cascade of rendering.]

>between using a web based mail client and a desktop based one, there won't be any measurable difference in terms of power consumption.
Depends which specific client you're talking about. I've seen horrible inefficiency in traditional applications as well as in-browser stuff. [Or do you mean there SHOULDN'T BE any difference - and I would hope so - but that's not the reality of it.]

>Fuuka's quote previews are waste of autonomy, but the RETrans javascript is absolutely perfect and indispensable.
I never said that. I NEVER SAID THAT. BUT you are comparing two things that solve very different sets of problems. I will argue that the former is NOT a problem because we already have #anchors - but for the latter (I assume you mean RWI?)... figure out how to do it without JS and without having to submit everything, and I will probably get that into the next release.

>So, please explain me how just-in-time javascript compilers suck and how it's all part of the ``newer is better'' illuminati?
I never said that. I NEVER SAID THAT.
.
But they are just attempting to solve problems that should never have become problems in the first place. Small and efficient JS interpreters would have been more than sufficient if sites had not started using excessive amounts of JS and piling layers and layers of abstraction on top.

The more complexity there is, the more bugs there are likely to be. [Just look at the bug report lists for all the major browsers, especially the open-source ones.] I will never trade correctness for performance.
.

>> No.8450209,131 [INTERNAL]  [View]

Files added:
Files changed:
116/160 432/668 64.7%

>>8450209,124
Based on overall rate since the beginning.

>>8450209,126
I remember the translator didn't have a tripcode.

>>8450209,127
He has his own project to work on [which is progressing at about the same rate if not slower.]

>> No.8598688,121 [INTERNAL]  [View]

>>8598688,116 -> 8598688,117
Also I've posted it many times before, go find it if you really need to contact me in private for some reason.

>Rendering. Not processing.
I'm referring to everything that takes place to get things done. Let's not turn this into an argument about semantics.

>Yes. Yes, and? Watching videos on youtube drains the battery faster than reading Stallman's black and white pages. And?
It's waste. There is a big difference between going to YouTube to deliberately do something you know is going to consume more power, but is something you actually want because you are getting a positive result from it (i.e. watch videos - why else would you go to a site whose sole purpose is that), and some random JS on a page doing something you neither want nor need (and in fact may HATE.)

>The point is, nowadays, javascript engines produce, more or less optimized, native code.
Wasted resources are wasted, regardless of whether they're being "wasted efficiently".

>There's no "autonomy leakage", "my fans are moving on their own" any other of your delirious nonsense.
If you said that in front of me here I'd say "let's take your laptop to the lab, hook it up to the instrumentation and see what sort of data we get as we do various things with it". I've lost count of how many links I've posted to repeatable scientific studies, performed by others, that support my hypotheses as well as my personal experience, yet you have not shown even one piece of research that supports your side. All you can do is deny reality and say "it doesn't happen", when someone else can easily shoot holes right through your argument by digging up the research. Even the global warming deniers at least attempt to make up "research", but you're not even trying. How very unconvincing.

>So Cudder, does you ghetto browser have a JITed javascript engine?
Wasted resources are wasted, regardless of whether they're being "wasted efficiently".

>>8598688,119
>>8598688,120
I actually use vi for heavy editing, and notepad is fine too, but I will never use emacs - mainly because the command key combinations are totally unintuitive and make for some awkward fingering.
