
Topic : Re: Is there an objective way to determine that writing is "stilted and cluttered"? For question, see title. For example of supposedly "stilted and cluttered" writing, see below. The Constitutional - selfpublishingguru.com


We'll call stilt any disorder in a text that makes it difficult to read. Clutter would be unnecessary redundancy. There is a long history in mathematics of analysing this.
Here is one way to measure the stilt in some writing. With it we easily find how much stilt there is. We do not merely ask whether it's present or absent.
Call the text t. Now consider each sentence. How many simpler sentences can it be broken into without a loss of meaning? Call this number S. If it cannot be broken further, S is simply one: that sentence. You take the logarithm of that. (By the way, the logarithm of one is zero.)
Do this for all sentences in the paragraph. You add these results. This sum you divide by the number of sentences. The end of this process is s(t). This is a measure of stilt in your text. A greater s corresponds to more stilt present in your writing. Null s(t) means there is no stilt in t. It's as you would imagine.
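The procedure above can be sketched in code. The per-sentence counts S are human judgments, not something a program can decide; the function name and the example values below are mine, for illustration only.

```python
import math

def stilt(decomposition_counts):
    """Average of ln(S) over the sentences of a paragraph, where S is the
    number of simpler sentences each sentence can be broken into without
    loss of meaning. The S values themselves are supplied by a human reader;
    this function only averages their logarithms."""
    logs = [math.log(s) for s in decomposition_counts]
    return sum(logs) / len(logs)

# A paragraph of three sentences: two irreducible (S = 1),
# one that splits into two simpler sentences (S = 2).
print(stilt([1, 1, 2]))  # ln(2)/3, about 0.231

# A paragraph of irreducible sentences has null stilt.
print(stilt([1, 1, 1]))  # 0.0
```

A text judged entirely irreducible scores zero, as the text says; any splittable sentence raises the average.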
Why is that? We merely calculated the entropy in your text. This was done in the accepted way. That is, we measured the disorder we found there.
For example, this answer has no stilt. Your paragraphs have approximately 1.38 stilt in contrast with it. Your stilt would only be around 0.35 if half of your text was half as ordered as it could be without communicating less information.
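One plausible reading of the 0.35 figure, which is my interpretation rather than something stated outright: if half the sentences split into two (S = 2) and the rest are irreducible (S = 1), the average of ln(S) is ln(2)/2.

```python
import math

# Half the sentences have S = 2, half have S = 1 (ln 1 = 0),
# so the average of ln(S) is ln(2)/2 -- roughly the 0.35 cited above.
half_ordered = (math.log(2) + math.log(1)) / 2
print(round(half_ordered, 2))  # 0.35
```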
Do notice that stilt does not covary with the length of the text. Measure this in words or number of sentences, as you like. This is as it should be, because meaning does not depend on word count in sentences. Nor does it depend on their number.
This greater clarity has been historically an acknowledged measure of literary quality. Such is the case with French, Chinese, and Japanese, for example.
All this measures disorder within each sentence. It does not measure disorder within paragraphs or pages.
A whole is not merely the sum of its parts. It is these parts plus their organization. So it's more than the sum of its parts if they are organized nontrivially. A system is a whole whose organization is not trivial. Texts are systems. The order of words in texts is meaningful.
Consider the paragraphs of the text instead of the sentences in it. You compute the disorder in the same way. We do this also for sections if the text has them. And we do this for whole texts, for example, the chapters in a book.
The smallest number we get is the lower bound of the stilt. Of course, the largest number we get is the ... upper bound. This is if the text is considered nonlocally.
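The same averaging applied at several levels gives the bounds described above. The level names and judged counts here are hypothetical, purely to show the min/max step.

```python
import math

def stilt(counts):
    # Average of ln(S) over the units at one level of the text.
    return sum(math.log(s) for s in counts) / len(counts)

# Hypothetical S values judged at three levels of the same text.
levels = {
    "sentences":  [1, 2, 1, 3],
    "paragraphs": [1, 2],
    "chapters":   [1],
}
scores = {name: stilt(c) for name, c in levels.items()}

lower = min(scores.values())  # lower bound of the stilt
upper = max(scores.values())  # upper bound
```

For a text judged perfectly ordered at every level, all scores are zero, so the two bounds coincide at zero, as claimed for this essay.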
Again, for example, this whole essay has null stilt. Its lower and upper bound are the same: zero.
This is quite intentional. It's for the purpose of illustration.
The following is assumed. The order of the words in the text corresponds to a common syntactical convention. The words themselves are selected from a common language that realizes this convention. If this is true the analysis is valid.

Flesch-Kincaid measurement (FK) in contrast does not measure clarity. It arbitrarily and negatively weights syllables and the number of words. Then it combines these weighted terms and subtracts them from an arbitrary constant. These parameters are chosen to give a hundred to the recent writings of ... schoolchildren ... This, apparently, because there are many of them.
But what about the content of prose, its meaning? You need as many words as you need to communicate what you mean. The meaning of a text does not depend on the number of words in it.
FK ignores the meaning one intends to communicate. It measures the distance of prose from something like this: "I eat bread at home. Do you know what? It tastes good I think. I like John Adams too. He too ate some bread once. He said it was not bad." This measures at only 111. Adams you see has two syllables. The score increases to 114 if we lose Adams and keep John. If we also specify what kind of bread it was and what we liked about it, the score decreases to 100. Yet it's not clearer or better to write "Abe Lincoln speaks" (FK 91) instead of "Abraham Lincoln speaks" (FK 35) or "Abraham Lincoln speaks tomorrow" (FK 13). FK scores don't measure clarity. Greater FK scores do not correspond to clearer prose.
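The scores quoted above match the standard Flesch reading-ease formula (often loosely called Flesch-Kincaid): 206.835 − 1.015·(words/sentence) − 84.6·(syllables/word). In the sketch below the counts are supplied by hand, since automatic syllable counting is unreliable; the function name is mine.

```python
def flesch_reading_ease(sentences, words, syllables):
    """Standard Flesch reading-ease score, with all counts supplied by hand."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# "Abe Lincoln speaks": 1 sentence, 3 words, 4 syllables.
print(round(flesch_reading_ease(1, 3, 4)))  # 91

# "Abraham Lincoln speaks": 1 sentence, 3 words, 6 syllables.
print(round(flesch_reading_ease(1, 3, 6)))  # 35
```

Note that the two inputs differ only in syllable count; the score swings by 56 points while the meaning barely changes, which is the point being made above.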
At the same time, FK scores are tedious and more complicated to compute. Textual entropy you can approximately compute with ease in your mind. You can do it while writing. But that is an aside. We have computers. All the same, this is the reply to anyone who says the more rigorous method is true but too complicated and not practical. It's more practical in fact.
More significant is this: why are schoolchildren considered paragons of clear writing? For it isn't true. Most of them don't write much and read less. They have little experience doing anything. On average they are not taught to communicate ideas clearly. And they typically have few ideas to communicate. Most of them write superficially, and this they do inaccurately, inexactly, vaguely, not clearly.
We write presumably to communicate meaning. This is the content of our writing. So why ought we write prose with less content to increase an arbitrary statistic which has only historical relevance? Why should writing that accurately and exactly delivers more content be defined per se as stilted and cluttered for these same reasons?

A perfectly ordered text, one with no stilt, can be unclear all the same. It is unclear if parts of it are meaningless. Perfect clarity requires their removal.
How is this possible? Some redundancy is required to eliminate uncertainty. This means it's informative. (This in the usage of Boltzmann, Shannon, Weaver, and Khinchin.) It has meaning. We fail to communicate it if we remove this redundancy. We cannot gain clarity doing this. And we have less meaning to communicate if we lack this in our original text. Yet all redundancy other than this fails to communicate meaning.
Leibniz said it first. Counting a coin twice does not mean you have any more money than if you counted it once. Zermelo put it another way. A set is defined by its elements. Same elements in the same organization mean the same thing. Repeating them does not communicate any distinguishable meaning.
You can consider each sentence. Then you count how many fewer words a sentence that communicates the same meaning can have. You take the logarithm of that. Do you see where I am going with this? So you compute the clutter. It measures the meaninglessness present in the text. Consider paragraphs and words, sections and words, and chapters and words analogously. You rewrite to reduce clutter.
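A sketch of the clutter computation, by analogy with stilt. The text leaves the already-minimal case implicit, so the "+ 1" offset below (making a minimal sentence contribute ln(1) = 0, mirroring the stilt convention) is my assumption, as are the function name and example values.

```python
import math

def clutter(excess_word_counts):
    """Average of ln(k + 1) over sentences, where k is how many fewer words
    an equivalent sentence could have. The k values are human judgments;
    the '+ 1' is an assumed convention so that an already-minimal sentence
    contributes ln(1) = 0, as with stilt."""
    logs = [math.log(k + 1) for k in excess_word_counts]
    return sum(logs) / len(logs)

# Three sentences: two already minimal, one carrying two removable words.
print(round(clutter([0, 0, 2]), 3))  # ln(3)/3, about 0.366
```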
Somerset Maugham advocated doing some of this, in The Summing Up. But removing all clutter is usually not worth the effort.
An analysis and reduction of clutter increases clarity. But it costs ( 2 ^ ( # words affected ) - # words affected ) times more labor than an analysis and reduction of stilt which equally increases clarity. The labor you need to analyze and reduce stilt covaries less than proportionally with the number of words in your text. This isn't true regarding clutter.
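The claimed labor ratio grows fast. A worked instance of the formula given above, with a hypothetical function name:

```python
def clutter_cost_ratio(n_words):
    """The claimed extra labor of clutter analysis over stilt analysis:
    2 ** n - n, where n is the number of words affected."""
    return 2 ** n_words - n_words

# For a single affected word the ratio is 1; for ten words it is already 1014.
print(clutter_cost_ratio(1))   # 1
print(clutter_cost_ratio(10))  # 1014
```

This exponential growth is why, as the next paragraph says, removing all clutter is rarely worth the effort.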
Few if any texts have strictly null clutter, for this reason. This text does not have null clutter.
EDIT:
ddm replies to the above. He says:
"This type of thing is exactly why I dropped my minor in mathematics. Flesch-Kincaid is NOT "arbitrary." It was developed based upon a very large data set... In contrast, your method is mathematically beautiful and precise, but entirely unworkable in practice, except for very short passages. Plus, it basically boils down to "use short sentences," which is EXACTLY what Flesch-Kincaid measures (in addition to short words)."
I answer that the question asked for an objective theory. There is one. It is not FK. Is he seriously suggesting that "Abe Lincoln speaks in New York" (FK 102) is clearer, somehow preferable writing to "Abraham Lincoln speaks in Alabama tomorrow" (FK 3)? That is a false assertion. It's also an absurdity to assert that average people cannot understand "Abraham Lincoln speaks in Alabama tomorrow" without studying at a university.
He tacitly says that consistency and truth do not matter, yet historical coincidence does. The large data set was historical data. That is contingent on the circumstances at particular times and places. It's not what is true in general.
Equally, "clear" can objectively refer only to similar syntax and semantics. Is he saying, on a forum about writing, that syntax and semantics do not matter in style, and that we had best consult the children instead? That we should consider what they can do in a particular period of history? How is this objective and not arbitrary?
The Lincoln examples are equally clear. That they are not is a nonsensical result. A theory with some false predictions and some true ones is a false theory, no better than arbitrary guessing. What data set suggested the false prediction doesn't matter at this point.
And how is "my" method (am I Khinchin?) unworkable? Since log 1 = 0, log 2 ~ 0.7, log 3 ~ 1.1, log 4 ~ 1.4, you can add these things mentally with ease. Unlike FK. Also unlike FK it's consistent. It corresponds to meaningful communication. The Lincoln examples are not spuriously distinguished.
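Those mental approximations are the natural logarithms, rounded to one decimal place; a quick check:

```python
import math

# The mental values used above, checked against the natural log.
for n, approx in [(1, 0.0), (2, 0.7), (3, 1.1), (4, 1.4)]:
    print(n, round(math.log(n), 1), approx)  # each pair of values agrees
```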
It's invariant to the number of words and the number of syllables. These have nothing to do with meaning. It does not say shorter is better. FK does that. It rather asks: how can we reduce overall disorder while communicating the same ideas?
FK says we ditch the ideas and shorten the text. Which is easily done when the ideas are allowed to be thrown away.
Also most writing is not for children anyway. It's by adults writing to communicate with adults. Why must adults write as if they are children so that other adults can more fully understand them? That they must do this is his thesis. I say: Offer proof. Don't just refer to a "large data set".
The coordinates of the cattle in a field are a large data set too. Yes, the cattle are all composed of fundamental particles. But no fact about the fundamental particles follows from this data set. It's utterly irrelevant however large it is.
If somebody replies that I don't appreciate just "how very large" it is, the reply is that they don't understand how very irrelevant it also is. No theory of the properties of particles whose argument is the location of cattle can avoid false predictions. So it isn't true.



More posts by @Shakeerah107
