News:

11 March 2016 - Forum Rules

Main Menu

Big O notation and in-engine text formatters

Started by Gideon Zhi, March 05, 2013, 03:19:37 PM

Previous topic - Next topic

Gideon Zhi

Note, heavy theoretical computer science stuff follows. This thread isn't so much about the act of programming as it is about the efficiency of an algorithm. You've been warned.

So I've started implementing automatic text formatters in my recent projects. I intend to continue to do so, as it makes things a lot easier on the insertion end of things! But as an exercise for myself, just for fun, I've been been trying to figure out the efficiency of the algorithm. It looks something like this, in pseudocode:


int linepos=1
int currentline=1
int maxlinelength=(arbitrary per game)
int maxlines=3 (usually)
for(start of string, end of string, index++){
  val = string[index]
  if (val== space){
    int seekIndex=index++
    int seekLinePos=linepos
    for(seekIndex, end of string, seekIndex++){
      if(string[seekIndex==control code)
        seekIndex+=1+params
      else if(string[seekIndex]==space OR linebreak OR sectbreak OR EOS])
      {
        if (seekLinePos>maxlinelength)
        {
          if (currentLine==maxlines) val=sectbreak
          else val=linebreak, currentline++
        }
        break
      }
      else seekLinePos++
    }
  linepos++
  print(val)
}


Roughly, anyway. Basically each time it loads a character, it checks to see if it's a space. If it is, it checks to see if the next word will fit on the line, filtering out non-printing control codes. (For the sake of simplicity, we're going to assume no variables, dictionary lookup commands, or anything else that would actually print text into the window.) If the word fits, great! If not, the routine replaces the space with a linebreak or section break as appropriate.

Now, when figuring out runtime efficiency, you usually look at loops. Usually when you have a loop inside a loop accessing the same data object, you start thinking polynomial time, which is a really bad thing. But that's not what's going on here. Let's look at a couple of cases...

Case: String a space at the start of the string, but nowhere else.
In this scenario, the game loops through the string exactly twice - once when it sees the space at the start of the string, and once to load the rest of it. Efficiency is O(2n).

Case: String consists of nothing BUT spaces.
In this scenario, the game loops through the entire string once, but the inner loop only iterates a single time per load. Once again, O(2n).

This is where I start getting stuck. The hypothetical worst case scenario has to look at maximizing the number of times the inner loop runs. The two edge cases are both O(2n), but the efficiency obviously gets worse with more complex strings; they must get less efficient and converge towards some middle point, determined based on the number of spaces in the string and the average word length. I'm stuck trying to figure out the relationship there.

Anyone have any thoughts on this matter?

DougRPG

Hi, I did something similar some days ago. I created a tool to help me translate the text.
My script is in a Xml format, so this tool is a QT program that reads this Xml and display the dialogs, and I can navigate between the dialogs.
Then I put a field that tells me if the current dialog has more letters than the allowed, considering control codes, line breaks, etc. And this field show the text in red and green colors.
See this picture:



I put my control codes inside | |.

I tried to do this by hand, but the final code became very ugly. Take a look:

void MainWindow::verifyAndInsertDialog(QString dialog)
{
    QString totalLine = "";
    QString partialLineWithoutCodes = "";
    QString partialLineWithCodes = "";
    int lineNeedle=0;
    int blockWriteNeedle=0;
    unsigned int currentBlocksize=0;

    while (lineNeedle<dialog.size())
    {
        bool charIsCode=false;
        unsigned char currentChar;
        unsigned char simbol;

        currentChar = dialog[lineNeedle++].toLatin1();

        if ((lineNeedle == dialog.size()) && (currentChar != '|'))
        {
            partialLineWithoutCodes.append(currentChar);
            currentBlocksize++;

            partialLineWithCodes = partialLineWithoutCodes;

            if (currentBlocksize<33)
            {
                partialLineWithCodes.prepend("<font color=\"green\">");
                partialLineWithCodes.append("</font>");
            }
            else
            {
                partialLineWithCodes.prepend("<font color=\"red\">");
                partialLineWithCodes.append("</font>");
            }
            totalLine.append(partialLineWithCodes);
            break;
        }

        if (currentChar == '|')
        {
            partialLineWithCodes = partialLineWithoutCodes;

            if ((dialog.size()-lineNeedle) < 3)
            {
                if (currentBlocksize<33)
                {
                    partialLineWithCodes.prepend("<font color=\"green\">");
                    partialLineWithCodes.append("</font>");
                    partialLineWithCodes.append(currentChar);
                }
                else
                {
                    partialLineWithCodes.prepend("<font color=\"red\">");
                    partialLineWithCodes.append("</font>");
                    partialLineWithCodes.append(currentChar);
                }
                totalLine.append(partialLineWithCodes);
                break;
            }


            QChar simbol1 = dialog[lineNeedle];
            QChar simbol2 = dialog[lineNeedle+1];
            QString code = tr("%1%2").arg(simbol1).arg(simbol2);

            charIsCode = true;

            simbol = code.toInt(0, 16);

            if ((simbol == 0x5C) || (simbol == 0x5D) || (simbol == 0x63) || (simbol == 0x67))
            {
                if (currentBlocksize<33)
                {
                    partialLineWithCodes.prepend("<font color=\"green\">");
                    partialLineWithCodes.append("</font>");
                }
                else
                {
                    partialLineWithCodes.prepend("<font color=\"red\">");
                    partialLineWithCodes.append("</font>");
                }
            }
            else
            {
                if ((simbol == 0x5F) || (simbol == 0x60) || (simbol == 0x64))
                {
                    if (simbol == 0x5F) currentBlocksize+=4;
                    else if (simbol == 0x60) currentBlocksize+=7;
                    else if (simbol == 0x64) currentBlocksize+=12;

                    partialLineWithoutCodes.append(currentChar);
                    partialLineWithoutCodes.append(simbol1);
                    partialLineWithoutCodes.append(simbol2);
                    partialLineWithoutCodes.append('|');
                    lineNeedle = lineNeedle+3;
                    continue;
                }
            }

            partialLineWithCodes.append(currentChar);
            partialLineWithCodes.append(simbol1);
            partialLineWithCodes.append(simbol2);
            partialLineWithCodes.append('|');

            lineNeedle = lineNeedle+3;
            currentBlocksize=0;
            totalLine.append(partialLineWithCodes);
            partialLineWithCodes.clear();
            partialLineWithoutCodes.clear();
        }
        else
        {
            partialLineWithoutCodes.append(currentChar);
            currentBlocksize++;
        }
    }

    ui->TextValidate->setText(totalLine);
}


The complexity is O(n), because I read each char only once. But the code is ugly.
Maybe a more elegant solution is to use a formal parser. Or break the string in several parts using some regex.

I don't handle spaces like you, only control codes.

FAST6191

I have pondered such things a lot in times past though I would probably still opt for PC site formatting and/or pointer recalc given the chance.

Likewise I am probably more inclined to go game by game in every sense- if I can avoid doing a full push of every register as per your average C function call I will.

Still I would be more inclined to look at something like fixed width systems if you can spare the ROM space (though I imagine the overhead is not a great deal) and do something like reverse parse (your nice 32 byte section is probably not going to end after 3 bytes all the time but rather high 20 something). If you are adding your own parser then you could probably bookend it/create a magic stamp after a fashion for added bonuses, though that is probably just my being sorely tempted to do a bit of light PC/editor side parsing.

Equally depending upon what your base system is and what graphics method the text is using it might be possible to set an interrupt based on obj/OAM or bg memory values rather than the per character if else routine. There might be failings if a word is very long but checking for that would probably still be far quicker than you are presently facing and I imagine it would do better if you are wanting variables, dictionary words and such like.

If abusing graphics subsystems is not an option I would definitely take a counter approach as well, indeed I would probably take a logically flawed one (one that newlines even if it does not strictly need to as the next word is but two characters*) over a per character check.

*if you have played with some of the less accurate sound counters for sequenced music (see SSEQ from http://kiwi.ds.googlepages.com/sdat.html ) then kind of like those.

I shall have to gather my thoughts on the matter as they are not especially coherent right now.

tryphon

First, O(n) = O(2n) = O(kn). And unless I'm really tired, it's clear your algorithm is O(n). If you are looking for the best constant, you showed you can't go below 2, and it's attained, so it's 2.

If you're looking for an 'average' constant, it has no really meaning, unless you know the statiscal distribution of words length.

Gideon Zhi

Quote from: tryphon on March 05, 2013, 05:34:33 PM
First, O(n) = O(2n) = O(kn). And unless I'm really tired, it's clear your algorithm is O(n). If you are looking for the best constant, you showed you can't go below 2, and it's attained, so it's 2.

I understand that O(n)=O(kn), but the inner loop doesn't iterate a constant number of times. I THINK shown that there's no way that the inner loop will iterate n times, but there's still no guarantee that it won't iterate log(n) times. I'm looking for that upper limit.

QuoteIf you're looking for an 'average' constant, it has no really meaning, unless you know the statiscal distribution of words length.

I'm looking for a worst-case.

DougRPG

Seems that there is no worst case than O(2n). If the inner loop stops when it finds a space, just after that the outer loop goes to this space.
So I think there is no way some char can be read more than two times.

Gideon Zhi

Quote from: Dgdiniz on March 05, 2013, 06:12:07 PM
Seems that there is no worst case than O(2n). If the inner loop stops when it finds a space, just after that the outer loop goes to this space.
So I think there is no way some char can be read more than two times.

That... actually makes sense. Damn.

See, this is my problem. I always look for the most complicated explanation when there's an easy one staring me in the face. Cheers!

Garoth Moulinoski

#7
Big O notation is always the upper bound, iirc. I'll check my book in a bit, if I can find it.

Edit:

QuoteBig Oh = T(N) is O(F(N)) if there are positive constants c and N0 such that T(N) <= cF(N) when N >= N0.

That really doesn't help much*. However, to contrast Big-Oh: "Big-Omega [...] is the lower argument". It's not used commonly except in "more advanced analysis".

source: Data Structures & Problem Solving Using- Java Fourth Edition, Mark Allen Weiss

*The author was a professor at my university while I was still studying there. He was always hailed as a living compiler, because he could "compile code in his head". I never spoke to him in person.
Who will quote me next?
Disclaimer: If it sounds wrong, I may have been posting while asleep.

henke37

Here is my approach:

var word:String="";
var line:String="";
for(var i:uint=0;i<str.len;++i) {
var c:char=str[i];
if(c.isWhiteSpace) {
if(wordFitsOnLine(word)) {
line+=" "+word;
} else {
line=word;
emitLine(line);
}
word=""
} else {
word+=c;
}
}


Should run in linear time.