2005/5/10

     
 

String

artefaktur


In Java the class java.lang.String is not a normal class, but supports some special features. Most of them, plus some additional ones, are provided in ACDK.



Contents of this chapter:

   Basic Operations
     Construction
     operator +
     Normal String
     Const String
     Sub String
   Dangerous situations
   Unicode and Encoding
   Different String Storage Types
     Handling char[] or wchar[] buffers



 Basic Operations


 Construction


In Java a String can be initialized from a literal in several ways:


// Java
String str = new String("text");
// alternative:
String str = "text";

This also applies to ACDK:

// ACDK
RString str = new String("text");
// alternative:
RString str = "text";

 operator +


Two strings can be concatenated with the + operator:

RString h = "Hello ";
RString a = "ACDK";
RString str = h + a + ": it works!";
// str contains "Hello ACDK: it works!"

Note:
Because of implementation limits the + operator on strings may not be very efficient. If it is used very often on a performance-critical path, you should use StringBuffer to concatenate strings:

StringBuffer sb(1024); // reserve buffer 
sb.append("Hello ");
sb.append("ACDK");
sb.append(": it works!");
RString str = sb.toString();
StringBuffer (and some writers, like PrintCharWriter) also support a streaming syntax:

StringBuffer sb; 
sb << "Hello " << "ACDK" << ": it works!";
RString str = sb.toString();

// a shorter version for using StringBuffer to create a string:
str = SBSTR("Hello " << "ACDK" << ": it works!");
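Standard C++ shows the same trade-off. The following is only an analogy using std::string and std::ostringstream, not ACDK's StringBuffer; the function names concatWithReserve and concatWithStream are made up for illustration. Reserving capacity once and appending in place avoids the temporary strings that a chain of + operators may create:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Analogy to `StringBuffer sb(1024); sb.append(...)`:
// reserve the buffer once, then append in place.
std::string concatWithReserve()
{
    std::string sb;
    sb.reserve(1024); // like StringBuffer sb(1024)
    sb.append("Hello ");
    sb.append("ACDK");
    sb.append(": it works!");
    return sb;
}

// Analogy to `sb << "Hello " << "ACDK" << ...; sb.toString()`.
std::string concatWithStream()
{
    std::ostringstream sb;
    sb << "Hello " << "ACDK" << ": it works!";
    return sb.str();
}
```

Both functions build "Hello ACDK: it works!" without creating an intermediate string for each partial result.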


ACDK enhancements: when constructing a String in ACDK you can control the memory management of its internal buffer.

 Normal String

With

RString str = new String("Hallo", NormalSST | CCAscii);
a copy of the text is made.

 Const String



RString str = "Hallo";
// is equivalent to:
RString str = new String("Hallo");
// is equivalent to:
RString str = new String("Hallo", ConstSST | CCAscii);

// is equivalent to:
RString str = RCS("Hallo"); // read as "Resource C String"

In this case the String does not copy the text "Hallo"; it just refers to the original literal string. Of course this only works if the string is a constant.
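To make the difference between the two storage modes concrete, here is a toy model in standard C++. The class ToyString and the Storage enum are hypothetical and only mimic the behaviour described for NormalSST and ConstSST; they are not ACDK's implementation:

```cpp
#include <cassert>
#include <cstring>
#include <memory>

// Hypothetical toy model, NOT the ACDK API: a string that either
// copies its text (like NormalSST) or only refers to it (like ConstSST).
enum class Storage { Normal, Const };

class ToyString
{
public:
    ToyString(const char* text, Storage mode)
        : length_(std::strlen(text))
    {
        if (mode == Storage::Normal) {
            // Normal: make a private copy of the text (including '\0').
            owned_ = std::make_unique<char[]>(length_ + 1);
            std::memcpy(owned_.get(), text, length_ + 1);
            data_ = owned_.get();
        } else {
            // Const: only remember the pointer; the caller guarantees
            // that the text outlives this object (e.g. a literal).
            data_ = text;
        }
    }

    const char* data() const { return data_; }
    std::size_t length() const { return length_; }
    bool isCopy() const { return owned_ != nullptr; }

private:
    std::unique_ptr<char[]> owned_;
    const char* data_ = nullptr;
    std::size_t length_ = 0;
};
```

The Const branch is cheap, but it is only safe if the text really lives at least as long as the string object, which is exactly the risk discussed under "Dangerous situations".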

 Sub String


If a String operation returns a part of the original string, the resulting string internally holds only a reference to the original String object plus the offsets that delimit the substring.
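A minimal sketch of this idea, assuming that a reference-counted shared buffer (std::shared_ptr) plays the role of the ACDK reference count. SharedSubString is a hypothetical name, and its substr(begin, end) takes a half-open range [begin, end), which may differ from the parameters of ACDK's own substr():

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical sketch, not ACDK's implementation: a substring keeps the
// original text alive through a shared (reference-counted) pointer and
// stores only offsets into it.
class SharedSubString
{
public:
    explicit SharedSubString(std::string text)
        : buffer_(std::make_shared<std::string>(std::move(text))),
          begin_(0), end_(buffer_->size()) {}

    // Returns [begin, end) relative to this string. No characters are
    // copied; only the reference count of the shared buffer grows.
    SharedSubString substr(std::size_t begin, std::size_t end) const
    {
        assert(begin <= end && end <= length());
        return SharedSubString(buffer_, begin_ + begin, begin_ + end);
    }

    std::size_t length() const { return end_ - begin_; }
    std::string toString() const { return buffer_->substr(begin_, length()); }
    bool sharesStorageWith(const SharedSubString& other) const
    {
        return buffer_ == other.buffer_;
    }

private:
    SharedSubString(std::shared_ptr<const std::string> buf,
                    std::size_t begin, std::size_t end)
        : buffer_(std::move(buf)), begin_(begin), end_(end) {}

    std::shared_ptr<const std::string> buffer_;
    std::size_t begin_;
    std::size_t end_;
};
```

Because the substring holds its own reference to the buffer, the original string object may be released first without leaving a dangling pointer.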

 Dangerous situations


The following code will crash at runtime:

! RString read(int fd)
! {
!   char buffer[1024];
!   myfread(fd, buffer, 1024);
!   return new String(buffer);
! }

The String assumes the buffer is literal text (which implies it is available all the time) and not just a buffer on the stack that is destroyed when the read() method returns.

The right solution looks like:

RString read(int fd)
{
  char buffer[1024];
  myfread(fd, buffer, 1024);
  return new String(buffer, NormalSST | CCAscii); // copies the buffer
  // or alternatively:
  // return SCS(buffer); // Stack C String
}
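The same lifetime rule applies in plain C++. In this sketch, fakeRead() is a hypothetical stand-in for myfread(), and std::string takes the role of a Normal (copying) String:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for myfread(): fills the buffer with a
// zero-terminated text.
static void fakeRead(char* buffer, std::size_t size)
{
    std::strncpy(buffer, "data from fd", size - 1);
    buffer[size - 1] = '\0';
}

// std::string copies the characters, so the result stays valid after
// the stack buffer is gone. Returning the raw char* (or a
// std::string_view of buffer) instead would dangle, just like the
// crashing read() example above.
std::string readText()
{
    char buffer[1024];
    fakeRead(buffer, sizeof(buffer));
    return std::string(buffer); // copy out of the stack buffer
}
```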

 Unicode and Encoding

Note:

Unicode support was first introduced in ACDK version 4.x.
ACDK Strings support different encodings:
  • US-ASCII: all standard English characters.
  • ANSI with code pages, for example LATIN-1.
  • UCS-2: 16-bit Unicode characters, the same character set used by Java and MS Windows strings.
  • UTF-8: the same (UCS-2) characters, but backward compatible with 7-bit ASCII.
    Characters > 127 (non-ASCII) are encoded with multiple bytes.
  • For reading from and writing to byte-based streams, ACDK also supports the ISO and DOS code pages.
    Please refer to String Encoding.
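To illustrate the multi-byte rule for UTF-8, the following sketch encodes one UCS-2 character. It is plain standard C++, not an ACDK function; encodeUtf8 is a made-up name, and surrogate pairs are deliberately not handled:

```cpp
#include <cassert>
#include <string>

// Encodes a single UCS-2 character (code point below 0x10000) as UTF-8.
// ASCII (< 0x80) stays one byte; everything above 127 needs two or
// three bytes. Surrogate handling is omitted in this sketch.
std::string encodeUtf8(char16_t ch)
{
    std::string out;
    unsigned c = ch;
    if (c < 0x80) {
        out += static_cast<char>(c);                         // 0xxxxxxx
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));           // 110xxxxx
        out += static_cast<char>(0x80 | (c & 0x3F));         // 10xxxxxx
    } else {
        out += static_cast<char>(0xE0 | (c >> 12));          // 1110xxxx
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));  // 10xxxxxx
        out += static_cast<char>(0x80 | (c & 0x3F));         // 10xxxxxx
    }
    return out;
}
```

For example, 'A' stays the single byte 0x41, while U+00FC (ü) becomes the two bytes 0xC3 0xBC.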
You can initialize a standard string with plain ASCII literals, but not with an 8-bit char string containing extended (non-ASCII) characters.

RString s1 = "text"; // ASCII literal
RString s2 = "Ääh, dass geht nicht"; // !!! ERROR, in debug version throws UnmappableCharacterException
RString s3 = _US("w\\u00fcrde gehen"); // OK 
To use Unicode in literals, the Unicode escape \\uxxxx can be used, where xxxx is the hexadecimal Unicode number of the character.
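Conceptually, each escape has to be turned into a single 16-bit character at some point. The sketch below shows one way to do that in standard C++; decodeUnicodeEscapes is hypothetical, and _US() is not necessarily implemented like this:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical sketch: turns the six characters "\u00fc" into the single
// UCS-2 character U+00FC; all other characters are taken as ASCII.
std::u16string decodeUnicodeEscapes(const std::string& in)
{
    std::u16string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 6 <= in.size() && in[i + 1] == 'u') {
            // parse the 4 hex digits following "\u"
            std::size_t used = 0;
            unsigned long code = std::stoul(in.substr(i + 2, 4), &used, 16);
            if (used != 4)
                throw std::invalid_argument("bad \\u escape");
            out += static_cast<char16_t>(code);
            i += 5; // skip the rest of the escape sequence
        } else {
            // plain ASCII character
            out += static_cast<char16_t>(static_cast<unsigned char>(in[i]));
        }
    }
    return out;
}
```

Note that in C++ source code the backslash itself must be escaped, which is why the literal is written as "w\\u00fcrde gehen".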

Using a single unicode character:

uc2char ch = _UC("\\u00fc"); // NOTE use "" not '' for the _UC() macro

See also http://www.unicode.org/charts/.

Note:

Using wide character literals such as L"my wide character string" is not an option for portable code, because their meaning depends on the encoding of the source file, the compiler, the operating system and, last but not least, the environment settings.


// convert a unicode string to ascii.
// The third parameter specifies that all unmappable characters (> 127)
// should simply be dropped.
RString s4 = s3->convert(CCAscii, ::acdk::locale::ReportCodingError, ::acdk::locale::IgnoreCodingError);
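The effect of ignoring unmappable characters can be sketched in standard C++ as follows. toAsciiDropping is a hypothetical helper operating on std::u16string, not the ACDK convert() method, and it only models the "drop characters > 127" policy:

```cpp
#include <cassert>
#include <string>

// Keeps ASCII characters and silently drops every character that has
// no ASCII mapping (code > 127), mirroring an "ignore" error policy.
std::string toAsciiDropping(const std::u16string& in)
{
    std::string out;
    for (char16_t ch : in) {
        if (ch <= 0x7F)
            out += static_cast<char>(ch);
        // else: unmappable, ignore it
    }
    return out;
}
```

Applied to the text "würde gehen", the ü is removed and "wrde gehen" remains.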

For more information please refer to  String Encoding.

 Different String Storage Types

To improve String performance, ACDK implements different modes for String initialization:
  • Normal will copy the string into an internal buffer:
     
          RString str = SCS("From somewhere"); 
          // SCS(CharTypePtr) is more or less an alias for new String("From somewhere", NormalSST | CCAscii);
          char buffer[1024];
          getSomeThing(buffer, sizeof(buffer));
          RString s = SCS(buffer); // copy into own buffer;
         
  • Sub will reuse the storage of another string, where the new String is a substring of the original.
    The reference count of the source string is incremented and held by the substring to avoid a dangling pointer.
    
            RString str = from_somewhere();
            RString substr = str->substr(2, 4);
          
  • Const constructs a String from a constant string, which may be a literal, a static buffer or some other buffer outside the String's managed memory. In the last case the programmer must take care that the lifetime of the buffer is not shorter than that of the String.
    
            // does not copy string, but just hold pointer to literal
            RString standardLiteralString = "Some Literal in C code";
            char buffer[1024];
            getSomeThing(buffer, sizeof(buffer));
            RString conststr = RCS(buffer); // force using buffer
            // take care, that RString doesn't live longer than buffer!
          
  • Hash will insert the String text into a global resource table that holds all Strings constructed as hashed Strings.
    This is useful for parsing engines, to reduce the memory overhead of strings that occur very often.
    
            RString firstString = "ACDK";
            RString secondString = "ACDK";
            // firstString == secondString is false
            RString firstInternalizedString = firstString->intern();
            RString secondInternalizedString = secondString->intern();
            // firstInternalizedString == secondInternalizedString is true
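A minimal interning table can be sketched in standard C++, assuming that pointer identity stands in for comparing two RString references with ==. The function intern() here is hypothetical and is not ACDK's String::intern():

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical interning sketch: equal texts are stored only once in a
// global table, so the returned pointers can be compared by identity.
const std::string* intern(const std::string& text)
{
    static std::unordered_set<std::string> table;
    // insert() returns the already stored element if the text is present
    return &*table.insert(text).first;
}
```

std::unordered_set is used because pointers to its elements stay valid even when the table rehashes.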
            
          

 Handling char[] or wchar[] buffers


  RString text = "hallo"; // No problem
  char buffer[20];
  strcpy(buffer, "hallo");
  RString text1 = buffer; // Oops, will not work!
                          // in debug version this will throw an exception
  RString text2 = new String(buffer); // will not work either
                                      // in debug version this will throw an exception
  RString text3 = new String(buffer, NormalSST | CCAscii); // is ok
  RString text4 = SCS(buffer); // is ok too
