Code Snippets

Comparison of Audible, Phillips, & Rexton Hearing Aids

I can't find anything out there that compares the important functionality of hearing aids. So here's my inexpert comparison of these three.

I had Audible hearing aids for the last 5 years until they broke, so this comparison is to the Audible as they were 5 years ago. I then went to CostCo (best prices by far for hearing aids!) and got the Rexton. I tried them for 2 months and they sucked, so I returned them (in Colorado you can return for a full refund within 180 days of purchase) and got the Phillips.

Price: Phillips & Rexton are $1,800.00. The Audible you get from one of their approved audiologists, and they tend to be $6,000.00.

Quality of the sound: Phillips is the best. I can tell the difference from Audible, but barely. And Rexton is slightly worse than Audible.

Feedback: Phillips is the best; I really have to work at it, waving my hand all over the place, to sometimes (rarely) get feedback. The Audible would regularly give feedback if I put my hands close but otherwise was usually OK (sometimes it would start doing it and I had to move the dome around). The Rexton would cause feedback a fair amount of the time.

Note: For the Rexton they tried molded ear plugs - those were awful. It was like listening inside an echoey cave.

Comfort: The Phillips are wonderful. I don't even feel them in my ear, and at the end of the day there's barely any pain from wearing them all day. The Audible & Rexton hurt as the day goes on. Sometimes I have to take them out after 6+ hours.

Direct connection to iPhone: The Phillips sucks. At first I couldn't even hear, and they had to call Phillips tech support to learn how to increase the phone volume. It's still pretty quiet and sounds real tinny, like a speaker at an old-time drive-in movie. The Audible & Rexton are both really good on this.

Power up/down sound: Audible & Phillips are both good. Very distinctive sound, good volume. The Rexton is awful, so quiet I almost can't hear it.

iPhone app: Audible & Phillips have good apps. The Rexton app is embarrassingly bad.

October 20, 2022

What Every Developer Should Know About Time

Introduction

How to handle time is one of those tricky issues where it is all too easy to get it wrong. So let's dive in.

First off, using UTC (often equated with Greenwich Mean Time) is many times not the correct solution. Yet many programmers think that if they store everything that way, then they have it covered. (This mistake is why several years ago, when Congress changed the start of DST in the U.S., you had to run a hotfix on Outlook for it to adjust recurring events.)

What is Time

So let's start with the key question – what do we mean by time? When a user says they want something to run at 7:00 am, what do they mean? In most cases, they mean 7:00 am where they are located – but not always. In some cases, to accurately compare, say, web server statistics, they want each "day" to end at the same time, unadjusted for DST. At the other end, someone who takes medicine at certain times of the day and has that set in their calendar will want it always on local time, so a 3:00 pm event is not 3:00 am when they have traveled halfway around the world.

So we have three main use cases here (there are some others, but they can generally be handled by the following):

  1. The same absolute (for lack of a better word) time.
  2. The time in a given time zone, shifting when DST goes on/off (including double DST which occurs in some regions).
  3. The local time.

The first is trivial to handle – you set it as UTC. By doing this, every day of the year will have 24 hours. (Interesting note: UTC only matches the time in Greenwich during standard time. When DST is in effect there, Greenwich and UTC are not identical.)

The second requires storing a time and a time zone. However, the time zone is the geographical zone, not the present offset (the offset is the difference from UTC). In other words, you store "Mountain Time," not "Mountain Standard Time" or "Mountain Daylight Time." So 7:00 am in "Mountain Time" will be 7:00 am in Colorado regardless of the time of year.
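
For illustration, here's a minimal sketch using Java's java.time API (which post-dates this article); "America/Denver" is the IANA zone id for Mountain Time:

import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class MountainTimeDemo {
    public static void main(String[] args) {
        // Store the wall-clock time plus the geographical zone, not an offset.
        ZoneId mountain = ZoneId.of("America/Denver"); // "Mountain Time"

        // The same 7:00 am in winter and in summer - the zone rules pick
        // the correct offset (-07:00 vs -06:00) as DST starts and stops.
        ZonedDateTime winter = LocalDateTime.of(2011, 1, 15, 7, 0).atZone(mountain);
        ZonedDateTime summer = LocalDateTime.of(2011, 7, 15, 7, 0).atZone(mountain);

        System.out.println(winter); // 2011-01-15T07:00-07:00[America/Denver]
        System.out.println(summer); // 2011-07-15T07:00-06:00[America/Denver]
    }
}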

The third is similar to the second in that it has a time zone, called "Local Time." However, it requires knowing which time zone the user is in before you can determine when it occurs.

Putting it to Use

Ok, so how do you handle this? It's actually pretty simple. Every time needs to be stored in one of two ways:

  1. As UTC. Generally when stored as UTC, you will still set/display it in local time.
  2. As a datetime plus a geographical timezone (which can be "local time").

Now the trick is knowing which to use. Here are some general rules. You will need to figure this out for additional use cases, but most do fall into these categories.

  1. When something happened – UTC. This is a singular event and regardless of how the user wants it displayed, when it occurred is unchangeable.
  2. When the user selects a timezone of UTC – UTC.
  3. An event in the future where the user wants it to occur in a timezone – datetime plus a timezone. Now it might be safe to use UTC if it will occur in the next several months (timezone changes generally come with that much warning - although sometimes it's just 8 days), but at some point you need to do this, so you should do it for all cases. In this case, you display what you stored.
  4. For a scheduled event, when it will next happen – UTC. This is a performance requirement where you want to be able to get all "next events" whose runtime is before now. It's much faster to search against dates than to recalculate each one. However, you do need to recalculate all scheduled events regularly, in case the rules have changed for, say, an event that runs every quarter. (A minimal sketch of this next-run calculation follows this list.)
  5. For events that are on "local time", the recalculation should occur anytime the user's timezone changes. And if an event is skipped in the change, it needs to occur immediately.
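
Here's that next-run sketch in Java's java.time API (a minimal sketch: the zone and the 7:00 am schedule are hypothetical). It stores the schedule as wall-clock time plus zone, and derives the next occurrence as a UTC instant for fast searching:

import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class NextRun {
    // Next occurrence of runAt (wall-clock time in zone) after now, as UTC.
    public static Instant nextRunUtc(ZoneId zone, LocalTime runAt, Instant now) {
        ZonedDateTime local = now.atZone(zone);
        ZonedDateTime candidate = local.toLocalDate().atTime(runAt).atZone(zone);
        if (!candidate.isAfter(local)) {
            candidate = local.toLocalDate().plusDays(1).atTime(runAt).atZone(zone);
        }
        return candidate.toInstant();
    }

    public static void main(String[] args) {
        Instant next = nextRunUtc(ZoneId.of("America/Denver"),
                LocalTime.of(7, 0), Instant.now());
        // Store this UTC instant for querying; recalculate it periodically
        // (and whenever the user's zone changes) in case the rules changed.
        System.out.println(next);
    }
}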

Browser Pain

The one thing we have not figured out is how to know a user's location if they are using a browser to hit our web application. For most countries, the locale can be used to determine the timezone – but not for the U.S. (6 zones), Canada, or Russia (11 zones). So you have to ask users to set their timezone – and to change it when they travel. If anyone knows of a solution to this, please let me know.

Update: I received the following from Justin Bonnar (thank you):

HTML


<input id="timezone_offset" type="hidden" name="timezone_offset" value="">
<script type="text/javascript">
// getTimezoneOffset() returns the difference in minutes between UTC and
// local time, positive when local time is behind UTC (e.g. 420 for UTC-7).
document.getElementById('timezone_offset').value = new Date().getTimezoneOffset();
</script>

Using that, plus the suggestion below of geo-locating the IP address, will get you close. But it's not 100%. The time offset does not tell you, for example, whether you are in Arizona (it and Hawaii do not observe daylight saving time) or in the Pacific/Mountain timezone (depending on DST). You also depend on JavaScript being enabled, although that is true for 99% of the users out there today.

The geo location based on IP address is also iffy. I was at a hotel in D.C. when I got a report of our demo download form having a problem. We pre-populate the form with city, state, & country based on the geo of the IP address. It said I was in Cleveland, OH. So again, usually right but not always.

My take is we can use the offset, and for cases where there are multiple timezones with that offset (on that given day), follow up with the geo of the IP address. But I sure wish the powers that be would add a tz= to the header info sent with an HTTP request.

Resources

  • Original article (with screenshots of how Outlook handles this well)

February 18, 2011

Everything You Need to Know about Locales

Introduction

A long time ago, when I was a senior developer in the Windows group at Microsoft, I was sent to the Far East to help get the F.E. version of Windows 3.1 shipped. That was my introduction to localizing software - basically being pushed into the deep end of the pool and told to learn how to swim. This is where I learned that localization is a lot more than translation.

Note: One interesting thing we hit - the infamous Blue Screen of Death switched the screen into text mode. You can't display Asian languages in text mode. So we (and by we, I mean me) came up with a system where we put the screen in VGA mode, stored the 12 pt. Courier bitmap at the resolution for just the characters used in BSoD messages, and rendered it that way. You kids today have it so easy. :)

Fast forward to today where getting the locale right is essential on every file we read or write - if the locale is wrong, you write 12.345 instead of 12,345.

The Locale

OK, so forward to today. What is a locale and what do you need to know? A locale is fundamentally the language and country a program is running under. (There can also be a variant added to the country, but use of this is extremely rare.) You can have any combination of these two parts. For example, a Spanish national in Germany would set es_DE so that their user interface is in Spanish (es) but their country settings are German (DE). Do not assume location based on language or vice-versa.

The language part of the locale is very simple - it's the language you want to display your app's text in. If the user is a Spanish speaker, you want to display all text in Spanish. But which dialect of Spanish - it is quite different between Spain and Mexico (just as in America we spell color while in England it's colour). So the country can impact the language used, depending on the combination.

All languages that support locale specific resources (which is pretty much all of them today) use a fall-back system. They will first look for a resource for the language_country combination. While es_DE has probably never been done, there often is an es_MX and an es_ES. So for a locale set to es_MX, it will first look for the es_MX resource. If that is not found, it then looks for the es resource. This is the resource for that language, not specific to any country; generally, it is copied from the (economically) largest country for that language. If that is not found, it then goes to the "general" resource, which is almost always the native language the program was written in.
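
Java's ResourceBundle is one implementation of this fallback. A minimal sketch, assuming bundle files named Messages*.properties on the classpath (the names and the key are hypothetical; Java's lookup may also consult the default locale before reaching the base bundle):

import java.util.Locale;
import java.util.ResourceBundle;

public class FallbackDemo {
    public static void main(String[] args) {
        // Assumes Messages.properties, Messages_es.properties, and
        // Messages_es_MX.properties exist on the classpath.
        Locale mexico = new Locale("es", "MX");
        ResourceBundle bundle = ResourceBundle.getBundle("Messages", mexico);

        // Searches Messages_es_MX, then Messages_es, then Messages.
        System.out.println(bundle.getString("greeting"));
    }
}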

The theory behind this fallback is that you only have to define resources where they differ from the more general ones - and that is very useful. But even more importantly, when new parts of the UI are made and you want to ship beta copies, or you release before you can get everything translated, the translated parts display localized while the non-translated parts still display - but in English. This annoys the snot out of users in other countries, but it does get them the program sooner. (Note: We use Sisulizer for translating our resources - good product.)

The second half is the country. This is used primarily for number and date/time settings. This spans the gamut from what the decimal and thousands separator symbols are (12,345.67 in the U.S. is 12 345,67 in Russia) to what calendar is in use. The way to handle this is to use the run-time classes available for all operations on these elements when interacting with a user. Classes exist both for parsing user-entered values and for displaying them.

Keep a clear distinction between values the user enters or that are displayed to the user, and values stored internally as data. A number is a string in an XML file, but in the XML file it will be "12345.67" (unless someone did something very stupid). Keep your data strongly typed, and only do the locale specific conversions when displaying or parsing text to/from the user. Storing data in a locale specific format will bite you in the ass sooner or later.
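
For example, in Java the run-time classes look roughly like this (a minimal sketch; note the format/parse calls are the only locale-aware boundary - the double itself stays strongly typed):

import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class LocaleNumberDemo {
    public static void main(String[] args) throws ParseException {
        double value = 12345.67; // internal, strongly typed value

        // Display: format per the user's locale, only at the UI boundary.
        System.out.println(NumberFormat.getNumberInstance(Locale.US).format(value));
        // -> 12,345.67
        System.out.println(NumberFormat.getNumberInstance(new Locale("ru", "RU")).format(value));
        // -> 12 345,67

        // Parse user input with the same locale-aware class.
        Number entered = NumberFormat.getNumberInstance(Locale.US).parse("12,345.67");
        System.out.println(entered.doubleValue()); // back to the typed value
    }
}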

Chinese

Chinese does not have an alphabet but instead has a set of glyphs. The People's Republic of China several decades ago significantly revised how to draw the glyphs and this is called simplified. The Chinese glyphs used elsewhere continued with the original and that is called traditional. It is the exact same set of characters, but they are drawn differently. It is akin to our having both a text A and a script A - they both mean the same thing but are drawn quite differently.

This is more of a font issue than a translation issue, except that wording and usage has diverged a bit, in part due to the differences in approach between traditional and simplified Chinese. The end result is that you generally do want to have two Chinese language resources, one zh_CN (PRC) and one zh_TW (Taiwan). As to which should be the zh resource - that is a major geopolitical question and you're on your own (but keep in mind, PRC has nukes - and you don't).

Strings with substituted values

So you need to display the message Display("The operation had the error: " + msg);. No, no, no! Because in another language the proper word order could be Display("The error: " + msg + " was caused by the operation");. Every modern run-time library has a construct where you can have a string resource "The operation had the error: {0}" and it will then substitute your msg at {0}. (Some use a syntax other than {0}, {1}, ...)

You store these strings in a resource file that can be localized. Then when you need to display the message, you load it from the resources, substitute in the variables, and display it. The combination of this, plus the number and date/time formatters, makes it easy to build up these strings. And once you get used to them, you'll find it easier than the old approach. (If you are using Visual Studio - download and install the Resource Refactoring Tool to make this trivial.)
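
In Java, that construct is MessageFormat (a minimal sketch; in a real program both pattern strings would come from localized resource files):

import java.text.MessageFormat;

public class SubstitutionDemo {
    public static void main(String[] args) {
        // English pattern, loaded from resources in a real program.
        String pattern = "The operation had the error: {0}";
        System.out.println(MessageFormat.format(pattern, "file not found"));

        // A translator is free to move the placeholder anywhere:
        String reordered = "The error: {0} was caused by the operation";
        System.out.println(MessageFormat.format(reordered, "file not found"));
    }
}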

Arabic, Hebrew, and complex scripts

Arabic and Hebrew are called bi-directional because parts of the text run right to left while other parts run left to right. Text in Arabic/Hebrew is written and read right to left. But when you get to Latin text or numbers, you jump to the left-most part and read that left to right, then jump back to where that started and read right to left again. And then there is punctuation and other non-letter characters, whose rules depend on where they are used.

Here's the bottom line - it is incredibly complex and there is no way you are going to learn how it works unless you take this on as a full-time job. But not to worry: again, the run-time libraries for most languages have classes to handle this. The key is that the text for a line is stored in the order you read the characters. So in the computer's memory, it is in the order you would read (not display) the characters. In this way, everything works normally except when you display the text and determine caret movement.

Complex scripts like Indic scripts have a different problem. While they are read left to right, you can have cases where some combinations of letters are placed one above the other, so the string is no wider on the screen when the second letter is added. This tends to require a bit of care with caret movement but nothing more.

We even have cases like this in English where ae is sometimes rendered as a single æ character. (When the human race invented languages, they were not thinking computer friendly.)

Don't over-stress it

It seems like a lot but it's actually quite simple. In most cases, you need to display text based on the closest resource you have. And you use the number and date/time classes for all locales, including your native one. No matter where you live, most computer users are in another country speaking another language - so localizing well significantly increases your potential market.

And if you're a small company, consider offering a free copy for people who translate your product. When I created Page 2 Stage, I offered a free copy (list price $79.95) for translating it - and got 28 translations. I also met some very nice people online in the process. For an enterprise level product, many times a VAR in another country will translate it for you at a reduced rate or even free if they see a good market potential. But in these cases, do the first translation in-house to get the kinks worked out.

One resource I find very useful is the Microsoft Language Portal where you can put in text in English and if that text is in any of the Microsoft products, it will give you the translation Microsoft used for a given language. This can give you a fast high-quality translation for up to 80% of your program in many cases.

Удачи! (Good luck!)

Resources

  • Original article (with screenshots): Everything you need to know about Locales
  • Microsoft Language Portal
  • Resource Refactoring Tool

January 23, 2011

What Every Developer Should Know About Fonts

If you're going to do much more than place a line of text in a form, then the details start to matter.

Introduction

I originally thought using fonts would be pretty simple. It turns out that if you're going to do much more than place a line of text in a form, the details start to matter.

Fonts and Glyphs

So what is a font? Fundamentally, a font is a set of glyphs. What you think of as a character, like the letter A, is a glyph. A font is the set of glyphs for all the characters it covers. If you get the Helvetica font, its glyphs look one way; if you get the Times Roman font, they look another.

Now we need to introduce the concept of code pages. A code page is a mapping from a character number to a specific glyph. Programs originally stored each character as a byte. Then, for Asian character sets, there were the DBCS systems (some characters were 1 byte, some 2). Programs today mostly use Unicode, and web pages tend to be UTF-8, a multi-byte encoding where a character can take up to 4 bytes.

Why bring up encoding? Because each font has an encoding, and character number 178 could return a very different glyph depending on the codepage used by the font. Most font files use Unicode so you have a standard there, but many programs still use specific code pages, where that page is mapped to the font. This is what occurs when you display ABC with the font set to Wingdings - you get symbols instead of letters. So point one is: make sure the encoding you use matches, or is mapped to, the encoding of the fonts you use.

And it gets even more complex. The characters with the values 0xE000 - 0xF8FF are undefined (Unicode's Private Use Area). Each font can make those anything it wants (one use is to add the Klingon script). So a character with a value in this range is by definition tied to the font file being used to display it. This is how most symbol type fonts work.

OK, so you are using Unicode, your font file uses Unicode, you pass it a string and... the string displays blank. What's going on? Well, there's no requirement that a font file has a glyph for any given character. A symbol font won't have ABC. Most fonts used in Europe and America don't have the Chinese, Japanese, or Korean glyphs. It's not an error to use a glyph that a font does not have, but it will display nothing - not blank, but nothing (i.e., 0 points wide).

You can also hit a similar problem if you are using one of the old code pages and you want to display a glyph that does not exist in that code page. In that case, you need to map in a different code page, at least for that character (this is how Word used to handle this case).

Font Families

Fonts fall into several different classes. First, there are proportional vs. monospaced fonts. In a monospaced font, all characters are the exact same width, and the height is consistent in that all lower case letters are the same height, as are all upper case. Avoid monospaced fonts as much as possible because they are much harder to read. Asian fonts are almost all monospaced because the Chinese Han characters all have identical widths and heights, so proportional would make no sense. On the flip side, Hebrew and Arabic pretty much have to be proportional.

Next is the typeface, which can be serif (small extra strokes at the ends of the main strokes), sans serif (nothing extra at the ends), decorative (way beyond normal), or symbol (anything at all, including barcodes where the glyphs map to the ASCII numbers of the character codes). And this is just the Western European alphabets.

Fontmetrics

Now we get into measuring fonts - and by fonts, most (not all) of this is measuring glyphs. The standard measurement used for fonts is the point, and while there's a lot of history to what a point originally meant, in the computer world it has been 72 points == 1 inch. You will also sometimes see the twip, which stands for twentieth of a point, so 1440 twips == 1 inch. And we now have the EMU, where 914400 EMUs == 1 inch. If you work with points, you need to use floating point variables. Twips are generally OK as an integer, and EMUs definitely are.
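
Those ratios are all you need for conversions. A minimal sketch in Java:

public final class FontUnits {
    public static final double POINTS_PER_INCH = 72.0;
    public static final int TWIPS_PER_INCH = 1440;   // 1 point == 20 twips
    public static final int EMUS_PER_INCH = 914400;  // 1 point == 12700 EMUs

    public static int pointsToTwips(double points) {
        return (int) Math.round(points * (TWIPS_PER_INCH / POINTS_PER_INCH));
    }

    public static long pointsToEmus(double points) {
        return Math.round(points * (EMUS_PER_INCH / POINTS_PER_INCH));
    }

    public static void main(String[] args) {
        System.out.println(pointsToTwips(12)); // 240 twips
        System.out.println(pointsToEmus(12));  // 152400 EMUs
    }
}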

Then comes the font point size. This is a completely arbitrary number. Think of it like the diagonal size of the old CRT monitors where the actual size was close to what you expected, but was never that number. The point size determines the size of the rendered glyphs, but it has no specific measurement on the page.

Now, here's where it starts to get interesting: the fontmetrics. First, everything must be measured from the baseline. Working from any other part of the font won't work - you will hit major problems. So start there. The highest drawn part above the baseline is the ascent, and the lowest drawn part below the baseline is the descent, both measured from the baseline.

Then there is the spacing between two lines of text. This is a font setting, as the font designer determines the appropriate spacing for that font. It can be returned different ways: Windows considers it the spacing you put above the next line, returning a measure from baseline to baseline, while Java views it as the spacing below a line before the next line and returns just that value. Leading is the spacing you place between lines of similar single spaced text. If the spacing is greater than single spacing, then you add to this value.

You generally want to get these heights for the fonts, not for the string of glyphs you display. Why? Because if a line is "we were wrox" - with no ascenders or descenders - the line would be placed closer to the other lines in the paragraph, and that would look weird. You also need to look at all fonts and point sizes in use, because if some text is larger, you must use the larger ascent/descent/leading values - but only for the line(s) that have the larger text, not for the entire paragraph. And again, all of this is measured from the baseline, which is the only way to handle mixed fonts/sizes.
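
In Java, for instance, these per-font values come from FontMetrics, all relative to the baseline (a minimal sketch; the font name and size are arbitrary):

import java.awt.Font;
import java.awt.FontMetrics;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class HeightDemo {
    public static void main(String[] args) {
        // A 1x1 image just to obtain a Graphics2D to measure with.
        BufferedImage img = new BufferedImage(1, 1, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        g.setFont(new Font("Serif", Font.PLAIN, 12));
        FontMetrics fm = g.getFontMetrics();

        // Per-font values, measured from the baseline - not per string.
        System.out.println("ascent:  " + fm.getAscent());
        System.out.println("descent: " + fm.getDescent());
        System.out.println("leading: " + fm.getLeading());
        System.out.println("height:  " + fm.getHeight()); // baseline to baseline
        g.dispose();
    }
}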

OK, height takes a bit of work, but it's pretty straightforward. The width, though - this gets really interesting. And by interesting, I mean you have to get everything just right. Fundamentally, except for fixed width fonts, adding up the width of each glyph will not equal the width of all those glyphs rendered together. Pretty much never. Why? A couple of reasons:

  • Kerning is where letters are placed based on the adjoining letter. That is why AB stays distinct while tt overlaps quite a bit.
  • Some character combinations in Latin alphabets are combined, such as ae becoming æ, and in German, ss becoming ß.
  • Hebrew and Arabic glyphs are different for the same character depending on whether it is at the start, middle, or end of a word. And in the case of Arabic especially, the glyphs used on the ends tend to be wider than the glyphs in the middle. So the width of ﺺ depends on where it is in the string.
  • Bi-directional text has an additional issue, discussed below.
  • Complex scripts, like Indic (India), will change the glyph at a location, building it up from several characters. So a three character string can be anything from 1 to 3 glyphs wide.

Very simply, you need to feed the complete, fully formatted string to the fontmetrics API provided by the platform you are running on to get the length of the string. It's an expensive call, because the string will be rendered to memory to determine the length, but there is no alternative that will be accurate. And you must use the exact same settings when measuring as you do when rendering. Any time these have not matched, we have found differences large enough that the human eye can make them out. The best way to test your code for this is to look at right aligned text: you generally have to get the baseline position of the left end of the string when rendering, so if you calculate the length wrong - it will show.
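
Continuing in Java, here is that measure-then-render pattern for right aligned text (a minimal sketch; the right edge and baseline coordinates are arbitrary):

import java.awt.Font;
import java.awt.FontMetrics;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class RightAlignDemo {
    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(200, 50, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        g.setFont(new Font("Serif", Font.PLAIN, 12));
        FontMetrics fm = g.getFontMetrics();

        // Measure the complete string with the same settings used to render;
        // summing individual glyph widths would give the wrong answer.
        String text = "we were wrox";
        int width = fm.stringWidth(text);

        // Right-align against x = 190: derive the left origin from the
        // measured width. Note drawString's y is the baseline, not the top.
        g.drawString(text, 190 - width, 30);
        g.dispose();
    }
}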

Bi-directional Text

Finally, we have the issue of bi-directional text (Arabic and Hebrew). Bi-directional text goes right to left, except that numbers and Latin words go left to right. So you read right to left; when you reach a number or a run of Latin text, you jump to its left-most point, read that left to right, and then continue right to left from the left edge of that run.

There has been a ton of research on when these switches should take place. There are characters that have a strong direction, characters that have a weak direction, and characters that have no directional preference. You have no prayer of correctly implementing these rules yourself. None. But all is not lost. Pretty much every platform, including Java and Windows, has an API where you provide the string of characters in the order read, and it will render them correctly according to the rules. They also have an API for telling you where each character is located and which character you should move to if you want to move the caret one character forward or backward.
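
In Java, that analysis API is java.text.Bidi (a minimal sketch; the sample string mixes Latin, Hebrew, and digits, stored in reading order):

import java.text.Bidi;

public class BidiDemo {
    public static void main(String[] args) {
        // Characters are stored in the order read, not the order displayed.
        String text = "abc \u05D0\u05D1\u05D2 123"; // Latin, Hebrew, digits
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);

        // The analysis splits the text into directional runs; odd levels
        // run right to left. Rendering/caret APIs consume this information.
        for (int i = 0; i < bidi.getRunCount(); i++) {
            System.out.printf("run %d: chars %d-%d, level %d%n",
                    i, bidi.getRunStart(i), bidi.getRunLimit(i), bidi.getRunLevel(i));
        }
    }
}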

You can use this API for all font rendering and caret movement regardless of text, and it will work fine - on complex scripts too. It's a bit of a pain to start with this if you are not targeting bi-di or complex scripts, but if you're going to be there eventually, it's best to start off using it so you don't have to re-architect your code. Trust me, you really, really don't want to have to re-architect (I had to once - OW!).

Warning

Do not copy Windows fonts to Linux or other operating systems. The fontmetrics tend to differ and the text will look wrong. I don't know why, as TrueType is supposed to be portable, but in practice, just like Java is write once, debug everywhere, fonts tend to be design once, tweak everywhere. Get fonts from a vendor who has optimized them for your platform.

Resources

  • What every developer should know about fonts (with screenshots)
  • Wikipedia - Typefaces
  • Norman Walsh's Font FAQ
  • If you need some Lorem Ipsum

January 07, 2011

What Every Developer Should Know About Bitmaps

Introduction

Virtually every developer will use bitmaps at times in their programming. Or if not in their programming, then in a website, blog, or family photos. Yet many of us don't know the trade-offs between a GIF, JPEG, or PNG file - and there are some major differences there. This is a short post on the basics, which will be sufficient for most, and a good start for the rest. Most of this I learned as a game developer (inc. Enemy Nations) where you do need a deep understanding of graphics.

Bitmaps

Bitmaps fundamentally store the color of each pixel. But there are three key components to this:

  1. Storing the color value itself. Most of us are familiar with RGB, where it stores the Red, Green, and Blue component of each color. This is actually the least effective method as the human eye can see subtle differences on some parts of the color spectrum more than others. It's also inefficient for many common operations on a color, such as brightening. But it is the simplest for the most common programming tasks, and so has become the standard.
  2. The transparency of each pixel. This is critical for the edge of non-rectangular images. A diagonal line, to render best, will be a combination of the color from the line and the color of the underlying pixel. Each pixel needs to have its level of transparency (or actually opacity) set from 0% (show the underlying pixel) to 100% (show just the pixel from the image). (A minimal one-pixel blend is sketched just after this list.)
  3. The bitmap metadata. This is information about the image, which can range from color tables and resolution to the owner of the image.
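
That edge blend is simple arithmetic. Here's a minimal one-pixel sketch of the standard "over" composite onto an opaque background:

public class AlphaBlend {
    // alpha runs 0.0 (fully transparent) to 1.0 (fully opaque);
    // channel values are 0-255.
    static int blend(int src, int dst, double alpha) {
        return (int) Math.round(src * alpha + dst * (1.0 - alpha));
    }

    public static void main(String[] args) {
        // A 50% opaque red edge pixel over a white background:
        System.out.printf("rgb(%d, %d, %d)%n",
                blend(255, 255, 0.5),  // 255
                blend(0, 255, 0.5),    // 128
                blend(0, 255, 0.5));   // 128 - a pink edge pixel
    }
}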

Compression

Bitmaps take a lot of data. Or to be more exact, they can take up a lot of bytes. Compression has been the main driver of new bitmap formats over the years. Compression comes in three flavors: palette reduction, lossy, and lossless.

In the early days, palette reduction was the most common approach. Some programs used bitmaps that were black and white, so 1 bit per pixel. Now that's squeezing it out. And well into the days of Windows 3.1, 16 color images (4 bits/pixel) were still in widespread use. But the major use was the case of 8 bits/256 colors for a bitmap. These 256 colors would map to a palette that was part of the bitmap, and that palette held a 24-bit color for each entry. This let a program select the 256 colors out of the full spectrum that best displayed the picture.

This approach was pretty good, and mostly failed only on flat surfaces with a very slow color transition across them. It also hit a major problem early on with the web and windowed operating systems, because the video cards were also 8-bit systems with a single palette for the entire screen. That was fine for a game that owned the entire screen, but not for images from different sources sharing the screen. The solution was a standard web palette that most browsers, etc. used when there was palette contention.

Finally, there were some intermediate solutions such as 16 bits/pixel, which did provide the entire spectrum but with a coarse enough granularity that the human eye could see jumps in shade changes. This found little usage because memory prices dropped and video cards jumped quickly from 8-bit to 24-bit in a year.

Next is lossy compression. Compression is finding patterns that repeat in a file, and then, for each later occurrence, just pointing back to the first run. What if you have a run of 20 pixels where the only difference in the second run is that two of the pixels are redder by a value of 1? The human eye can't see that difference. So you change the second run to match the first, and voila, you can compress it. Most lossy compression schemes let you set the level of lossiness.

This approach does have one serious problem when you use a single color to designate transparency. If that color is shifted by a single bit, it is no longer transparent. This is why lossy formats were used almost exclusively for pictures, and never in games.

Finally comes lossless. This is where the program compresses the snot out of the image with no loss of information. I'm not going to dive into the what/how of this, except to bring up the point that compressing images takes substantially more time than decompressing them. So displaying compressed images - fast. Compressing images - not so fast. This can lead to situations where, for performance reasons, you do not want to store in a lossless format on the fly.

Transparency

Transparency comes in three flavors. (If you know artists who create web content - have them read this section. It's amazing the number who are clueless on this issue.) The first flavor is none - the bitmap is a rectangle and will obscure every pixel below it.

The second is a bitmap where a designated color value (most use magenta, but it can be any color) means transparent. Other colors are drawn, and the magenta pixels are not, so the underlying pixel shows through. This requires rendering the image against a chosen background color, so the edge pixels that should be partially image and partially background end up partially that chosen background color. You see this in practice with 256 color icons: they have perfect edges on a white background, yet a weird white halo effect on their edges on a black background.

The third flavor is 8 bits of transparency (i.e., 256 values from 0 - 100%) for each pixel. This is what is meant by a 32-bit bitmap: it is 24 bits of color and 8 bits of transparency. This provides an image that has finer gradations than the human eye can discern. One word of warning when talking to artists - they can all produce "32-bit bitmaps". But 95% of them produce ones where every pixel is set to 100% opacity and are clueless about the entire process and the need for transparency. (Game artists are a notable exception - they have been doing this forever.) For a good example of how to do this right, take a look at Icon Experience - I think their bitmaps are superb.

Resolution

Many formats have a resolution, normally described as DPI (Dots Per Inch). When viewing a photograph, this generally is not an issue. But take the example of a chart rendered as a bitmap. You want the text in the chart to be readable, and you may want it to print cleanly on a 600 DPI printer, but on the screen, you want the 600 dots that take up an inch to display using just 96 pixels. The resolution provides this ability. The DPI does not exist in some formats, and is optional in others (note: it is not required in any format, but it is unusual for it to be missing in PNG).

The important issue with DPI is that the user may want to zoom in on a bitmap, or print at the printer's resolution while displaying at a lower one - so you need to let the calling program set the DPI. There's a very powerful charting program that is useless except for standard viewing on a monitor, because it renders at 96 DPI and that's it. Don't limit your uses.

File formats

OK, so what file formats should you use? Let's go from the most to least useful.

  • PNG - 32-bit (or less), lossless compression, small file sizes - what's not to like? Older versions of some browsers (like Internet Explorer) would display the transparent pixels with an off-white color, but newer versions handle it properly. Use this (in 32-bit mode, using 8 bits for transparency) for everything. (A small PNG-writing sketch follows this list.)
  • ICO - This is the icon file used to represent applications on the desktop, etc. It is a collection of bitmaps which can each be of any resolution and bit depth. For these, build using just 32-bit PNG files from 16x16 up to 256x256. If your OS or an application needs a lesser bit depth, it will reduce on the fly - and keep the 8 bits of transparency.
  • JPEG - 24-bit only (i.e., no transparency), lossy (can be lossless, but compression drops a lot), small file sizes. There is no reason to use this format unless you have significant numbers of people using old browsers. It's not a bad format, but it is inferior to PNG, with no advantages.
  • GIF - 8-bit, lossless (but 8-bit generally means degradation), very small file sizes. GIF has two unique features. First, you can place multiple GIF bitmaps in a single file with a delay set between each. It will then play through those, giving you an animated bitmap. This works on every browser back to the 0.9 versions, and it's a smaller file size than a Flash file. On the flip side, it is only 8 bits, and in today's world, that tends to look poor (although some artists can do amazing things with just 8 bits). It also has a set color as transparent, so it natively supports transparency (of the on/off variety). This is useful if you want animated bitmaps without the overhead of Flash or if bandwidth is a major issue.
  • BMP (also called DIB) - from 1 up to 32-bit lossless (but less than 24-bit generally means degradation), large file sizes. There is one case to use this - when speed is the paramount issue. Many 2-D game programs, especially before the graphics cards available today, would store all bitmaps as a BMP/DIB because no decompression was required and time saving is critical when you are trying to display 60 frames/second for a game.
  • TIFF - 32-bit (or less), lossless compression, small file sizes - and no better than PNG. Basically, the government and some large companies decided they needed a "standard" so that software in the future could still read these old files. This whole argument makes no sense as PNG fits the bill. But for some customers (like the federal government), it's TIFF instead of PNG. Use this when the customer requests it (but otherwise use PNG).
  • Everything else - Obsolete. If you are creating a bitmap editor, then by all means support reading/writing every format around. But for other uses - stick to the 2+4 formats above.
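
As promised above, here's a minimal sketch of writing a 32-bit PNG in Java using ImageIO (the shape and file name are arbitrary):

import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class PngDemo {
    public static void main(String[] args) throws IOException {
        // TYPE_INT_ARGB is the "32-bit bitmap": 24 bits of color + 8 of alpha.
        BufferedImage img = new BufferedImage(64, 64, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        g.setColor(new Color(255, 0, 0, 128)); // half-transparent red
        g.fillOval(8, 8, 48, 48);
        g.dispose();

        // PNG keeps the alpha channel losslessly; JPEG would throw it away.
        ImageIO.write(img, "png", new File("circle.png"));
    }
}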

An alternative view

The remarks below have led to a really good discussion about the different bitmap formats. Definitely an example of the wisdom of crowds. To sum up the view from below:

  1. Always use the format that is appropriate for your use (I agree).
  2. Even in a world of high bandwidth, high volume sites will find the reduced size of JPEG (over PNG) of great value.
  3. JPEG is it for the world of photography and other uses of real life pictures (YouTube embedded shots), and is likely to remain so for the indefinite future.
  4. TIFF is of great value for some specific use cases, including compressed black and white for document images. One person also disagrees as to why TIFF is obsolete (but does agree it is).
  5. GIF can beat the others, but in very few cases. (I think most agree with me, GIF is on its way out.)

I take away two things from the comments. First, JPEG may be here to stay in widespread use along with PNG. It is continuing to evolve, which is a good sign for its future. Second, we'll probably always have some people using every bitmap format ever invented.

And yes, this article is a start on bitmaps, not an encyclopedia. But it is not "everything you should know"; it is "what every developer should know" - in other words, it's base info all should know. With that said, I think it's a really good base for someone who is first programming around bitmaps.

December 23, 2010

What Every Developer Should Know About Character Encoding

Introduction

If you write code that touches a text file, you probably need this.

Let's start off with two key items:

  1. Unicode does not solve this issue for us (yet).
  2. Every text file is encoded. There is no such thing as an unencoded file or a "general" encoding.

Character Encoding

And let's add a codicil to this – most Americans can get by without having to take this into account – most of the time. Because the characters for the first 127 values in the vast majority of encoding schemes map to the same set of characters (more accurately called glyphs), and because we only use A-Z without any other characters, accents, etc. – we're good to go. But the second you use those same assumptions in an HTML or XML file that has characters outside the first 127 – then the trouble starts.

The computer industry started with disk space and memory at a premium. Anyone who suggested using 2 bytes for each character instead of one would have been laughed at. In fact, we're lucky the byte settled at 8 bits, or we might have had fewer than 256 values for each character. There were, of course, numerous character sets (or codepages) developed early on. But we ended up with almost everyone using a standard set of codepages where the first 127 values were identical across all of them and the second 128 were unique to each set. There were sets for America/Western Europe, Central Europe, Russia, etc.

And then for Asia, because 256 characters were not enough, some of the range 128 – 255 was used for what was called DBCS (double byte character sets). For each first-byte value in these higher ranges, the second byte then identified one of 256 characters. This gave a total of up to 128 * 256 additional characters. It was a hack, but it kept memory use to a minimum. Chinese, Japanese, and Korean each have their own DBCS codepage.

And for a while, this worked well. Operating systems, applications, etc. mostly were set to use a specified code page. But then the internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, where each is entering data based on their country – that broke the paradigm.

Fast forward to today. The two file formats where we can explain this best, and where everyone trips over it, are HTML and XML. Every HTML and XML file can optionally have the character encoding set in its header metadata. If it's not set, then most programs assume it is UTF-8, but that is not a standard and is not universally followed. If the encoding is not specified and the program reading the file guesses wrong – the file will be misread.

Point 1 – Never treat specifying the encoding as optional when writing a file. Always write it to the file. Always. Even if you are willing to swear that the file will never have characters out of the range 1 – 127.

Now let's look at UTF-8, because, as the de facto standard, the way it works gets people into a lot of trouble. UTF-8 became popular for two reasons. First, it matched the standard codepages for the first 127 characters, so most existing HTML and XML would match it. Second, it was designed to use as few bytes as possible, which mattered a lot back when it was designed and many people were still using dial-up modems.

UTF-8 borrowed from the DBCS designs of the Asian codepages. The first 128 values are all single byte representations of characters. Then, for the next most common set, it uses a block in the second 128 values as the start of a double byte sequence, giving us more characters. But wait, there's more. For the less common characters, there's a first byte that leads to a series of second bytes, each of which leads to a third byte, and those three bytes define the character. The original design went up to 6 byte sequences, although the standard now caps it at 4. Using this MBCS (multi-byte character set), you can write the equivalent of every Unicode character. And assuming what you are writing is not a list of seldom used Chinese characters, you do it in fewer bytes.
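
You can see the variable byte counts directly (a minimal Java sketch; the characters are escape-coded so the source file's own encoding doesn't matter):

import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    public static void main(String[] args) {
        System.out.println("A".getBytes(StandardCharsets.UTF_8).length);            // 1
        System.out.println("\u00DF".getBytes(StandardCharsets.UTF_8).length);       // 2 (ß)
        System.out.println("\u4E2D".getBytes(StandardCharsets.UTF_8).length);       // 3 (中)
        System.out.println("\uD83D\uDE00".getBytes(StandardCharsets.UTF_8).length); // 4 (emoji)
    }
}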

But here is what everyone trips over – they have an HTML or XML file, it works fine, and they open it up in a text editor. They then add a character like ß, which their text editor, using the codepage for their region, inserts as a single byte, and they save the file. Of course it must be correct – their text editor shows it correctly. But feed it to any program that reads according to the declared encoding, and that byte is now the first byte of a 2-byte sequence. You either get a different character or, if the second byte is not a legal value for that first byte, an error.

Point 2 – Always create HTML and XML in a program that writes it out correctly using the declared encoding. If you must create it with a text editor, then view the final file in a browser.

Now, what about when the code you are writing will read or write a file? We are not talking binary/data files where you write it out in your own format, but files that are considered text files. Java, .NET, etc. all have character encoders. The purpose of these encoders is to translate between a sequence of bytes (the file) and the characters they represent. Let's take what is actually a very difficult example – your source code, be it C#, Java, etc. These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume they use the local code page. Many others assume that all characters will be in the range 0 – 127 and will choke on anything else.

Here's a key point about these text files – every program is still using an encoding. It may not be setting it in code, but by definition an encoding is being used.

Point 3 – Always set the encoding when you read and write text files. Not just for HTML & XML, but even for files like source code. It's fine if you set it to use the default codepage, but set the encoding.
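
In Java, that means constructing readers and writers with an explicit charset (a minimal sketch; the file name is hypothetical):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    public static void main(String[] args) throws IOException {
        File file = new File("demo.txt");

        // Write with an explicit encoding - never the platform default.
        try (Writer w = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            w.write("Gr\u00FC\u00DFe\n"); // "Grüße"
        }

        // Read it back declaring the same encoding.
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8))) {
            System.out.println(r.readLine());
        }
    }
}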

Point 4 – Use the most complete encoder possible. You can write your own XML as a text file encoded for UTF-8. But if you write it using an XML encoder, then it will include the encoding in the metadata and you can't get it wrong (it can also add the byte-order preamble to the file).

Ok, you're reading & writing files correctly, but what about inside your code? This is where it's easy – Unicode. That's what the encoders in the Java & .NET runtimes are designed to do. You read in and get Unicode. You write Unicode and get an encoded file. That's why the char type is 16 bits and is a unique core type just for characters. This you probably have right, because languages today don't give you much choice in the matter.

Point 5 – (For developers on languages that have been around awhile) – Always use Unicode internally. In C++, this is called wide chars (or something similar). Don't get clever to save a couple of bytes; memory is cheap and you have more important things to do.

Wrapping It Up

I think there are two key items to keep in mind here. First, make sure you are taking the encoding into account on text files. Second, this is actually all very easy and straightforward. People rarely screw up how to use an encoding; it's when they ignore the issue that they get into trouble.

December 23, 2010

XmlZipResolver - an XmlUrlResolver for files in a zip file

moved to Useful XML .net utilities

February 16, 2009

Tags: jar, XmlUrlResolver, zip


Opening an XML file that requires a username & password

moved to Useful XML .net utilities

February 16, 2009

Get all types a COM object implements

Now at Get all types a COM object implements.

February 16, 2009

Tags: c#, COM, GetAllTypes, marshal, type


Great Windows programming resource

The Microsoft support team in China has started a really good code snippet site at http://www.codeplex.com/helloworld. Just 3 examples so far, but if they run with it, this may become step 1 for finding out how to do something in Windows.

February 12, 2009
