The field of human-computer interaction (HCI) bases interface design on three cyclic activities: understanding users, designing, and evaluating. Over time, HCI has grown to include devices beyond the desktop computer with keyboard and mouse that is traditionally associated with it. Devices such as portable music players, digital cameras, and mobile phones increasingly require methods and ideas that come from HCI. Methods for determining user needs usually do not depend on the hardware that will be used. However, many design metaphors that are standard on the desktop do not transfer to portable devices because of their special requirements for power and input. An open question we try to answer is whether evaluation techniques that were originally developed for desktop platforms can transfer to mobile devices. If they can, what do they bring to mobile evaluation, and do the methods need to change? We examine mobile phones specifically and look at several evaluation methods. We examine a method from the GOMS family of evaluation techniques, the Keystroke-Level Model. The Keystroke-Level Model predicts the time it takes an expert to perform a task error-free. It does this using operators that represent keystrokes, mouse movement and button presses, the movement of the hand between keyboard and mouse, and the time spent mentally preparing for an operator. While these operators are clearly linked to the desktop, there are analogs to them on mobile phones as well. Creating a model for a task is straightforward, and it has the potential to be automated. To help with this automation, we have developed a tool called KLM-Qt. KLM-Qt is an open source tool that examines the events delivered to an application and converts them into Keystroke-Level Model operators.
This has the advantage that demonstrating the application is all that is needed to obtain a model, which saves time. We describe how the tool works along with the details of its implementation. We also discuss changes made to make it work better on mobile phones. Besides describing the tool, we perform several evaluations of some mobile phones using different methods: KLM-Qt, usability testing, and heuristic evaluation. The intent is to discover how well each method works with mobile devices and to see what each approach can provide. We find that each method can provide insight into items that could be improved in an interface. KLM-Qt and the Keystroke-Level Model produce useful results but could use some adjustments to produce more accurate models, particularly in handling input and calculating when certain operators are needed. Our heuristic evaluation shows that the nature of the results is partially controlled by the selection of heuristics. Recommendations for the evaluated phones are also reviewed. We conclude with possible improvements that would make the Keystroke-Level Model a better fit for mobile devices, and changes to KLM-Qt that would make it a better tool. We also briefly discuss factors to keep in mind when evaluating mobile devices, regardless of the methods that are finally used.
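As a rough illustration of how a Keystroke-Level Model turns a sequence of operators into a time prediction, the following minimal sketch sums per-operator times. The operator times are the commonly cited desktop values from the original Keystroke-Level Model literature and are an assumption here; they are not the values used by KLM-Qt, and times adjusted for mobile phones are not given in this text. The function name `predict_time` and the example sequence are hypothetical.

```python
# Commonly cited desktop operator times, in seconds. These are assumed for
# illustration only; they are not taken from KLM-Qt or from this work.
OPERATOR_SECONDS = {
    "K": 0.28,  # keystroke or button press (average non-secretarial typist)
    "P": 1.10,  # point with the mouse at a target
    "H": 0.40,  # move the hand between keyboard and mouse
    "M": 1.35,  # mental preparation before an action
}


def predict_time(operators, times=OPERATOR_SECONDS):
    """Return the predicted error-free expert time for an operator sequence.

    `operators` is an iterable of operator codes such as "MHPK".
    An unknown code raises KeyError rather than being silently ignored.
    """
    return sum(times[op] for op in operators)


if __name__ == "__main__":
    # Hypothetical task: think, reach for the mouse, point, click,
    # return to the keyboard, think, then type two keys.
    task = "MHPKHMKK"
    print(f"{task}: {predict_time(task):.2f} s")
```

A tool that records events only has to map each event to one of these operator codes; the prediction itself is a plain sum, which is why the choice of operator times and the placement of mental operators dominate the accuracy of the model.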