In traditional networks, a dedicated piece of hardware is configured and installed for every single network function. A network function is any hardware appliance in the network that provides network-level functionality, such as a router, firewall or load balancer. Network function virtualization (NFV) aims to remove most of these hardware appliances by adding virtualization to the network layer. One of the great advantages of virtualizing network functions is the ability to run them on commodity hardware. Network functions can run inside virtual machines and be part of the same cloud architecture that regular servers migrated to years ago. We also see products from network vendors that aim to specialize the way NFV is applied to networks. They offer hardware middleboxes that are supposed to run virtualized network functions more efficiently than commodity hardware can. Even though the concept of NFV has been around for a long time and specialized solutions exist, we don't see NFV being used much in the industry. In this thesis, we seek to find out if the time is right for an NFV-first enterprise strategy. Our approach involves an extensive benchmarking process of the following network types: traditional, specialized NFV and commodity NFV. The benchmarking process we use in this thesis is more extensive than regular benchmarking, as we add the possibility of running benchmarks in a serialized manner. We develop a tool for this exact purpose. Results produced by the serialized benchmarking process are organized in matrices, which make it possible to detect patterns in the data. We use the matrices to compare network characteristics between the NFV environments and the traditional network environment. The most startling discovery we made, based on the results from multiple experiments, is that NFV is highly unpredictable in certain traffic scenarios, a characteristic we don't see in similar cases in the traditional network.
We noticed these patterns of unpredictable behavior in NFV networks because of the breadth we added to the benchmarking process.