In the last decade, the number of Internet users has increased exponentially, and the amount of service and system data they create has increased accordingly. Users often work across different devices, which generate semi-structured data that does not follow a common standard. Large Internet services therefore have a growing need to store and analyze dynamic semi-structured data in order to gain valuable insights into their user base. This demand has led to the creation of NoSQL data stores for schemaless data storage. Cloud computing provides computing power and data storage for applications through cloud service platforms. By distributing the workload across a set of servers, these services are able to scale according to demand. As the amount of data increases, so does the amount of processing resources required. We have developed a processing framework called the Hive, which optimizes resource usage and is designed to process and analyze semi-structured data. The framework makes use of specialized, efficient machines that process data in a cost-effective manner. Several big data processing frameworks are available, but they do not focus on green computing. This thesis expands the framework into a scalable, robust, and persistent processing cluster. In order to create a scalable service, we need to understand the underlying data structure. A fast and flexible key-value store has been implemented to make the service persistent and scalable. Our results indicate a considerable increase in robustness and scalability after implementing persistent storage in the Hive. The original Queen was not able to support more than around 12 workers, and all data was lost after shutdown. The newly developed Queen handles around 100 workers with ease and withstands shutdowns of nodes in the cluster.