Compressing Big Data for Faster Processing
Category: Science & Technology
Posted: November 16, 2012 08:12AM
One of the greatest driving forces behind the Internet is the science community, which keeps developing faster and more efficient networks to communicate the massive datasets it creates every day. The problem is that these enormous datasets are very hard to process simply because of their size. Researchers at MIT, though, have developed a way to compress big datasets efficiently, with a guaranteed low error, into a form that current algorithms can still process, rather than requiring new algorithms designed for larger datasets.
This compression technique has been demonstrated on GPS data. A single device can generate as much as 1 GB a day, so when you look at the traffic patterns of hundreds or thousands of cars, the amount of information can be overwhelming. However, traffic patterns are only interesting where cars turn; everything in between can be approximated as a straight line, and that is the basis of the compression. Using regression analysis, the researchers reduce a large plot of scattered points to a single line, along with a sample of randomly selected points. These points indicate how variable the original data was, while the sample remains much smaller than the raw data.
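The article gives no implementation details, so the following is only a minimal Python sketch of the general idea: replace a run of points with an ordinary least-squares line and keep a small random sample to hint at the spread. It assumes 2-D points (for example, time and one position coordinate), and the names `fit_line`, `compress_segment`, `sample_size` and `seed` are hypothetical. It is an illustration, not the MIT researchers' code.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b over (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        # All x values are equal (or there is a single point): fall back
        # to a flat line through the mean y value.
        return 0.0, mean_y
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def compress_segment(points, sample_size, seed=0):
    """Hypothetical compressor: a segment becomes its regression line plus
    a few randomly chosen original points that show how noisy it was."""
    a, b = fit_line(points)
    rng = random.Random(seed)  # seeded so the sample is reproducible
    sample = rng.sample(points, min(sample_size, len(points)))
    return (a, b), sample


# Example: ten points lying on y = 2x + 1 shrink to two numbers and 3 points.
pts = [(x, 2 * x + 1) for x in range(10)]
(line_a, line_b), kept = compress_segment(pts, sample_size=3)
print(line_a, line_b, len(kept))
```

Storing two coefficients plus a handful of sampled points in place of every raw reading is where the size reduction comes from in this sketch.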
It is worth noting that the compression algorithm does not arbitrarily choose which points to replace with lines; it chooses those that can be replaced without compromising the data too much. This is important because it allows the researchers to prove that the error introduced by the compression is bounded to a reasonably low value.
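One simple way to get a provable error bound, swapped in here purely for illustration and not taken from the MIT work, is to grow each straight-line segment greedily and stop as soon as any point would fall farther than a tolerance `epsilon` from the fitted line. The sketch below assumes points sorted by x; the function names and the `epsilon` parameter are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b over (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        return 0.0, mean_y  # single point or identical x values
    a = sxy / sxx
    return a, mean_y - a * mean_x


def max_residual(points, a, b):
    """Largest vertical distance from any point to the line y = a*x + b."""
    return max(abs(y - (a * x + b)) for x, y in points)


def segment_with_bound(points, epsilon):
    """Hypothetical bounded-error segmentation: split points (sorted by x)
    into runs whose least-squares line stays within epsilon of every point.
    Returns a list of (start, end, (a, b)) covering points[start:end]."""
    segments = []
    start = 0
    while start < len(points):
        end = start + 1                        # a single point always fits exactly
        best = fit_line(points[start:end])
        while end < len(points):
            candidate = fit_line(points[start:end + 1])
            if max_residual(points[start:end + 1], *candidate) > epsilon:
                break                          # adding this point breaks the bound
            best = candidate
            end += 1
        segments.append((start, end, best))
        start = end
    return segments


# Example: a "turn" at x = 5 splits the track into two straight segments.
track = [(x, abs(x - 5)) for x in range(11)]
for s, e, (a, b) in segment_with_bound(track, epsilon=1e-6):
    print(s, e, round(a, 6), round(b, 6))
```

Because a segment is only accepted when its maximum residual is at most `epsilon`, every original point ends up vertically within `epsilon` of its line, which is the kind of bounded-error guarantee the article describes. The repeated refitting makes this quadratic in segment length, so it is meant as a readable sketch rather than something to run on a day's worth of GPS logs.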