Posts tagged source
[Case Study] Lessons in High Performance Computing with Open Source
Feb 2nd
Providing adequate software and tools for researchers has always been of great importance to organizations, but has often come at a great cost. In an era of constantly evolving technology and rapidly dwindling budgets, my IT team has had to work with a large pool of researchers to provide cost-effective solutions that meet the ever-growing demand for innovation and computing power.
I am an Information Technologist for the Department of Statistics and Probability at Michigan State University. The Department is home to award-winning faculty with a wide variety of expertise in fundamental and interdisciplinary research, and over 100 graduate students from all over the world. Keeping the faculty and students ahead of their research is a constantly evolving challenge for my team and me.
Evolution of Statistical Software
For many years, most statistical analysis in our department was done in Matlab, S-Plus, SPSS or SAS. Even with a higher education discount, most of the software required yearly renewal fees that quickly devoured our IT budget. Things started to change when the R language, which was first developed in 1993, began to gain traction in statistics communities in the early 2000s. R is an open source programming language and software environment that is used for statistical computing and data analysis. Several years ago, we began the transition to R at Michigan State; today, it is used for the majority of the research in the department and is a central focus of our statistics curriculum. By switching to free, open source R, our department has been able to cut thousands of dollars each year in software costs and to focus more on fueling and expanding research.
Lesson #1: The Shortcomings of Open Source
As more people began to use R and the analysis became increasingly complex, researchers began to face a large problem: time. The processing jobs behind a single piece of research were taking several months to complete. Often, there is a need to run the calculations several times to ensure accuracy; waiting three months for one run to complete was simply not feasible. It was taking R this long to process the jobs because the iterations were computed in serial, one right after another, using only one processor core at a time.
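To make the serial bottleneck concrete, here is a minimal sketch in Python (not R, and not our department's actual code). The function names and the toy simulation are hypothetical; the assumption is that each iteration is independent of the others, which is what allows the work to be spread across cores.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def simulate_once(seed: int, n: int = 10_000) -> float:
    """One independent iteration: the mean of n pseudo-random draws (toy workload)."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n


def run_serial(seeds):
    """Run the iterations one right after another on a single core."""
    return [simulate_once(s) for s in seeds]


def run_parallel(seeds, workers=None):
    """Run the same iterations across worker processes (one per core by default)."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with the serial version.
        return list(pool.map(simulate_once, seeds))
```

Because every iteration is seeded independently, `run_parallel(range(8))` should return the same values as `run_serial(range(8))`, only sooner on a multi-core machine. When run as a script, the parallel call belongs under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.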
Until the spring of 2010, R was a 32-bit application and could access at most 3GB of memory. When dealing with large datasets, researchers quickly ran out of memory and discovered they needed a way to handle large data efficiently.
Bo Cowgill from Google once said “The best thing about R is that it was developed by statisticians. The worst thing about R …is that it was developed by statisticians.” Even though R was–and still is–constantly evolving, the department needed a solution that could keep up with hardware technology and compute calculations in an efficient, scalable manner.
Lesson #2: Find Commercial Enhancements for Open Source
Our search for a more effective version of R ultimately brought us to a product called Revolution R Enterprise by Revolution Analytics, which provides commercial support and software for open source R. It takes advantage of multiple processor cores by using optimized assembly code and efficient multi-threaded algorithms that use all of the processor cores simultaneously. Although this addressed a lot of the issues of open source R, professors were only using Revolution R on their desktops. The next question was how we could combine the power of our servers to dramatically decrease our computation times.
Lesson #3: Expanding to Infinity and Beyond
Open Source R is a memory-bound language. This means that all of the data, matrices, lists etc. need to be stored in memory. Issues quickly arose when data sets became several gigabytes large and were too big to fit into memory. This required implementing parallel external memory algorithms and data structures to handle the data. These challenges were tackled by Revolution Analytics as they developed the R language for a High Performance Computing (HPC) environment.
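As a rough illustration of the external memory idea, here is a sketch in Python. It is not Revolution Analytics' implementation, and the function names are hypothetical. The point is only that a statistic such as a mean can be computed by streaming the data through memory one chunk at a time instead of loading it all at once.

```python
from typing import Iterable, Iterator, List


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most chunk_size items from any iterable."""
    chunk: List[float] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


def chunked_mean(values: Iterable[float], chunk_size: int = 1000) -> float:
    """Compute the mean while holding only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in read_in_chunks(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("cannot take the mean of an empty dataset")
    return total / count
```

In practice `values` would be a generator reading rows from disk, so peak memory use depends on `chunk_size` rather than on the size of the dataset. Because each chunk only contributes a partial sum and a count, chunks could also be handed to separate cores or machines and combined at the end.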
In 2010, Revolution Analytics offered Revolution R Enterprise free for academic users and shifted the focus of their enterprise software to big data, large scale multiprocessor computing and multi-core functionality. Revolution Analytics was going to tackle everything the department needed. The evolution was complete: open source R went from an inefficient single core program to an HPC environment.
Once the department could schedule R jobs in an HPC environment, demand began to increase drastically. The HPC cluster is now scheduling more than four times the number of jobs that were scheduled in previous semesters, from 200 jobs over a year ago to over 800 jobs this past semester. Jobs that were taking over three months to complete on open source R were completed in a few days with Revolution R. Computational jobs are now run multiple times with significantly higher levels of accuracy than ever before.
Conclusion
There are often great pieces of software created through open source, but they generally lack key features needed for an enterprise environment. Combined with commercial backing and expertise, these projects can be further developed and expanded to meet the needs of large-scale enterprise environments. IT departments can provide enhanced solutions to their users that adapt to the expanding world of cloud and High Performance Computing environments–all while minimizing the impact on a shrinking budget.
View full post on ReadWriteWeb
[Poll] Does An Open Source webOS Have A Legitimate Future?
Jan 27th
This week, Hewlett-Packard announced the open source roadmap for webOS along with the next edition of its application framework, Enyo 2.0. As we wrote yesterday, the time for webOS to shine may lie ahead. What it comes down to is how well the open source community responds to webOS and whether or not the original equipment manufacturers will ever decide to build webOS devices.
The favorable response of the community and OEMs is not guaranteed. Many think webOS is as dead an operating system as Aramaic is a language. That may include former Palm CEO Jon Rubinstein, who is leaving HP now that his commitment to the company has elapsed. Is there still potential for webOS and Enyo or have we seen the last of the once-promising mobile operating system? That is the topic of this week’s ReadWriteMobile poll.
There may or may not be a future for webOS. The roadmap stretches to September of this year, and the code is licensed under the Apache 2.0 open source license. HP has said that developers are free to suggest new aspects of the project and bounce them off the experts in the Enyo Forum. The company believes it is more likely that proposals concerning the outer branches of webOS will be undertaken than anything touching the core of the source code and kernel.
The biggest gain that open sourcing webOS may garner could have less to do with webOS itself than with Enyo. The application framework is fundamentally Web-based. In mobile terms that means it will rely heavily on HTML5 and CSS and work through WebKit and Direct Canvas. While there are other HTML5 frameworks developers can use to create mobile Web apps, such as those provided by appMobi and Sencha Touch, one of the biggest desires of mobile HTML5 developers has been a consistent, easy-to-use framework. Enyo might be the option that developers have been looking for.
For the OEMs, there may be an advantage in contributing to the webOS open source project. These are turbulent days for many OEMs. HTC was one of the companies that helped make Android popular, but it has seen its growth stall with the dominance of Samsung in the ecosystem. Motorola, which reported a loss for 2011, is tied to Android through its potential acquisition by Google. Samsung has shown a willingness to adopt any mobile platform that it thinks can create future growth. Secondary OEMs such as LG and Huawei could use webOS to hedge their bets against a reliance on Android.
Will anybody adopt it? Or are the disassembled parts of webOS, like the standard Linux kernel or the application ecosystem that could be created through Enyo, more valuable? Take the poll below and let us know your thoughts in the comments.
[UPDATED] Source: Next Xbox Won’t Play Used Games
Jan 26th
An unnamed source is telling video game news site Kotaku that the next version of Microsoft’s Xbox will not play used games.
The person, identified as a “reliable industry source,” also told Kotaku that Xbox 720 will be able to play Blu-ray discs, an option not offered on current versions of the Xbox. The next generation of Xbox is expected to be released later this year or early in 2013.
We’ve asked Microsoft for confirmation and comment. We’ll update if they get back to us. So far most speculation about the new machine is just that, as Microsoft hasn’t even confirmed if the new system will be called Xbox 720, or when it will be available.
Update: “As an innovator we’re always thinking about what is next and how we can push the boundaries of technology like we did with Kinect,” said Microsoft spokesperson Allison Milton. “We believe the key to extending the lifespan of a console is not just about the console hardware, but about the games and entertainment experiences being delivered to consumers. Beyond that we don’t comment on rumors or speculation.”
It was unclear how Microsoft planned to thwart people from playing used games on their consoles. Game publishers have long complained that the used game market erodes their bottom line, but users, who are expected to pay more than $300 for the new units, may bristle and turn to systems on which they can still play second-hand games.
Another rumor tied to Xbox 720 is that it will mark the debut of Kinect 2, the next version of Microsoft’s highly praised hands-free sensor. The newer version of Kinect would contain an on-board processor to better detect user motions, according to Kotaku.
Source: Next Xbox Won’t Play Used Games
Jan 25th
An unnamed source is telling video game news site Kotaku that the next version of Microsoft’s Xbox will not play used games.
The person, identified as a “reliable industry source,” also told Kotaku that Xbox 720 will be able to play Blu-ray discs, an option not offered on current versions of the Xbox. The next generation of Xbox is expected to be released later this year or early in 2013.
We’ve asked Microsoft for confirmation and comment. We’ll update if they get back to us. So far most speculation about the new machine is just that, as Microsoft hasn’t even confirmed if the new system will be called Xbox 720, or when it will be available.
It was unclear how Microsoft planned to thwart people from playing used games on their consoles. Game publishers have long complained that the used game market erodes their bottom line, but users, who are expected to pay more than $300 for the new units, may bristle and turn to systems on which they can still play second-hand games.
Another rumor tied to Xbox 720 is that it will mark the debut of Kinect 2, the next version of Microsoft’s highly praised hands-free sensor. The newer version of Kinect would contain an on-board processor to better detect user motions, according to Kotaku.
GitHub’s Janky Goes Open Source
Dec 20th
With little fanfare, GitHub has released Janky under the MIT license. Janky is a continuous integration (CI) server that runs on top of Jenkins and Hubot, designed to work with projects hosted on GitHub.
Janky, at least as published yesterday by GitHub, is set up to run on top of Heroku. The Heroku app files are stored in a Gist, and can be deployed to Heroku in just a few commands. Naturally, you’ll need a Jenkins install as well.
Once deployed, Janky is controlled with GitHub’s Hubot. It looks like Campfire (the collaboration/chat solution from 37Signals) is required to use Janky at the moment, but if Janky takes off I’d expect to see an IRC option as well.
Open Source Challenger to Dropbox and Box.net: ownCloud
Dec 15th
The file-sharing and synchronization market led by Dropbox is a popular target these days. For many companies, it’s a chance to horn in on a growing market and carve out a piece of the pie for themselves. For open source projects, it’s a chance to return control of personal data to the user. For the folks behind ownCloud, it’s both.
ownCloud is a project started by Frank Karlitschek, who’s been very active in the KDE project. This week, Karlitschek took ownCloud to the next level with former SUSE/Novell guy Markus Rex and funding from General Catalyst. Terms weren’t disclosed, but sources say that the funding is “well into 7 figures” but below $10 million.
Comparing ownCloud to Apples and Dropboxes
ownCloud is online storage, but it’s not a quick and easy drop-in replacement for Dropbox, nor is it an exact analog to Box.net or Apple’s iCloud.
First off, ownCloud isn’t just about syncing files. That is to say, it syncs not just files but contacts, calendars and bookmarks across devices. (Yes, those are files too, but it’s doing more than just dropping files into a folder.) ownCloud even offers music streaming.
Secondly, ownCloud lets you choose where your files are going to be hosted. You can use Amazon S3, you can use Google, or you can drop in your own server. For casual users, ownCloud is probably a bit more maintenance than the average user is going to want to deal with. Folks who are particularly privacy conscious, technical or already running their own servers (or using S3, etc.) will probably take to ownCloud, but it’s mostly businesses that will find this feature particularly compelling.
Rex says, “we allow the system administrator and company to decide where they want to have their data reside and give complete flexibility around” where it’s stored and how it’s shared. Not only does this mean that you have control, but it also means that users are paying for a service and that the ownCloud business “does not depend on selling gigabytes,” says Rex.
Finally, ownCloud is open source. This means that companies have the option of adopting ownCloud without any ties to the company itself, until they need support and/or want to hit up the company for custom development or some other form of support. Companies can also extend ownCloud and participate in development, rather than being locked into a roadmap set by Dropbox or another company.
Not So Fast
If you’re excited by the prospect of ownCloud, you can try out the demo or grab an ownCloud appliance to deploy on a server or Amazon EC2. You can also grab the source and install it on a server with Apache, PHP and MySQL.

However, Rex says that the native clients for Mac OS X, Windows, Android, iOS and so forth are still in development. The actual ownCloud launch is not scheduled until sometime in the first quarter of 2012.
It’s About Time
ownCloud has been in development for quite some time, and has about 350,000 users (estimated). Even though it’s not quite ready for prime time yet, it should be ready to roll early in 2012.
While Dropbox is the easy solution, and Box.net offers a much more advanced service, ownCloud will offer companies a lot more control over their data. It also will provide ISPs and other businesses the opportunity to add ownCloud as a value-add service or standalone offering.
How does ownCloud look to you? Is your business likely to deploy its own file-sharing service?
Survey Says: WordPress Leads Open Source CMS Market
Nov 28th
According to water & stone, the “big three” open source CMSes from 2010 continue to dominate in 2011. WordPress, Drupal, and Joomla all topped the company’s survey of open source CMSes, with WordPress “clearly outpacing” Drupal and Joomla.
The survey started with 35 systems, which were narrowed down to 20 after getting the survey responses. The report primarily looked at rate of adoption and brand strength. All we really care about is rate of adoption, so let’s look at that.
For adoption, the survey looked at downloads, installations and third-party support.
WordPress blows the doors off the competition when it comes to downloads. This year WordPress averages more than 640,000 downloads a week – which is actually down from last year by 34%. Joomla is a distant second, with more than 86,000 a week, and Drupal is at nearly 23,000 per week. However, the report notes that Drupal is significantly under-reported due to users who grab Drupal via the git repository, which is not tracked.
Third-party analysis shows WordPress to be far and away the most popular, with 53.6% of the Alexa One Million. Joomla has 9.6%, Drupal a mere 6.4%. Another third party, BuiltWith, shows WordPress as having more than 4.2 million installs, Joomla nearly 1.7 million, and Drupal nearly 308,000.
The survey also looks at third party developer support by comparing Elance and Guru stats. Here developers are advertising their skills, and WordPress tops the list on Elance. It comes in second on Guru, whereas Joomla tops that list. Drupal once again comes in third, and there’s a strong fourth-place showing from DotNetNuke. Note that this may say less about any of the CMSes than it does about the job market – nearly all the major CMSes showed growth.
Another metric? Books in print. WordPress sports a field of 83 books, with 23 released in 2011. Drupal has an impressive 64, with 22 released this year. (Including Drupal User’s Guide by a friend of mine, Emma Jane Hogbin.) Joomla has 65 books, but only 13 released this year.
Though not quite up there with the top three in most metrics, Concrete5 made impressive gains. It came in with 19.3% of the respondents saying they were using Concrete5. WordPress has 34.2%, Drupal 19.8% and Joomla 18.5%. Concrete5’s downloads are up by more than 500%, according to the project’s spokesperson.
The survey responses were mainly from Europe and North America, though the company says that there was participation from 86 countries. More than 50% of respondents said their organization had 10 or fewer employees, but more than 20% (21.3% to be exact) were from organizations with more than 100 employees. More than 8% of respondents were from organizations with more than 1,000 employees.
Red Hat Veteran Putting Eucalyptus on the Open Source Path
Nov 17th
Eucalyptus was once “the” open source cloud computing project. It was the core of Ubuntu’s cloud strategy, and more or less the only game in town. Unfortunately, it was not a particularly open project. While most of the code was available under an open source license, it wasn’t developed in the open and failed to develop much of a community. Eucalyptus Systems is hoping Greg DeKoenigsberg can fix that.
DeKoenigsberg officially joined Eucalyptus earlier this month as the vice president of community. He has actually been working with Eucalyptus for some time on a consulting basis. He wrote about it briefly in early October, though he wasn’t yet a full-time employee.
DeKoenigsberg knows a bit about working with open source communities. He worked at Red Hat from 2001 until May of 2010, when he left to work as CTO for the Institute for the Study of Knowledge Management in Education (ISKME). It wasn’t long, though, before DeKoenigsberg decided that education wasn’t quite a perfect fit, and he left in July of this year. By phone, DeKoenigsberg said that “education is fascinating, but the drivers are just different. I think there’s a difference between code and content and I wasn’t feeling that what I was trying to do was going to be successful.”
Building a Contributing Community
So DeKoenigsberg is back on familiar turf, but with a daunting task ahead. Eucalyptus competitor OpenStack has displaced Eucalyptus as the default cloud software for Ubuntu, and is sucking most of the oxygen out of the room when it comes to open source cloud software. More than 150 companies have signed up to work on OpenStack, while Eucalyptus is just getting started in trying to build its open source contributor community. DeKoenigsberg’s old company is pursuing its own home-grown software for cloud infrastructure. Eucalyptus isn’t even mentioned on Fedora’s Cloud SIG page.
What will DeKoenigsberg be doing for Eucalyptus? He says that his job is largely one of community building, making sure that the community has what it needs and is on “solid footing.”
Part of that is that Eucalyptus needs to figure out its code contribution model and make sure that the engineering team is visible and working in the open. DeKoenigsberg says that Eucalyptus is also working to set up roadmaps and get plans out for those who are interested in contributing. The team has been involved in outside discussions, says DeKoenigsberg, like the Fedora lists, but there were no lists for Eucalyptus, and the engineering team tended to “go dark” when working towards a major release. Which, coincidentally, they’re doing now as they work towards Eucalyptus 3.
Eucalyptus now has a community mailing list for development, started in October. Eucalyptus has also fired up the Eucalyptus Education Channel and has a Fast Start project for getting Eucalyptus up and running quickly for developers and users.
A big part of the job, says DeKoenigsberg, will be to “take as much of the good work that they’ve done internally, and let people see what we’re doing. That alone is the strongest part of our focus. There’s so much going on that people don’t know about, because they haven’t been able to see the workbench. The goal is to make sure [the] picture people have outside is consistent with the picture inside.”
Community Hurdles
Aside from the inertia and mechanics of allowing contributions to Eucalyptus, the project also has a few other hurdles to community involvement. Specifically, Eucalyptus is not 100% open source, and it has a copyright assignment policy.
The fact that Eucalyptus is “open core” is not only a philosophical problem, it’s also a logistical one. Right now, DeKoenigsberg says that it’s not “crystal clear” what is and what’s not open source. The goal is to make that completely clear, and DeKoenigsberg says the plan for Eucalyptus 3 is to split the handful of proprietary modules out so that it’s easier for developers to work on the core of Eucalyptus.
For example, Windows guest OS support isn’t currently in the open source release. That’s a major feature to be missing when the project is hoping to get buy-in from the larger community, but it is moving to open source in Eucalyptus 3. Even with Eucalyptus 3, some new and old features will continue to be held back for paying customers, for instance converting VMware images, VMware hypervisor support, and SAN adapters for elastic block storage (EBS) and NetApp.
But a bigger problem may be the copyright assignment policy. Copyright assignment tends to be a touchy subject with many developers. DeKoenigsberg says that the copyright assignment may have led to the loss of contributions and “when the time is right, we’ll be sitting down and talking about it in more detail. I haven’t taken a hard position on that, I haven’t had an opportunity to talk to everyone involved. My preference is for a more typical agreement that doesn’t assign copyright.”
The Next Linux?
A fair number of folks from companies like Red Hat and Novell are migrating to jobs with companies like Eucalyptus, Rackspace, or Amazon to work on cloud projects. I asked DeKoenigsberg why that might be. According to DeKoenigsberg, “that’s where the interesting fights are. Here’s the thing, we came to Linux because we wanted the fight. Cloud is the cool new thing, the great free software fight [...] Linux won. So, you know the next big fight is cloud. Keeping the cloud open. It’s just where the opportunities are, you know?”
Building community is not an easy task, and it’s made much more difficult when a company tries to encourage contributors after processes are in place. It’s also difficult around single-vendor projects where one company makes most of the decisions. It will be interesting to see how far Eucalyptus succeeds in getting contributors outside its own walls.
Google Starts Pushing the Android Ice Cream Sandwich Source Code
Nov 14th
Google is releasing the source code for Ice Cream Sandwich this evening. In a post to the Android Building group on Google Groups, Jean-Baptiste Queru says that the entire Android 4.0 source code will be pushed out through its Git At Google repository and that it will be ready for a full download soon.
This is faster than we thought that Google would push out Ice Cream Sandwich to the entire developer world. Honeycomb never really made it in its full glory to all Android developers. Queru notes that there is documentation for Honeycomb in the ICS release but it does not have a branch in the history tree. Are you ready to get your hands sticky with some ice cream?
The source code should be available at this URL. The push from Google is relatively large and it needs to be fully pushed before it is ready for download. As of 5:46 p.m. EST on Nov. 14, the push was not yet complete. In the comments, Queru notes that the first push is the largest one. Here is the push process according to Queru:
- Push to master, update the master manifest.
- Push to the development branch, create the matching manifest.
- Push to the release branch, tag it, create the matching manifest.
While the push is in progress, developers should not try to download the code, as they will receive an incomplete version of the source. Developers who want to download the source tree first need to install Repo, a tool that makes it easier to work with Git for Android.
Though Honeycomb is not listed in the branch of the Android history tree, it can be found in the ICS release as notes to the evolution of the Android code. It should be mostly irrelevant at this point though as version 4.0 basically takes everything that Honeycomb did and integrates it into Android with backwards compatibility for apps from different Android flavors and screen sizes.
Ice Cream Sandwich took a little longer for Google to develop than other flavors of Android, mostly because it is a very large update to the platform. It is likely that Google was prepared to push the ICS source code sooner rather than later because most of it has been percolating throughout 2011 and waiting for the official release time, which may have been delayed by a variety of factors, including partnerships with OEMs. The source code is actually version 4.0.1, according to Queru, which is what the Samsung Galaxy Nexus will ship with.
Are you downloading ICS tonight? Let us know your first thoughts of it in the comments.
The CIA Open Source Center Tracks the Pulse of the World Through Facebook & Twitter
Nov 4th
The U.S. Central Intelligence Agency has a crack group of analysts that takes the pulse of the world by tracking the Internet, including tweets and Facebook messages. The analysts at the CIA Open Source Center, located in McLean, Virginia, are known as the “vengeful librarians,” according to a report from the Associated Press. These librarians are tracking up to five million tweets a day from places like China, Pakistan and Egypt.
It is sometimes disconcerting to know what the U.S. intelligence complex is doing, right in your backyard. McLean is a beltway city in Northern Virginia that is best known for Tysons Corner, one of the shopping hubs of the East Coast. On the outskirts of the city limits there is also the George H.W. Bush CIA complex, one of the agency’s main hubs in the D.C. region.
Open Source Center Set Up After 9/11
The CIA facility was set up after a recommendation from the 9/11 Commission. According to the AP, its first priority was to focus on “counterterrorism and counterproliferation.” The reports generated from the CIA Open Source Center invariably make their way to President Barack Obama’s desk.
The Green Revolution in Iran in 2009 was when social media like Facebook and Twitter really came to the forefront of the center’s analysts. The analysts correctly predicted the Arab Spring that came to Egypt and Tunisia this year. Essentially, the CIA is using social media to predict where groundswell will turn into real action and follow breaking trends and news.
The U.S. media does much the same thing, albeit on a much smaller scale. One prominent example was during the Discovery Building hostage crisis in Silver Spring, Maryland in September, 2010. TBD.com, a news startup owned by the owners of Politico, was able to track the tweets and social media happenings around the building and the circumstance, giving a correct and at times chilling view from the area.
(Disclosure: I worked for TBD.com at the time and was in the newsroom during the Discovery Building news. I also lived in McLean, Va.)
Tracking Facebook, Citizens, The World
Facebook has long been accused of having secret ties to the CIA. Some of the more outrageous claims hold that Mark Zuckerberg was recruited by the CIA to build Facebook as a data-mining project and that Facebook was hatched as a Defense Advanced Research Projects Agency (DARPA) initiative. These types of rumors seem ridiculous, but there is no doubt that governments around the world are using Facebook data to keep an eye on citizens.
The AP report made little mention of what the CIA is doing on the domestic front, except for noting that the CIA compares its social media records with the track record of polling organizations to see how accurate the results are. Think of it as a calibration technique; the polling organizations are often quite accurate and can be used as something of a loose standard to judge the accuracy of the Open Source Center’s results. On the domestic front, the Federal Bureau of Investigation may have its own social media tracking program.
While it may seem underhanded and sneaky for the CIA to be tracking social media use to know the pulse of the world, the practice is not something that should surprise anyone. Much of this data is open to whoever is looking for it and the large companies and data specialists of the private sector likely have similar operations focusing on a variety of aspects of humanity from market trends to politics, sports or fashion.