
Category: Big Data

There are 11 posts published under Big Data.

Gates and Chrysler Were Wrong: Choose the Stingy for a Difficult Job

Big Data: Problem and Solutions

 

Big Data is a concept related more to how data is processed than to data size in absolute terms. We know we are dealing with Big Data when traditional ways of processing the data don’t finish in a reasonable amount of time. By traditional ways, I mean the most commonly used computing paradigms in the enterprise world, such as relational databases. To process this kind of dataset, we can take several different approaches:

 

- Use commercial, already-built software packages designed for Big Data. You could argue that these tools are themselves becoming traditional, so this definition will have to be revisited in a few years; then again, data size keeps growing just as quickly. (Facebook’s data grew from 20 petabytes in 2010 to 30 petabytes in 2011, and continues to increase at a rate of 10 TB per day.)

 

- Develop our own specialized software built on parallel and distributed computing models such as MapReduce (a minimal sketch follows this list).

 

- Simply add more processing power to the current configuration and buy time until the next problem appears (budget permitting).
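To make the MapReduce idea in the second approach concrete, here is a minimal single-machine sketch in Python: the map step turns each chunk of text into partial word counts, and the reduce step merges them. The chunking, function names, and use of a local process pool are illustrative, not a description of any particular framework.

```python
# Minimal single-machine sketch of the MapReduce pattern: map each chunk of
# text to partial word counts, then reduce by merging the counts.
from collections import Counter
from functools import reduce
from multiprocessing import Pool

def map_chunk(chunk: str) -> Counter:
    """Map step: count words in one chunk of text."""
    return Counter(chunk.split())

def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge two partial counts."""
    a.update(b)
    return a

if __name__ == "__main__":
    chunks = [
        "big data is a concept more related to data processing",
        "than to data size in absolute terms",
    ]
    with Pool() as pool:
        partials = pool.map(map_chunk, chunks)           # map phase, in parallel
    totals = reduce(reduce_counts, partials, Counter())  # reduce phase
    print(totals.most_common(3))
```

Real frameworks add the parts that matter at scale: splitting the input across machines, shuffling intermediate results, and recovering from failed workers.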

 

The Budget Variable

 

The third option can of course be combined with the first two, and it won’t work in very complex situations, but it’s no news that a slow database server is often replaced with a faster machine rather than refactoring the entire application. The reasoning goes: if you have enough powerful machines, you don’t need to waste time polishing the small details of your code or database. Well, that’s not true. That is the really lazy way.

 

The truth is that there are times when a new machine is simply not an option. Small companies with low budgets, large companies with tight budgets: there are many reasons to end up in this situation.

 

Given limited hardware, a diligent software developer and a database administrator can do wonders with optimization. If the application is large enough, there is always room for optimizing. And best of all, when new hardware does arrive, the application will perform even better and be more scalable.

 

A large and growing dataset with a bad structure, or a badly optimized application, can still be a huge problem in production even after an upgrade to more powerful hardware.

 

Lazy Quote

 

There are two versions of the quote regarding lazy people and difficult tasks. The first, apparently, is by Walter Chrysler: “Whenever there is a hard job to be done I assign it to a lazy man; he is sure to find an easy way of doing it.” The second is by Bill Gates: “I will always choose a lazy person to do a difficult job. Because he will find an easy way to do it.” Maybe Bill Gates applied that principle himself, and found that the easiest way to produce a memorable quote is to copy another one.

 

I’m sorry, but they were both wrong. In the Big Data world, I would assign the hardest task to a stingy person who can solve it with the minimum amount of hardware possible.

 

The Big Data People: Anything but Lazy

 

To apply the second approach (and the first one as well), you will need diligent developers and diligent systems engineers. I’ve seen for myself that the larger the system, the less information you will find on the Internet about the outage currently affecting thousands of your users.

 

You will probably face problems that have never happened before, and you will need fast thinkers. And regarding Big Data and large scale systems, there is hardly ever an “easy way” to solve a problem.

 

Not surprisingly, in a parallel, distributed environment, the technologies that perform best are the most difficult to learn. Strongly typed languages are your choice if your project will contain hundreds of thousands or millions of lines of code. None of this knowledge is the natural habitat of a lazy person.

 

This doesn’t mean there is no truth in the quote. We all know of computer science enthusiasts who like to make things more complicated than they are. Sometimes they want to use this new cool library or feature, or apply complicated design patterns for a simple task.

 

But all of that disappears in a low-budget environment. When there are tight deadlines and limited resources, hardworking people will give you the best solution.


Big Data Reality Causes Privacy Concerns

Twice a year, ThoughtWorks publishes the “Technology Radar”—our view on the technology trends that are important in the industry right now, and the trends that will be important in the near future.

 

It’s a unique perspective from ThoughtWorks and our 2,500 consultants around the world, based on first-hand experiences delivering real software for our clients. Third parties cannot pay to have themselves featured on the Radar, and we are entirely independent in choosing which technologies we include and what we say about them. The latest edition of the Radar was published this week.

 

One of the large themes we have been tracking over the past couple of years is around Big Data and Analytics. We think the “big” part of Big Data is over-hyped; most of the time you don’t actually need a massive cluster of machines to process your data. But the sheer variety, or “messiness” of all of this data presents new challenges, and there’s a real opportunity to use Advanced Analytics—statistical modeling, machine learning and so on—to gain new insight into your business and into customer behavior. An important trend we note in the Radar is the accessibility of all of these new Analytics techniques. If you do truly have lots of data you can simply go rent a portion of the cloud to process it, with SaaS offerings from Amazon, Google, Rackspace and others. If you want to analyze your data you can do it with point-and-click tools or open-source offerings such as the amazing D3.js JavaScript library.[1] Open-source is a huge democratizing factor here—you no longer need to pay for an expensive “big iron” solution for data processing and analysis.

 

We’re excited about the increased awareness around data because software systems can use data and analytics to provide significantly better end-user experiences, as well as delivering increased value to businesses. As has already happened with unit-testing, we expect it to become every developer’s job to understand the importance of data and what can be done with it. That’s not to say every developer needs a statistics degree or a PhD, but we’re expecting data engineering and analysis to become a bread-and-butter part of a developer’s job rather than some weird thing “those data science people” do in a corner.

 

While there’s much to be gained from better retention, analysis and understanding of data, it comes with a darker side. Companies employing advanced analytics have quickly realized that they need to avoid being too accurate with their insights or people feel unnerved, even violated. One way to avoid spooking people is to deliberately include less-relevant offerings and advertisements to a customer, so they don’t feel targeted. The strategy is to get right up to the “spookiness” line but not to cross it.
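As a rough illustration of staying below that “spookiness” line, the sketch below deliberately mixes a few less-relevant “decoy” offers into the most-targeted recommendations. The scoring, blend ratio, and offer names are assumptions made for the example, not something the Radar prescribes.

```python
import random

def blend_offers(scored_offers, k=5, decoy_ratio=0.4, seed=None):
    """Return k offers: mostly top-scored, plus a few deliberately
    less-relevant 'decoys' so the customer does not feel singled out.
    scored_offers: list of (offer_id, relevance_score) pairs."""
    rng = random.Random(seed)
    ranked = sorted(scored_offers, key=lambda o: o[1], reverse=True)
    n_decoys = int(k * decoy_ratio)
    top = ranked[:k - n_decoys]                      # the genuinely targeted picks
    decoy_pool = ranked[k - n_decoys:]               # everything less relevant
    decoys = rng.sample(decoy_pool, min(n_decoys, len(decoy_pool)))
    mixed = top + decoys
    rng.shuffle(mixed)                               # hide which ones are targeted
    return [offer_id for offer_id, _ in mixed]

offers = [("running_shoes", 0.94), ("baby_formula", 0.91), ("garden_hose", 0.35),
          ("desk_lamp", 0.30), ("umbrella", 0.22), ("novel", 0.18)]
print(blend_offers(offers, k=4, seed=1))
```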

 

As we’ve seen over the past few months, any digital trail can potentially be considered an indelible record. Responsible organizations need to look at these revelations, as well as the weekly news of private-sector security breaches, and consider their response. In Europe, many companies are adopting a strategy of Datensparsamkeit[2], a term that roughly translates as “data austerity” or “data parsimony.” The method originates in Germany where data privacy laws are significantly stricter than in the US. Rather than taking an approach of storing and logging every possible scrap of information about a customer and their interactions, Datensparsamkeit advocates only storing the data you absolutely need in order to provide your service to that customer. This way their privacy is maintained even in the unfortunate event of a data breach.
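A minimal sketch of Datensparsamkeit in practice: keep a whitelist of the fields genuinely needed to provide the service and drop everything else before it is ever stored. The field names are illustrative.

```python
# Keep only the fields strictly needed to deliver the service; drop the rest
# before anything is written to storage or logs.
REQUIRED_FIELDS = {"customer_id", "subscription_plan", "delivery_country"}

def sparsify(event: dict) -> dict:
    """Return a copy of the event containing only whitelisted fields."""
    return {k: v for k, v in event.items() if k in REQUIRED_FIELDS}

raw_event = {
    "customer_id": "c-1842",
    "subscription_plan": "monthly",
    "delivery_country": "DE",
    "device_fingerprint": "ab39f...",   # not needed -> never stored
    "gps_location": (52.52, 13.405),    # not needed -> never stored
}
print(sparsify(raw_event))
```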

 

Society is increasingly driven by technology, and changing at an ever increasing pace. As technologists it’s our responsibility not just to consider what we can do with our new tools, but whether it’s the right thing to do. Ethics are not the sole purview of philosophers, lawyers and politicians: we must all do our part.

 


Requirements of Big Data in the Cloud Uprising

Big Data, inevitable as it is, can only take off if the Cloud meets its quality requirements, a bar that some Cloud players are struggling to reach. This opens the way for consolidation in the market.

In recent years, the Cloud has occupied the headlines, leaving little space for other IT trends. More recently, though, Big Data has supplanted the Cloud in discussions and is drawing more professional attention. One positive effect is that the capabilities and potential of Big Data are now better known: Big Data could never exist without the Cloud, and because of it, the Cloud will never be the same. As service providers and users of Cloud technologies, we must ask ourselves about the changes coming to the industry. Are we prepared to handle the flood of Big Data? Can every Cloud provider take on the new demands it generates?

 

Although Big Data appears complicated at first glance, a large percentage of the population uses it every day without necessarily realizing it. For example, thanks to Big Data, the major search engines suggest terms in their search bar even before the user has finished typing a word. How is this possible? It is actually a complex operation, but put simply, search engines store enormous numbers of search terms, then sort and classify them in order to suggest the most popular and relevant ones to each user.
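A toy Python sketch of that idea: record past queries, then suggest the most popular stored terms that start with whatever prefix the user has typed. Real engines personalize and rank far more elaborately; this only shows the store-sort-suggest skeleton.

```python
from collections import Counter

class SuggestionIndex:
    """Toy version of search-bar suggestions: store past queries,
    then propose the most popular ones that start with the typed prefix."""

    def __init__(self):
        self.query_counts = Counter()

    def record(self, query: str) -> None:
        self.query_counts[query.lower()] += 1

    def suggest(self, prefix: str, k: int = 3):
        prefix = prefix.lower()
        matches = ((q, n) for q, n in self.query_counts.items()
                   if q.startswith(prefix))
        return [q for q, _ in sorted(matches, key=lambda x: -x[1])[:k]]

index = SuggestionIndex()
for q in ["big data", "big data analytics", "big data", "bicycle repair"]:
    index.record(q)
print(index.suggest("big"))   # ['big data', 'big data analytics']
```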

 

The disruptive nature of Big Data

 

It is in the context of the Cloud that the complex, disruptive nature of Big Data becomes clear. That disruptive nature stems from the various factors that come into play in successfully exploiting these huge volumes of data, and also from the additional stresses that cloud service providers now face.

 

For example:

 

1) All data must be stored in the same place: data needs to be analyzed and processed where it resides, otherwise moving it between different locations significantly prolongs the analysis time. Cloud providers must therefore have at least one data center that can store all of the data. Is this the case with every cloud provider?

 

2) System reliability counts: to analyze large amounts of data effectively, cloud service providers must offer a reliable, ultra-powerful network; otherwise the customer may end up waiting for the outcome of an analysis that was supposed to be instant. Can every provider offer such a network?

 

3) Strict compliance with service level agreements (SLAs): during an analysis, the failure of a single virtual machine (VM) is enough to stop the operation, forcing the client to rerun everything on another platform. In other words, with Big Data, service level agreements cease to be a simple preference and become mandatory. Is every cloud service provider capable of meeting this requirement?

 

4) Custom configuration in each case: since stability, computing power, network capacity, and storage all gain importance with Big Data, performance levels and quality of service must be configured for each client. Can every supplier meet this requirement, and above all, are they willing to?

 

Meeting all the requirements

 

It is clear that network quality, service level agreements on performance and availability, and APIs are all critical to the proper functioning of Big Data analytics tools. For this reason, any Cloud vendor must satisfy each of these requirements in order to run analyses correctly and legitimately claim to provide Big Data services.

 

However, most cloud providers are not ready. Consequently, given the exponential growth expected in Big Data services, the industry is likely to consolidate around a handful of vendors that keep developing these capabilities, while the others specialize in niches and solve their customers’ secondary IT problems. Make no mistake: the Big Data revolution has already begun, and as every CIO feels compelled to find new ways to stimulate growth, Big Data will continue to fuel conversations and drive consolidation in the Cloud market.


Data and Content: How to Unlock Customer Value through Investments in Smart Data

Gartner kicked off 2013 with a bold forecast: by 2017, CMOs will spend more on IT than their counterpart CIOs. At the heart of this projection is the widespread proliferation of Big Data; marketers are working with more robust (and, frustratingly, more disparate) data sets than ever before, so investing in tools to harness that information and, more importantly, make it actionable is of paramount importance to unlocking incremental business value.

 

Indeed, while Big Data is certainly exciting (what marketer will ever refuse more data?), the real opportunity for marketing disruption stems from Smart Data, the alchemical process that turns the haziness and inconsistency of diverse data sets into tangible business value.

 

Let’s talk about the implications of Smart Data for a publisher as an example.

 

The best marketers know that it’s typically more valuable to retain an existing subscriber than to acquire a new one. Excitingly, there are a myriad of ways to drive that retention. Key mechanics include converting more people who consume free content into paying subscribers and getting them to upgrade that much sooner. Additionally, driving ongoing reader engagement such as frequent visits/usage and mitigating the risk of disengagement or, even worse, subscriber churn are also critical. To optimize these mechanics, however, businesses need access to not just Big Data, but, rather, Smart Data.

 

Smart Data looks like a 360-degree profile (i.e. information from the entire business ecosystem - behavioral patterns, site experiences, feedback scores, social data, etc.) for each unique end user so that marketers can achieve situational understanding around key metrics — for instance, maybe a free reader is more likely to upgrade to a paid subscription if she likes sports or has the iPhone app. From there, actionable data can automatically deploy the right message — be it an email, a site prompt or a push notification — to address a particular opportunity or challenge to the right user at the right time. In the example of higher conversion from iPhone app users, the publisher could send targeted messages to download the iPhone app to those users who have browsed with an iPhone but who do not already have the app.
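A minimal sketch of that last example, assuming a hypothetical profile record with the relevant flags: select users who have browsed on an iPhone but lack the app, and queue the targeted push message for them.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    """Slice of a 360-degree profile relevant to this campaign (illustrative)."""
    user_id: str
    browsed_with_iphone: bool
    has_iphone_app: bool
    likes_sports: bool

def app_install_candidates(profiles):
    """Users who read on an iPhone browser but never installed the app."""
    return [p.user_id for p in profiles
            if p.browsed_with_iphone and not p.has_iphone_app]

profiles = [
    Profile("u1", browsed_with_iphone=True,  has_iphone_app=False, likes_sports=True),
    Profile("u2", browsed_with_iphone=True,  has_iphone_app=True,  likes_sports=False),
    Profile("u3", browsed_with_iphone=False, has_iphone_app=False, likes_sports=True),
]
for user_id in app_install_candidates(profiles):
    print(f"queue push notification for {user_id}: 'Get our iPhone app'")
```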

 

We like to call this “strategic personalization.” Yes, personalization will undoubtedly move the needle, but we’re well aware that businesses are always protecting the bottom line and seeking out the tools to do so. Therefore, personalization must be somewhat strategic. In other words, why send a discount to a free subscriber in her first email if that user may have been willing to pay the full price? Think about the offline corollary: once you’ve seen yogurt on sale three times, are you likely to pay full price ever again? Predictive reporting can help companies understand the ideal time window for a paid subscription upgrade, and will typically only deploy incentives or discounts if the user is at risk of not transacting within that ideal window. This type of approach improves conversion while protecting willingness to pay, retention (customers activated by discounts are almost always more likely to churn), and, ultimately, margins and the bottom line.
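As a sketch of that discount logic, assuming a hypothetical 30-day upgrade window and a churn-risk score from some predictive model: hold back the incentive unless the window is closing and the model says the reader is unlikely to convert at full price.

```python
from datetime import date, timedelta

IDEAL_UPGRADE_WINDOW = timedelta(days=30)   # assumed window, for illustration only

def choose_offer(signup_date: date, today: date, churn_risk: float) -> str:
    """Send a discount only when the ideal upgrade window is closing AND the
    model flags the reader as unlikely to convert at full price."""
    window_closing = today - signup_date > IDEAL_UPGRADE_WINDOW - timedelta(days=7)
    if window_closing and churn_risk > 0.7:
        return "discounted_upgrade_offer"
    return "full_price_upgrade_offer"

print(choose_offer(date(2013, 7, 1), date(2013, 7, 26), churn_risk=0.85))  # discounted
print(choose_offer(date(2013, 7, 1), date(2013, 7, 10), churn_risk=0.85))  # full price
```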

 

Strategic personalization also allows for deeper reach into the editorial arsenal. With traditional editorial strategies, publishers can only promote a limited set of stories to their audiences through marketing messages like email. A truly 1:1 personalization approach, however, allows for content that, historically, may never have been “plugged” to get air time — and in front of the users most likely to consume it. Most importantly, this personalization is made in real time, ensuring no opportunity is lost.

 

It’s important for marketers to remember that customers are not segments, and, along these lines, the “batch and blast” approach to marketing is a dying breed. So it’s truly mission-critical for marketers to make investments in data and content that will fuel real relevance; Smart Data versus just Big Data. Similar realities exist for marketers of all shapes and sizes (there are particularly strong corollaries for retailers), but in all cases one thing is true: investments in data will help marketers better leverage their current assets.


Why People Data Directs Big Data

Big Data cannot provide a competitive advantage until people across the enterprise connect and collaborate to transform data into profit boosting insight. The tale of Big Data’s rise is fraught with great expectations and poor ROI. But, there’s hope! We like to call it “people data.”

 

People matter. Every sales person, service rep, and operations manager holds a slice of the Big Data cake that should be sweetening your customer’s experience. Connecting them is the key. Big Data is filtered through people, and those people have to get other people to act on their findings to catalyze change.

 

If we can start hearing everyone’s perceptions on their slice of the Big Data problem, innovative solutions will emerge.

 

I. Big Data Stumbles…

 

Wikibon’s preliminary research shows that enterprises expect a three- to four-fold return on every Big Data dollar. Reality check: respondents pegged the actual return at $0.55 for each dollar invested. In June, Gartner’s adoption survey confirmed that Big Data interest continues to accelerate: 64% of responding organizations reported Big Data projects (up 8% from 2012). Still, more often than not, investments yield abysmal results. Why?

 

Wikibon’s analysis identified two reasons Big Data projects underperform:

 

  1. No Use Case: Often, IT departments are running experimental projects that aren’t tied to clear business outcomes. In fact, 56% of Gartner’s 720 respondents confessed that they don’t understand how to get value out of their data.
  2. Staffing Shortages: Even when Big Data pilot projects succeed, enterprises found they lacked the data scientists, admins, and developers necessary for a large-scale addition of projects.

 

So, who’s successful with Big Data? Is there a model that turns useless data into profits?

 

II. People Collaborate to Innovate

 

Yes, enterprises that start with small use cases and clearly identify talent gaps before expansion are well positioned to capitalize on Big Data. For most businesses, the core challenge is to align leadership around the right first use case and accurately assess staffing needs. Solution: people data.

 

Imagine, if you will, that you’re coaching a high school debate team. Out of business experience, I know, but roll with me here. You’ve got to help ten sharply dressed hipster types tackle both sides of this issue: “The United States Federal Government should invade Syria.” How you gonna’ do it?

 

If you’re smart, you won’t start suffocating them with data. Interviews, op-eds, and special briefings from the New York Times (the debaters’ equivalent of Big Data) are important, but you’re wasting everyone’s time until you get a feel for what they already know.

 

Each kid’s ideas and perceptions about the topic are “people data.” When you get them to start sharing their thoughts, something magical happens: ideas collide, change shape, take on new forms, and start sparking brilliant arguments.

 

Generate use cases just like debaters spark arguments. Coach your Big Data practitioners to solicit feedback from as many bottom-line contributors as possible. The more varied the voices, the more confident you’ll be that you’ve found a successful project. Make sure you get the perspectives of sales, marketing, customer service, and delivery professionals. They’re constantly touching customers and often feel the pains Big Data solves.

 

Ask questions like:

 

  • Where are your biggest strategic blind spots?
  • Is there a reservoir of data we could tap to remove that blind spot?
  • What internal and external talent do we need to understand this data?

 

III. Connect Voices to Speed Data Flow

 

After you create a list of potential Big Data use cases, ask stakeholders to rank the options. You want to find out where leadership is aligned and why. Rank them in terms of the criteria below (a simple scoring sketch follows the list):

 

  • ROI: How easy is it to monetize anticipated benefits?
  • Time-to-insight: How quickly will stakeholders experience results?
  • Repeatability: Will this use case serve as a predictable resource model for future projects?
  • Barriers: Does project X face any identifiable barriers to deployment?
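One lightweight way to run this ranking, sketched here with assumed weights and 1-to-5 ratings (for “barriers,” a higher rating means fewer barriers); the use cases and numbers are invented for the example.

```python
# Score candidate Big Data use cases on the four criteria above (1 = poor,
# 5 = strong; for "barriers" a higher score means fewer barriers).
WEIGHTS = {"roi": 0.4, "time_to_insight": 0.2, "repeatability": 0.2, "barriers": 0.2}

use_cases = {
    "churn prediction":       {"roi": 4, "time_to_insight": 3, "repeatability": 4, "barriers": 3},
    "sensor log exploration": {"roi": 2, "time_to_insight": 2, "repeatability": 3, "barriers": 4},
    "campaign attribution":   {"roi": 5, "time_to_insight": 4, "repeatability": 3, "barriers": 2},
}

def score(ratings: dict) -> float:
    """Weighted sum of the stakeholder ratings."""
    return sum(WEIGHTS[c] * ratings[c] for c in WEIGHTS)

for name, ratings in sorted(use_cases.items(), key=lambda kv: -score(kv[1])):
    print(f"{name:25s} {score(ratings):.2f}")
```

The point is not the arithmetic; it is that the ratings come from many stakeholders, so the ranking surfaces where leadership is aligned and where it is not.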

 

One of our Fortune 500 clients walked through this exercise to evaluate over 50 possible innovation project investments. 100 senior executives identified which projects had the highest potential for return, faced the fewest resource barriers, and enjoyed almost universal buy-in. Executive respondents also recommended external and internal partners to expedite implementation.

 

Every recommendation our client received came not from Big Data itself, but from the minds of the stakeholders that were responsible for the investment. Because they collected insight from executives across the enterprise, they were able to start small and pick the project that had the greatest chance for success.

 

You can do the same. Steven Johnson, the author of Where Good Ideas Come From, points out that “chance favors the connected mind.” Everyone in your organization has hunches that can help you turn Big Data into real profits. Frequently, hunches lurk in the minds of many people across the enterprise, and it’s not until you put them all together that you see an actionable Big Data roadmap.


Kaspersky Lab Explains: How to Protect Your Business from Cyber Attacks, Part II

In the first part of this article, we told you about targeted cyber attacks and how cyber criminals penetrate corporate networks, attacking the computers of employees who use their desktops for social networking and other cyber-skiving.

 

Along with targeted cyber attacks there are other threats. Intentionally or by chance, employees may be guilty of disclosing confidential data or breaking copyright laws, which can result in lawsuits against the company.

 

We will tell you about some incidents related to the storage and transfer of corporate documents via a personal mailbox or a cloud service and the use of software for P2P file sharing. We will explain what technologies and security policies allow system administrators and IT security specialists to prevent such incidents.

 

Reputation loss

Your company’s reputation is worth protecting - and not only from cyber criminals.  Employees who send professional correspondence to their personal mailboxes, download illegal content, or use pirated software on corporate computers never think they might damage their company’s reputation.

 

Confidential information disclosure

One company faced an incident in which extremely confidential information was disclosed.  Data security specialists started the investigation by checking the leaked documents and were surprised to find that the metadata still contained important information: the company’s name, the name of the computer where the document was last stored, authors’ names, e-mail addresses, telephone numbers, and more. Criminals usually delete this data to hide the source of a leak.  During the investigation, the experts found that copies of the disclosed documents were stored on the computers of five employees.  None of them admitted to handing the documents over to a third party; moreover, when told about the incident during their interviews with security staff, all of them were genuinely surprised.  Analysis of the corporate proxy-server logs then revealed that one of those five employees had uploaded copies of the disclosed files to a web mail service.
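The metadata check the investigators describe can be reproduced with the standard library alone, since a .docx file is a ZIP archive whose docProps/core.xml holds the author and “last modified by” fields. A minimal sketch; the file name is illustrative.

```python
# A .docx file is a ZIP archive; docProps/core.xml carries the document's
# author and "last modified by" fields of the kind examined in this case.
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def core_properties(path: str) -> dict:
    """Read author-related metadata from a .docx file."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    return {
        "creator": root.findtext("dc:creator", default="", namespaces=NS),
        "last_modified_by": root.findtext("cp:lastModifiedBy", default="", namespaces=NS),
    }

print(core_properties("leaked_report.docx"))  # illustrative file name
```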

 

At the second interview, this employee confessed that he had used his personal mailbox a few times to store corporate documents.  It was convenient: if he had no time to finish or read a document, he sent it to his personal mail and finished it at home.  Any employee could request remote access to their corporate mailbox, but this employee had never set it up.  He didn’t anticipate any problems with using his personal mailbox for work.

 

Having gained access to his personal mailbox, data security specialists checked the list of IP addresses used to connect to it.  Along with the employee’s home and corporate IP addresses, many other addresses, belonging to proxy servers in different countries, surfaced.

 

While investigating the employee’s computer security, specialists discovered spyware that logged all the account data for different systems - sites, social networks, mailboxes, and online banking services.  Having used the malware to gain access to the employee’s mailbox, the criminal found a lot of corporate documents stored there.

 

Though the guilty employee was fired, the reputational damage to the company lingers on.


 

Breach of copyright

It’s widely known that downloading pirated content is a violation of copyright law.  However, few people remember that when you use the Internet from your corporate network, you use your company’s IP address.  This means that if a violation is discovered, it is the company that will be held liable.

 

A small company suffered an unpleasant incident.  At certain times, there was a sharp drop in Internet connection speed.  Network traffic statistics showed one computer using 80% of the network capacity, with incoming and outgoing connections going off the scale.  The sysadmin assumed that the computer was being used to share files on a P2P network.
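A minimal sketch of the kind of per-host traffic breakdown that points at a culprit machine, assuming flow records of (source IP, bytes transferred); the figures are invented for the example.

```python
from collections import defaultdict

# (source_ip, bytes_transferred) records, e.g. exported from a router or flow log
flows = [
    ("10.0.0.12", 5_100_000_000),
    ("10.0.0.12", 3_100_000_000),
    ("10.0.0.7",  1_200_000_000),
    ("10.0.0.21",   600_000_000),
]

per_host = defaultdict(int)
for ip, nbytes in flows:
    per_host[ip] += nbytes           # total bytes per source host

total = sum(per_host.values())
for ip, nbytes in sorted(per_host.items(), key=lambda kv: -kv[1]):
    print(f"{ip:12s} {100 * nbytes / total:5.1f}% of traffic")
```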

 

It turned out that one employee had brought his personal laptop and connected it to the corporate network. A BitTorrent client installed on the laptop was set to run automatically when the system started.  The employee had forgotten all about it and the program running on his laptop caused trouble with the Internet connection.

 

Three months later, local law enforcement authorities came to the office with a search warrant and took away many hard drives and documents, because they suspected that the company had used pirated software in breach of copyright rules.  In the end, the company was fined and, since then, stronger restrictions against pirated software have been introduced in the security policy.  Now, employees face serious sanctions for a first offense and lose their jobs if there is any repeat.  In addition to those punishments, illegal content (hacked software, video, music, e-books, etc.) is forbidden, whether it is downloaded to a corporate computer from the Internet or brought from home.

 

Solution

We described just two cases in which the violation of corporate policies by employees led to serious incidents.  In everyday life, there are many more scenarios like this.  Fortunately, there are also some simple methods, which, together with security policies, can help to prevent the majority of these incidents.

 

Network Traffic Control

In the incidents described above - corporate documents leaked and unlicensed content loaded via P2P - the corporate network served as the channel for sending and receiving data.  Firewall, IPS, HIPS, and other technologies allow system administrators and IT security specialists to limit or block the following (a minimal log-filtering sketch follows the list):

  • Access to public services and their servers - mail services, cloud storages, sites with forbidden content, etc.
  • Use of ports and protocols for P2P sharing
  • Sending corporate data outside the corporate network
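As a rough illustration of the first point, here is a sketch that scans proxy-log lines for destinations on a blocked-services list; the log format and domain names are assumptions made for the example.

```python
# Flag proxy-log entries whose destination is on the blocked-services list
# (personal mail, public cloud storage, torrent trackers); domains are examples.
BLOCKED_DOMAINS = {"mail.example-freemail.com", "cloud-drive.example.com",
                   "tracker.example-torrent.org"}

def flagged_requests(log_lines):
    """Each line: '<timestamp> <user> <destination_host> <bytes_sent>'."""
    for line in log_lines:
        timestamp, user, host, bytes_sent = line.split()
        if host in BLOCKED_DOMAINS:
            yield user, host, int(bytes_sent)

log = [
    "2013-08-01T09:14:02 j.doe mail.example-freemail.com 4812233",
    "2013-08-01T09:15:40 a.roe intranet.corp.local 1022",
]
for user, host, nbytes in flagged_requests(log):
    print(f"ALERT: {user} sent {nbytes} bytes to blocked service {host}")
```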

 

It’s worth remembering that network traffic control on its own cannot provide the highest level of corporate network security.  To bypass security policies, employees can use traffic encryption, connect to copies (mirrors) of blocked online services, or use proxy servers and anonymizers.  Moreover, many applications can use other applications’ ports and tunnel their traffic inside protocols that cannot simply be blocked.  In spite of these obstacles, network traffic control is important and necessary, but it needs to be combined with application control and file encryption.

 

Application control

 

Using application control, a system administrator or data security specialist can not only forbid unwanted software but also track which applications employees use, and when and where they use them.  It’s almost impossible to blacklist every piece of pirated software, since countless near-identical variants of an application may exist.  So the most effective approach is to use application control in default-deny mode, ensuring that employees run only authorized software.

 

File encryption

 

It’s often impossible to track how employees use cloud services and personal mailboxes to store corporate data, which may include confidential information.  Many mail services and cloud storage providers encrypt the files a user transmits, but they cannot guarantee protection against intruders: a stolen login and password will still give access to the data.

 

To prevent this type of theft, many online services let users attach a cell phone number to their account.  A criminal then needs not only the account credentials but also a one-off confirmation code sent to the mobile device during authorization.  Note that this protection holds only if the mobile device carries no malware that lets the criminal see the code.

 

Fortunately, there is a safer way to secure corporate documents transmitted beyond the corporate network: file encryption technology.  Even if intruders get access to a mailbox or cloud storage where an employee keeps corporate papers, they won’t be able to read the contents of those documents, since the files were encrypted before being transmitted to an external server.
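A minimal sketch of that encrypt-before-upload step using the third-party cryptography package’s Fernet recipe; the document contents are a stand-in, and in practice the key would be managed centrally rather than generated in the script.

```python
# Encrypt a document locally before it ever leaves the corporate network, so a
# compromised mailbox or cloud account exposes only ciphertext.
# Requires the third-party 'cryptography' package (pip install cryptography).
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # in practice, managed centrally by the company
fernet = Fernet(key)

document = b"Q3 revenue forecast - CONFIDENTIAL"   # stand-in for a real file's bytes

ciphertext = fernet.encrypt(document)               # this is what gets uploaded
print(ciphertext[:40], b"...")

# Later, back inside the corporate perimeter and with access to the key:
assert fernet.decrypt(ciphertext) == document
```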

 

Security policies

Network traffic control, application control, and data encryption are important security measures that can detect and automatically prevent data leaks as well as restrict the use of unwanted software on the corporate network.  It’s still necessary, however, to implement security policies and increase employee awareness, since many users do not realize their actions may threaten their company.

 

In cases of repeated violations, security policies should provide for administrative sanctions against the offender, up to and including dismissal.

 

Security policies should also stipulate the actions to be taken if a former employee still has access to confidential information or critical infrastructure systems.

 

Conclusion

Incidents like confidential data leaks or unlicensed content loaded from a corporate IP address may cause significant damage to a company’s reputation.

 

To prevent this damage, companies should limit or completely block employee access to online resources that may be a threat to a company, and also limit or block the use of those ports, data transmission protocols, and applications that are not required for work.  File encryption technologies should be used in order to ensure the confidentiality and integrity of corporate documents.

 

IT security experts should keep in mind that, along with incident detection and prevention, they should pay attention to administrative protection measures. Users should be aware of what is allowed and prohibited by a security policy and the consequences of any violation.


Top Tech & Startup News - 7 Things You Missed Today

Tech and Startup News for August 7th, 2013:

 

1. Zynga shutting down OMGPOP

It’s been barely two years since Zynga purchased OMGPOP for $200 million, but now, Zynga has confirmed plans to shut down the game developer. Although some OMGPOP team members had attempted to buy back the OMGPOP.com site, games, and intellectual property, Zynga refused to sell anything from the company. OMGPOP games like Cupcake Corner, Snoops, and Gem Rush will all shut down on August 29th. The website for the company will also go dark at the end of September.

 

2. Hacktivist Richard Stallman advocates for ’truly free software’

 

During a recent lecture held at NYU, the controversial hacker Richard Stallman warned that proprietary and even open-source software is not as free as it claims to be. For software to really be free, Stallman claimed, it must include:
-The freedom to run the program in question, for any purpose
-The freedom to study how that program works, and change it so it does your computing as you wish - in other words, the freedom to access its source code
-The freedom to redistribute copies so you can help your neighbor
-And, lastly, the freedom to distribute copies of your modified versions to others for the same reason

 

3. The Department of Commerce might be reviving a part of SOPA

The Stop Online Piracy Act died last year in Congress. However, a piece of its legislation might be returning from the dead. The Department of Commerce’s Internet Task Force recently endorsed SOPA’s proposal to make the streaming of copyrighted works a felony. Although the streaming of copyrighted works is currently against the law, the offense is only a misdemeanor. If the proposal becomes a law, someone illegally streaming copyrighted works will be punished as severely as someone who illegally reproduced and distributed copyrighted works to the public.

 

4. Amazon launches artwork marketplace

 

You could soon purchase a work by Claude Monet without ever having to change out of your pajamas. Amazon has recently announced Amazon Art, an online gallery that lets people buy artworks from prestigious collections around the country while still at home. Among other galleries, the site currently promises access to collections from the Paddle8, Holden Luntz, and McLoughlin galleries.

 

5. Google will update its searches for more in-depth articles

 

Google revealed that the company is adding a new feature for its search function, which highlights “in-depth” articles associated with your search requests. Google has yet to provide many details about the company’s definition of “in-depth.” However, Google officials have claimed that search results are “ranked algorithmically based on many signals that look for high-quality, in-depth content.” Currently, Google users will only be able to use this feature if they use google.com in English.

 

6. Mozilla releases a new version of Firefox

 

Firefox 23 is here. This newest update to the browser features a number of changes, including but not limited to a mixed content blocker and a network monitor on the desktop side. If you squint at the new Firefox logo for long enough, you may notice that it looks a little different too. But the biggest change is the addition of a share button, which lets users share content with friends in just one click. With this new feature, users will be able to share content directly from Firefox wherever they are online. Firefox 23 has officially been released for Windows, Mac, Linux, and Android.

7. Discussions of Anonymity

 

With the current NSA scandal and the advent of Google Glass, there is a lot of discussion nowadays about the importance of maintaining anonymity in a democratic society. The unnamed author of the book Tremble the Devil once wrote a blog post titled, “The Importance of Being Anonymous.” Though the piece may have some problems, it does bring up an interesting point: the ability to express your opinion anonymously is often the ability to express yourself safely. To that extent, the threat of exposure could limit your freedom of speech. In our democratic system, the ballot is secret so that you can have a say in what the government does without fear of coercion or retribution from others.

 

Without that anonymity, people may be pressured out of saying what they think and may, instead, conform to the most widely accepted opinions out of fear. The Internet is a place where a wide variety of viewpoints can be shared—where everyone gets a voice. However, the Internet is also a place of exposure and social pressure. At the moment, we’re at a delicate balance. We have to decide what the Internet is going to be. Is it going to be a place where people become more homogenous in their beliefs?
Or could it possibly be something different?


Are You Getting the Most Out of Virtualization?

Virtualization has been with us since the mainframe days, when a single system could be sliced into separate virtual machines, each allowed to run their own instances of applications, or even different operating systems.

But when one looks at how virtualization is being deployed today, are businesses really getting the bang for the buck that the technology promised?  Probably not, but it is not their fault. The typical issue companies face is that in virtualizing their servers, they address only two of the three major subsystems (compute and storage), leaving I/O mired in the past.

The push for virtualization in x86 servers came about because of cost, management, floor space and power concerns.  By implementing virtualization, a business can reduce its acquisition and operation costs.  The business can also increase its velocity in order to take advantage of trends more adeptly.

With virtualization, one partitions a single server, carving compute (CPU and memory) into separate virtual machines. This allows multiple virtual servers to share the same physical host with tools like VMware or Microsoft’s Hyper-V.  Storage has been virtualized for years via external shared storage on Storage Area Networks (SANs) or Network Attached Storage (NAS).

But that final frontier, the I/O, remains tied to the physical server chassis.  Worst of all, I/O tends to be the virtualization bottleneck.  And it is the biggest cause of management headaches as administrators spend too much time provisioning, deploying, and managing the I/O resources.

Addressing the compute and storage of a server but ignoring the I/O is like putting a new engine in a car and then trying to race it with old, balding tires.  A server needs balance in its subsystems to keep bottlenecks from occurring.

I/O virtualization is less common in the market, but it is rapidly proving to be an important element as more companies are finding that I/O has become the bottleneck for their servers.  I/O virtualization removes the standard I/O controllers (network and storage) and places them at the top of the rack in an intelligent appliance.  This appliance allows all of the servers to share and pool their devices. When businesses move the I/O to the top of the rack and use the I/O to tie their servers to the end-of-row switching, they can get rid of expensive Ethernet and Fibre Channel switches.

By virtualizing the I/O in a top-rack pool, administrators can quickly provision and deploy resources to servers from a remote console without ever having to touch the rack. Businesses are no longer held back; they can take advantage of changes in the market rapidly, using new products and services faster than ever before as IT becomes a catalyst for change instead of a roadblock of it.

Utilizing I/O virtualization has another huge benefit for businesses: smaller servers.  Not less powerful servers, just smaller servers. Without having to host all of those I/O devices inside each server, a company can use smaller form-factor servers. We find that most customers who used the NextIO vNET I/O Maestro also used 2U and 4U servers in the past and now deploy with 1U servers. Smaller servers consume less energy than their larger counterparts and require less AC cooling. Because they consume less space in the rack, they also help businesses consolidate their IT infrastructure.  Best of all, smaller servers are almost always lower in cost, allowing a company either to rein in its acquisition costs or to buy more compute power for the same budget.

Through I/O virtualization, a single cable (or pair of cables if running redundant connections) will be all that is needed to communicate between the server and the I/O virtualization appliance at the top of the rack. By reducing the cabling (up to 80% in some cases), a company can reduce the cost, complexity, and management of all of those cables. And with fewer cables behind the servers, there is better airflow, which will also lower cooling costs.

Through I/O virtualization a company can bring a balanced solution to their virtualized data center that:

  • Removes the bottlenecks typically associated with virtualization
  • Reduces the cost to deploy and manage servers
  • Gives the business better agility, allowing them to react more quickly to changes in the business environment
  • Minimizes cabling, allowing for lower cost, better airflow and easier provisioning – all without having to touch the rack

Clearly, with compute and storage virtualization rolling out across production servers, businesses should be looking to I/O virtualization to provide the final missing piece of a more efficient data center.


Big Data for Small Business by Stephen E. Arnold

Big Data is one of the buzzwords which attract attention.

In London on May 15 and then in New York City on May 22, 2013, I had the tough job of explaining the impact of Big Data on business software.

The graphic I used to explain what my firm’s research documented evoked considerable comment. You judge for yourself.

 

 

Unlike software that computes payroll or a content management system used for blog or Web site content, Big Data is not tidy. One key point I made in my lectures was that there is little agreement about what Big Data means, how big data must be to qualify as Big Data, and what technology is required to make sense of it.

 

Large companies like IBM have invested hundreds of millions in Big Data. IBM offers the Cognos range of analysis tools, and it weaves together high-profile specialist technologies from SPSS (the statistics system) and Vivisimo (a federated search specialist) with its own proprietary, home-grown technology.

 

But what about the small business or mid-sized company that wants to tap into the insights from information flowing through Twitter, across Facebook pages, or buried within LinkedIn? Without millions to invest, what can the average business person do with Big Data?

 

The answer? Excel, which is the most widely used data analysis program.

 

If you have geographic data about your customers, you will want to take a look at GeoFlow. I explored the preview version for Excel 2013. After downloading and installing Office 365 Pro Plus (http://office.microsoft.com/en-us/redir/FX103213513.aspx), I then downloaded and installed the preview of GeoFlow (http://office.microsoft.com/en-us/redir/XT104048049.aspx). You will need Windows 7 or 8, with Microsoft’s .NET Framework 4.x installed as well. To give the system a test drive, Microsoft provides a getting-started document (http://office.microsoft.com/en-us/redir/XT104046472.aspx). Microsoft also provides several data sets which you can use to exercise the system. If you want more data, Microsoft provides a number of other data sets without charge via its open government data initiative (http://www.microsoft.com/government/en-za/initiatives/Pages/open-government-data-initiative.aspx). As I write this, this amped-up Excel add-in can give anyone with basic spreadsheet skills a way to make sense of mailing lists, sales records with geocodes, or public data.

 

Sample GeoFlow output. Source: Microsoft Corp. at http://blogs.office.com/b/microsoft-excel/archive/2013/04/11/public-preview-of-geoflow-for-excel-delivers-3d-data-visualization-and-storytelling.aspx

 

If your business belongs to a professional association, you may have access to information about other members. Maybe you want to sell to organizations engaged in agriculture. You can navigate to Data.gov (http://www.data.gov/metric) and download one or more of the 249 free datasets on the US government Web site. But once you have hundreds of megabytes of numbers, what can you do? Current versions of Excel include an Analysis ToolPak. You can install the components by clicking the Microsoft Office Button, then Excel Options, and following the prompts to the list of available add-ins. Once the ToolPak is available, you have access to sophisticated analytic functions, including Fourier analysis, regression, and various statistical tests. Microsoft offers four- to ten-minute videos with step-by-step instructions for performing common analytic tasks. “Correlation Using Excel’s Data Analysis Toolpak” (http://www.youtube.com/watch?v=VjdyqNz90hc), for instance, is excellent. Educational institutions like the University of Massachusetts provide helpful how-tos that explain some of the tricks for tapping into the power of Excel’s analytic functions. See, for example, “Using Excel for Statistical Data Analysis” (http://people.umass.edu/evagold/excel.html).
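For readers who prefer scripting the same kind of analysis, here is a minimal sketch of a correlation matrix and a simple least-squares fit using pandas and NumPy; the column names and figures are invented for the example.

```python
import numpy as np
import pandas as pd

# Illustrative dataset: the kind of table you might export from a mailing list
# or sales records before running the ToolPak's Correlation or Regression tools.
df = pd.DataFrame({
    "ad_spend":    [1200, 1500, 900, 2000, 1700, 800],
    "site_visits": [340,  410, 260,  530,  480, 240],
    "orders":      [31,    38,  22,   49,   44,  20],
})

print(df.corr())                                   # pairwise correlation matrix

# Simple linear regression of orders on ad_spend (least-squares fit)
slope, intercept = np.polyfit(df["ad_spend"], df["orders"], deg=1)
print(f"orders ~ {slope:.4f} * ad_spend + {intercept:.2f}")
```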

 

If you have a collection of text such as emails or customer help desk files, you can use commercial semantic analysis tools to reveal insights in unstructured content. Semantria, a semantic technology developer, offers an Excel-based text analytics service. The company uses cloud-based technology to process the text; the results are then published to the user’s Excel spreadsheet. Semantria is a pay-as-you-go service, but the company offers a free demonstration of its technology. To learn more, navigate to https://semantria.com/. You will need some basic familiarity with installing add-ins and using the Semantria application programming interface. Once you have the system up and running, you can get a visual report about the sentiment of one or more documents and see the text that may alert you to a potential trouble spot or pinpoint a comment that can be used as a testimonial.

 

Semantria output showing named entities in a document and the major themes in text content. Source: www.semantria.com/demo.

 

If you want to explore Big Data, you can take advantage of these and other low cost, easily accessible systems.

Stephen E Arnold, May 25, 2013

 

Stephen E Arnold is one of the world’s leading authorities on online information systems. You can sign up for his free monthly newsletter by writing to [email protected]. Check out his blogs at www.xenky.com.
