Before answering your question, I would like to explain what machine learning actually is and which machine learning system Google uses.
What is machine learning?
Machine learning is where a computer teaches itself how to do something, rather than being taught by humans or following detailed programming.
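To make the definition concrete, here is a minimal Python sketch. It is a generic illustration, not anything Google uses. Rather than hand-coding the rule y = 2x + 1, the program estimates that rule from example pairs using ordinary least squares. The function name fit_line and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Learn y = w*x + b from examples with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - w * mean_x
    return w, b


# Examples produced by a "hidden" rule (y = 2x + 1) that the program is never told.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(f"learned rule: y = {w:.1f}x + {b:.1f}")
print("prediction for x = 10:", w * 10 + b)
```

No one writes the rule into the program. The program recovers it from the examples, and that is the "teaches itself" part of the definition in miniature.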
What is the name of Google’s search algorithm?
It’s called Hummingbird, as has been reported in the past. For years, the overall algorithm didn’t have a formal name. But in the middle of 2013, Google overhauled that algorithm and gave it a name: Hummingbird.
Which machine learning algorithm does Google use?
RankBrain is Google’s name for a machine-learning artificial intelligence system that’s used to help process its search results, as was reported by Bloomberg and also confirmed by Google. It is part of Google’s overall search “algorithm,” a computer program that’s used to sort through the billions of pages it knows about and find the ones deemed most relevant for particular queries.
One indication of RankBrain’s efficacy is that it judges query results more accurately than its human counterparts. Google revealed that in comparative testing of query ranking, Google search engineers were correct 70 percent of the time, while RankBrain had a success rate of 80 percent.
RankBrain is one of the “hundreds” of signals that go into an algorithm that determines what results appear on a Google search page and where they are ranked, said Greg Corrado, a senior research scientist at Google. In the few months it has been deployed, RankBrain has become the third-most important signal contributing to the result of a search query, he said.
How does RankBrain help refine queries?
RankBrain is designed to better interpret ‘long-tail’ queries and, behind the scenes, effectively translate them so that it can find the best pages for the searcher. It can see patterns between seemingly unconnected complex searches to understand how they’re actually similar to each other. This learning, in turn, allows it to better understand future complex searches and whether they’re related to particular topics. Most important, it can then associate these groups of searches with the results it thinks searchers will like most.
Google didn’t provide examples of groups of searches or give details on how RankBrain guesses at what the best pages are. On the latter point, the likely explanation is simple: if RankBrain can translate an ambiguous search into something more specific, Google can then bring back better answers.
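Google has not published how RankBrain relates searches to each other, so what follows is only a toy illustration of the general idea of measuring how close two queries are. It represents each query as a word-count vector and compares vectors with cosine similarity. That is our choice, not a disclosed detail of RankBrain. Learned systems typically use dense vectors so that queries with no words in common can still come out as similar, which plain word counts cannot do. The function names and queries are hypothetical.

```python
import math
from collections import Counter
from typing import List


def vectorize(query: str) -> Counter:
    """Represent a query as lowercase word counts (a toy stand-in for learned vectors)."""
    return Counter(query.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors: 0 = no overlap, 1 = same direction."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(new_query: str, known_queries: List[str]) -> str:
    """Return the previously seen query that points in the closest direction to the new one."""
    new_vec = vectorize(new_query)
    return max(known_queries, key=lambda q: cosine_similarity(new_vec, vectorize(q)))


known = [
    "how many tablespoons in a cup",
    "best pizza near me",
    "convert cups to grams flour",
]
print(most_similar("tablespoons per cup conversion", known))
```

The new query shares "tablespoons" and "cup" with the first known query and no words with the others, so the first one wins. In a real system, the matched group would then be associated with results that worked well for earlier searchers.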
How about an example of RankBrain?
Google gave one fresh example: “How many tablespoons in a cup?” Google said that RankBrain favoured different results in Australia versus the United States for that query because the measurements in each country are different, despite the similar names.
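The arithmetic behind the example is easy to check. The sketch assumes the commonly used definitions: a US customary cup is 8 US fluid ounces and a US tablespoon is half a US fluid ounce (1 US fluid ounce = 29.5735295625 mL), while Australia uses a 250 mL metric cup and a 20 mL tablespoon.

```python
# Volumes in millilitres, using the definitions assumed above.
US_FLUID_OUNCE_ML = 29.5735295625
MEASURES_ML = {
    "United States": {"cup": 8 * US_FLUID_OUNCE_ML, "tablespoon": US_FLUID_OUNCE_ML / 2},
    "Australia": {"cup": 250.0, "tablespoon": 20.0},
}


def tablespoons_per_cup(country: str) -> float:
    """Same question, different answer: the ratio depends on each country's definitions."""
    units = MEASURES_ML[country]
    return units["cup"] / units["tablespoon"]


for country in MEASURES_ML:
    print(f"{country}: {tablespoons_per_cup(country):g} tablespoons in a cup")
```

This prints 16 for the United States and 12.5 for Australia. A good answer therefore depends on where the searcher is, not just on the words typed.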
What technology is behind RankBrain?
Google keeps plenty of goodies for its own programmers. Internally, the company has a probably unparalleled tool chest of ML prosthetics. Not the least of these is an innovation it has been using for years but announced only recently: the Tensor Processing Unit. This is a microprocessor chip optimised for the specific quirks of running machine learning programs, much as Graphics Processing Units are designed with the single purpose of speeding up the calculations that put pixels on a display screen. Many thousands of them (only God and Larry Page probably know how many) are inside servers in the company’s huge data centres. By super-powering its neural net operations, TPUs give Google a tremendous advantage. “We could not have done RankBrain without it,” says Google’s Jeff Dean. In terms of programming languages, C/C++, Python, MATLAB and Haskell were mainly used.
You might also be interested in the article below. It was originally posted on https://www.zdnet.com
The best programming language for data science and machine learning
With software development being redefined to work in a data science and machine learning context, this timeless question is gaining new relevance. Let’s look at some options and their pros and cons, with commentary from domain experts.
Even though, in the end, the choice is at least to some extent a subjective one, some criteria come to mind. Ease of use and syntax may be subjective, but things such as community support, available libraries, speed, and type safety are not. There are a few nuances here, though.
Execution speed and type safety
In machine learning applications, the training and operational (or inference) phases for algorithms are distinct. So, one approach taken by some people is to use one language for the training phase and then another one for the operational phase.
The reasoning here is to work during development with the language that is more familiar or easy to use, or has the best environment and library support. Then the trained algorithm is ported to run on the environment preferred by the organization for its operations.
While this is an option, especially using standards such as PMML, it may increase operational complexity. In addition, in many cases things are not clear-cut, as programming done in one language may call libraries in another one, thus diluting the argument on execution speed.
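PMML is an XML-based standard. As a lighter stand-in, the sketch below uses JSON to show the same split: the training code produces a small artifact, and the operational code needs only that artifact. JSON here is an assumption for illustration, not PMML itself. Both halves are written in Python for brevity, but a Java or C++ service could read the exported JSON just as well. The function names are hypothetical.

```python
import json


def train(xs, ys, learning_rate=0.05, epochs=2000):
    """Training phase: fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return {"model": "linear", "w": w, "b": b}


def export_model(model: dict) -> str:
    """Serialize the learned parameters to a language-neutral text format."""
    return json.dumps(model)


def predict(serialized: str, x: float) -> float:
    """Operational phase: uses only the exported parameters, never the training code."""
    params = json.loads(serialized)
    return params["w"] * x + params["b"]


artifact = export_model(train([0, 1, 2, 3], [1, 3, 5, 7]))
print(artifact)
print(round(predict(artifact, 10), 3))
```

The design trade-off mentioned above shows up even at this size. The export format becomes a contract between the two environments, and every model feature must be expressible in it. That contract is where the extra operational complexity comes from.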
Another thing to note is type safety. Type safety in programming languages is a little like schema in databases: While not having it increases flexibility, it also increases the chances of errors.
In a discussion thread initiated by Andriy Burkov, machine learning team leader at Gartner, Burkov argues against using dynamically typed languages such as Python for machine learning.
“You can run an experiment for several hours, or even days, just to find out that the code crashed because of an incorrect type conversion or a wrong number of attributes in a method call,” says Burkov.
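In dynamically typed code, one common way to reduce this risk is to validate inputs before the expensive part starts, so that bad data fails in milliseconds rather than hours. The sketch below is minimal. EXPECTED_FEATURES, validate_dataset and long_training_run are hypothetical names, and the "training" is only a placeholder sum.

```python
from numbers import Real

EXPECTED_FEATURES = 3  # hypothetical schema: every record has exactly 3 numeric features


def validate_dataset(records):
    """Check record length and value types up front, before any long-running work."""
    for i, record in enumerate(records):
        if len(record) != EXPECTED_FEATURES:
            raise ValueError(f"record {i}: expected {EXPECTED_FEATURES} features, got {len(record)}")
        for value in record:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, Real):
                raise TypeError(f"record {i}: non-numeric value {value!r}")


def long_training_run(records):
    """Stand-in for an expensive job; it starts only after the data passes validation."""
    validate_dataset(records)
    return sum(sum(record) for record in records)  # placeholder for real training work


good = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
bad = [[1.0, 2.0, 3.0], [4.0, "5.0", 6.0]]
print(long_training_run(good))
try:
    long_training_run(bad)
except TypeError as err:
    print("rejected before training:", err)
```

This does not make Python statically typed. It moves the crash Burkov describes from the end of the run to the beginning, at the cost of writing the checks by hand.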
Despite having what is arguably the largest footprint in enterprise deployment, Java is not getting much love these days. Some of this may have to do with the “coolness factor,” as Java has been challenged by new programming languages, but there are also some very real concerns here.
The JVM, which has greatly helped Java establish its footprint, is also a reason why people are skeptical about using it for machine learning. Similarly, garbage collection, a well-known Java feature that spares developers the manual memory management of C++, may pose problems in production environments.
When discussing trends in software development with Paco Nathan, managing partner at Derwen and data science practitioner and thought leader, the topic did come up.
Nathan notes that the trend he sees is toward real-time applications, and this is not something he believes the JVM is well-suited for, as it is an abstraction over the hardware. Adding a layer between the code and the hardware provides cross-platform portability, but also slows down execution.
Nathan also cites Ion Stoica, the initiator of Apache Spark, which is heavily used for real-time applications. Nathan mentioned that one of the rules Stoica has recently set for his research team in Berkeley is abolishing Java.
Nathan commented that he expects that to spill over from research to industry over a five-year timeframe, as is typical for directions initiated in research environments. But maybe we should not be too fast in writing off Java.
The ups and downs that have followed Java during its stewardship by Oracle may have contributed to its fall from grace. They may also have something to do with the perceived stalemate in the evolution of the JVM.
With enterprise Java being handed off to the Eclipse Foundation, however, there is a chance Java and the JVM may be revitalized. There are also initiatives, such as Gandiva, which aim to optimize Java code for specialized hardware, potentially making it a competitive option for machine learning.
In addition, that large footprint has given rise to initiatives such as DeepLearning4J, which aim to give Java users access to the same kind of libraries typically used through other languages.
According to a recent survey by KDNuggets, Python is the undisputed leader in use for data science and machine learning. Some often cited reasons for this preference are the wide choice in libraries and the fact that it’s considered an easy language to work with.
Ashok Reddy, GM DevOps at CA Technologies, notes that Python was the language of choice in his recently completed master’s in AI and Machine Learning at Georgia Tech.
Reddy goes on to add that Python is gaining popularity in universities due to its simplicity, so graduates are more likely to know Python than Java. Beyond simplicity, he also cites the abundance of libraries as a key reason for this.
Reddy notes that, from a performance perspective, C is also a popular choice for use in AI and embedded-IoT applications, but Java is not going away. Reddy also sees a pattern in using Python for development and then other languages for deployment of machine learning algorithms.
The same holds internally at CA. Reddy notes that besides the legacy code in C and Java, the cross-platform portability that Java offers is a key priority for the company.
“Many startups use Ruby or Python initially, and when they grow up they switch to Java,” says Reddy.
In the KDNuggets survey, R’s share seems to be dropping compared to last year. R, however, has been gaining enterprise adoption over the last few years.
In some ways R is not a typical programming language, as it’s not a general-purpose one. R’s roots lie in statistics, as it was developed specifically to deal with such needs.
That, and the fact that it’s open source, make for a wealth of off-the-shelf libraries for common and not-so-common related tasks. The flip side of this is that R has been plagued by issues such as memory management and security, and its syntax is not very straightforward or disciplined.
In the past few years, development environments have been built around R to fill the gaps between the data science lab and enterprise deployment.
One of those, created by Revolution Analytics, has been integrated into Microsoft’s offerings (Visual Studio, SQL Server, Power BI and Azure) since Microsoft acquired the company. Another, RStudio, was initially integrated with Apache Spark and now with Databricks.
The way this was done points to another strength of R: its package system. Through that system and its ties with the academic community, R keeps up to date with all the latest developments in data science and machine learning.
While R may be a good choice for development, its value in production is highly dependent on its supporting ecosystem.
Julia, Golang, Rust, Swift, and JVM languages
And what about those who do not want the dynamic typing of Python, or the legacy baggage of Java or C/C++? Python 3.6 and later support optional static type hints, which are checked by external tools such as mypy rather than by the interpreter. Beyond that, there are other options.
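Python does not enforce type hints when the code runs. A static checker such as mypy reports a call like schedule_epochs("10", 0.01) before execution, but the interpreter itself runs it without complaint. The sketch below shows that behaviour and then adds a small runtime check. The enforce_simple_types decorator and schedule_epochs function are hypothetical helpers written for illustration, not part of the standard library or of mypy.

```python
import functools
import inspect
from typing import get_type_hints


def enforce_simple_types(func):
    """Hypothetical helper: check arguments against plain-class annotations at call time.

    Intended only for plain classes such as int, float or str, not generic types.
    It is stricter than mypy in one respect: an int passed for a float parameter is rejected.
    """
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Skip parameters without a plain-class annotation.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}, got {type(value).__name__}")
        return func(*args, **kwargs)

    return wrapper


def schedule_epochs(epochs: int, learning_rate: float) -> str:
    """Annotated, but the annotations are not enforced by Python itself."""
    return f"{epochs} epochs at lr={learning_rate}"


print(schedule_epochs("10", 0.01))  # runs: annotations are ignored at runtime

checked = enforce_simple_types(schedule_epochs)
try:
    checked("10", 0.01)
except TypeError as err:
    print(err)
```

In practice, running mypy in CI catches such mistakes with no runtime cost. A runtime check like this one is only a fallback at boundaries where data arrives from outside the type checker's view.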
Burkov notes that Scala and Kotlin, two newer languages based on the JVM, are statically typed, with type inference that makes many annotations optional. However, they suffer from a steep learning curve and low user adoption, respectively. And, in the end, we might add, they also come with the same restrictions imposed by the JVM.
Swift, notes Burkov, has static typing but few machine learning and data analysis libraries. Other options suggested by contributors in the same thread are Golang, Julia, and Rust.
Golang has been described as fast, concurrency-ready, easy, clean, compiled, and simple. It also has growing library support for NLP, general machine learning, and data analysis, extraction, processing and visualization.
Julia has been described as flexible in its use of types and JIT-compiled, somewhat like Java, yet with execution speed comparable to C. It’s a relatively new language, so its community is not the biggest around. However, Julia does have some support for machine learning libraries.
Rust has been described as compiling natively and efficiently like plain C/C++, having no garbage collector, and being type-safe and feature-rich. Even its proponents admit, though, that it is not really ready for ML because it lacks ML-specific libraries.
The choice of programming language is not a simple one, and in the end it may not even be the most important one either. As pointed out by Luiz Eduardo Le Masson, data science leader at Stone Co.:
“For ‘ordinary machine learning,’ it does not matter what language you use. But when you need to have real online learning algorithms and inferences in realtime for millions of simultaneous clusters and respond in less than 500 ms, the topic does not only involve languages, but architecture, design, flow control, fault tolerance, resilience.”
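The quote's point is that at this scale, architecture matters more than language. The sketch below covers only one small piece of the problem: a model that updates itself from each event as it arrives (online learning), with each predict-and-update cycle timed against the 500 ms budget. OnlineLinearModel, serve and the simulated event stream are hypothetical, and nothing here addresses distribution, flow control or fault tolerance.

```python
import time

LATENCY_BUDGET_S = 0.5  # the 500 ms response target mentioned in the quote


class OnlineLinearModel:
    """Linear model that learns from one example at a time (online stochastic gradient descent)."""

    def __init__(self, n_features: int, learning_rate: float = 0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias

    def update(self, features, target):
        """One gradient step on a single example's squared error (factor 2 folded into the rate)."""
        error = self.predict(features) - target
        self.weights = [w - self.learning_rate * error * x for w, x in zip(self.weights, features)]
        self.bias -= self.learning_rate * error


def serve(model, features, target):
    """Predict, learn from the observed outcome, and report whether the cycle met the budget."""
    start = time.perf_counter()
    prediction = model.predict(features)
    model.update(features, target)
    elapsed = time.perf_counter() - start
    return prediction, elapsed <= LATENCY_BUDGET_S


model = OnlineLinearModel(n_features=1)
# Simulated event stream generated by the hidden rule y = 3x, processed one event at a time.
for step in range(2000):
    x = (step % 10) / 10
    prediction, within_budget = serve(model, [x], 3 * x)
print(round(model.predict([1.0]), 3), within_budget)
```

A single tiny model easily meets the budget. The hard part the quote refers to is keeping that guarantee for millions of simultaneous requests, and that is a design question rather than a language question.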