Extracting features with PySpark

In this chapter, we will learn how to extract features with PySpark as part of the Agile Data Science process.

Overview of Spark

Apache Spark is a fast, general-purpose data processing framework. It performs in-memory computations, which makes it well suited to analyzing data in near real time. Spark is often introduced as a real-time stream processing system, but it handles batch processing just as well, and it also supports interactive queries and iterative algorithms.

Spark itself is written in the Scala programming language.

PySpark combines Python with Spark. It provides the PySpark shell, which links the Python API to the Spark core and initializes the Spark context. Many data scientists use PySpark for feature-tracking work such as that discussed in the previous chapter.
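
The PySpark shell creates the Spark context for you and exposes it as the variable sc. In a standalone script you create it yourself; the following is a minimal sketch, where the application name and the local master URL are illustrative choices, not values from this chapter:

from pyspark import SparkConf, SparkContext

# Configure and create the context; in the PySpark shell this
# object already exists as the variable `sc`.
conf = SparkConf().setAppName("agile-feature-extraction").setMaster("local[*]")
sc = SparkContext(conf=conf)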

In the following example, we apply a series of transformations to build a dataset called counts and save it to a file.

# Read the input file from HDFS as an RDD of lines.
text_file = sc.textFile("hdfs://...")
# Split lines into words, pair each word with 1, then sum the counts per word.
counts = (text_file.flatMap(lambda line: line.split(" "))
          .map(lambda word: (word, 1))
          .reduceByKey(lambda a, b: a + b))
# Write the (word, count) pairs back to HDFS as text files.
counts.saveAsTextFile("hdfs://...")
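
Note that transformations such as flatMap, map, and reduceByKey are lazy: nothing is computed until an action such as saveAsTextFile or collect is called. To try the same pipeline without an HDFS cluster, you can run it on an in-memory RDD. This is a minimal sketch in which the sample lines are made up for illustration:

# Build a small in-memory RDD instead of reading from HDFS.
lines = sc.parallelize(["agile data science", "data science with spark"])
counts = (lines.flatMap(lambda line: line.split(" "))
          .map(lambda word: (word, 1))
          .reduceByKey(lambda a, b: a + b))

# collect() is an action: it triggers the computation and returns the results.
print(counts.collect())   # e.g. [('agile', 1), ('data', 2), ('science', 2), ...]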

Using PySpark, a user can work with RDDs in the Python programming language. This is made possible by the bundled Py4J library, which bridges Python and the JVM on which the Spark core runs.


