This tutorial explains how to preview, display, or print the first 'n' rows of a Spark dataframe on the console. The examples below cover the show() function and its n, truncate and vertical parameters, together with limit() and count().
The examples use an orders dataframe read from a Parquet file:
df = spark.read.parquet("file:///path_to_file/orders.parquet")
df.show(truncate=False)
+--------+---------------------+-----------------+---------------+
|order_id|order_date           |order_customer_id|order_status   |
+--------+---------------------+-----------------+---------------+
|68817   |2014-03-27 00:00:00.0|6704             |COMPLETE       |
|68818   |2014-03-31 00:00:00.0|12393            |PROCESSING     |
.
.
.
|68835   |2014-05-02 00:00:00.0|764              |COMPLETE       |
|68836   |2014-05-03 00:00:00.0|8009             |PENDING_PAYMENT|
+--------+---------------------+-----------------+---------------+
only showing top 20 rows
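Syntax of the show() function:
dataframe.show(n=20, truncate=True, vertical=False)
n: number of rows to display (default 20).
truncate: if True (default), strings longer than 20 characters are truncated; if set to a number greater than 1, strings are truncated to that length; if False, full values are shown.
vertical: if True, each row is printed vertically, one line per column value (default False).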
Called with no arguments, show() prints the first 20 rows and truncates long values (note the order_date column):
df.show()
+--------+--------------------+-----------------+---------------+
|order_id|          order_date|order_customer_id|   order_status|
+--------+--------------------+-----------------+---------------+
|   68817|2014-03-27 00:00:...|             6704|       COMPLETE|
|   68818|2014-03-31 00:00:...|            12393|     PROCESSING|
.
.
.
|   68835|2014-05-02 00:00:...|              764|       COMPLETE|
|   68836|2014-05-03 00:00:...|             8009|PENDING_PAYMENT|
+--------+--------------------+-----------------+---------------+
only showing top 20 rows
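The shortened order_date values above appear to follow a simple rule: a cell longer than the truncate width is cut, and its last three visible characters become "...". The block below is a minimal pure-Python sketch of that rule, not Spark's own code. The helper name truncate_cell is hypothetical, and converting the value with str() is an assumption made for illustration.

```python
def truncate_cell(value, width=20):
    """Mimic how show() shortens a cell (illustrative sketch, not Spark code).

    width=20 corresponds to truncate=True; width=0 corresponds to truncate=False.
    """
    text = str(value)  # assumption: the cell is rendered via its string form
    if width <= 0 or len(text) <= width:
        return text  # short enough, or truncation disabled
    if width < 4:
        return text[:width]  # too narrow to fit an ellipsis
    return text[:width - 3] + "..."  # keep width-3 characters, then "..."


print(truncate_cell("2014-03-27 00:00:00.0"))      # 2014-03-27 00:00:...
print(truncate_cell("PENDING_PAYMENT"))            # PENDING_PAYMENT
print(truncate_cell("2014-03-27 00:00:00.0", 10))  # 2014-03...
```

The third call corresponds to passing a number, as in df.show(truncate=10), which limits every cell to 10 characters.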
Pass a number to display only the first n rows, here 2:
df.show(2)
+--------+--------------------+-----------------+------------+
|order_id|          order_date|order_customer_id|order_status|
+--------+--------------------+-----------------+------------+
|   68817|2014-03-27 00:00:...|             6704|    COMPLETE|
|   68818|2014-03-31 00:00:...|            12393|  PROCESSING|
+--------+--------------------+-----------------+------------+
only showing top 2 rows
Combine select() with show() to display only specific columns:
df.select("order_id", "order_status").show(2)
+--------+---------------+
|order_id|   order_status|
+--------+---------------+
|   68817|       COMPLETE|
|   68818|     PROCESSING|
+--------+---------------+
only showing top 2 rows
Pass False as the second argument (truncate) to display 4 rows with full column values:
df.show(4, False)
+--------+---------------------+-----------------+------------+
|order_id|order_date           |order_customer_id|order_status|
+--------+---------------------+-----------------+------------+
|68817   |2014-03-27 00:00:00.0|6704             |COMPLETE    |
|68818   |2014-03-31 00:00:00.0|12393            |PROCESSING  |
|68819   |2014-04-03 00:00:00.0|1212             |COMPLETE    |
|68820   |2014-04-04 00:00:00.0|6358             |COMPLETE    |
+--------+---------------------+-----------------+------------+
only showing top 4 rows
Use truncate=False on its own to keep the default 20 rows but show full column values:
df.show(truncate=False)
+--------+---------------------+-----------------+---------------+
|order_id|order_date           |order_customer_id|order_status   |
+--------+---------------------+-----------------+---------------+
|68817   |2014-03-27 00:00:00.0|6704             |COMPLETE       |
|68818   |2014-03-31 00:00:00.0|12393            |PROCESSING     |
.
.
.
|68835   |2014-05-02 00:00:00.0|764              |COMPLETE       |
|68836   |2014-05-03 00:00:00.0|8009             |PENDING_PAYMENT|
+--------+---------------------+-----------------+---------------+
only showing top 20 rows
limit() returns a new dataframe containing at most n rows, which can then be displayed with show():
df.limit(1).show()
+--------+--------------------+-----------------+------------+
|order_id|          order_date|order_customer_id|order_status|
+--------+--------------------+-----------------+------------+
|   68817|2014-03-27 00:00:...|             6704|    COMPLETE|
+--------+--------------------+-----------------+------------+
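Unlike show(), which only prints and returns nothing, limit() returns a dataframe, so the limited rows can be reused in later code. The following is a small sketch of that idea, assuming df is any Spark dataframe. The helper name first_rows_as_dicts is hypothetical; limit(), collect() and Row.asDict() are standard PySpark calls.

```python
def first_rows_as_dicts(df, n=5):
    """Return the first `n` rows of `df` as a list of plain dicts (sketch)."""
    # limit(n) is lazy and yields a new dataframe with at most n rows;
    # collect() then brings only those rows back to the driver.
    return [row.asDict() for row in df.limit(n).collect()]
```

With the orders dataframe, first_rows_as_dicts(df, 2) would return two dicts keyed by column name, such as order_id and order_status.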
To display every row, pass the total row count as n:
df.show(df.count())
+--------+--------------------+-----------------+---------------+
|order_id|          order_date|order_customer_id|   order_status|
+--------+--------------------+-----------------+---------------+
|   68817|2014-03-27 00:00:...|             6704|       COMPLETE|
|   68818|2014-03-31 00:00:...|            12393|     PROCESSING|
.
.
.
|   68799|2014-02-14 00:00:...|            11190|       COMPLETE|
|   68800|2014-02-17 00:00:...|            10037|     PROCESSING|
+--------+--------------------+-----------------+---------------+
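Printing every row is convenient for small dataframes, but count() runs a full job over the data and the printed rows all pass through the driver, so it can be costly on large ones. One possible safeguard is sketched below. The helper name show_all and the max_rows threshold are hypothetical choices for illustration; only count() and show() are called on the dataframe.

```python
def show_all(df, max_rows=1000, truncate=True):
    """Print every row of `df`, refusing when it has more than `max_rows` rows."""
    total = df.count()  # triggers a job that counts all rows
    if total > max_rows:
        raise ValueError(
            f"dataframe has {total} rows, more than max_rows={max_rows}; "
            "use df.show(n) or df.limit(n) instead"
        )
    df.show(n=total, truncate=truncate)
    return total
```

Calling show_all(df) prints the whole dataframe only when it is small enough, and returns the row count so the caller does not need to count again.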
Set vertical=True to print each row as a block, one line per column:
df.show(2, vertical=True)
-RECORD 0---------------------------------
 order_id          | 68817
 order_date        | 2014-03-27 00:00:...
 order_customer_id | 6704
 order_status      | COMPLETE
-RECORD 1---------------------------------
 order_id          | 68818
 order_date        | 2014-03-31 00:00:...
 order_customer_id | 12393
 order_status      | PROCESSING
only showing top 2 rows
Combine truncate=False with vertical=True to print full column values vertically:
df.show(2, truncate=False, vertical=True)
-RECORD 0----------------------------------
 order_id          | 68817
 order_date        | 2014-03-27 00:00:00.0
 order_customer_id | 6704
 order_status      | COMPLETE
-RECORD 1----------------------------------
 order_id          | 68818
 order_date        | 2014-03-31 00:00:00.0
 order_customer_id | 12393
 order_status      | PROCESSING
only showing top 2 rows