Lab_1 (Con.)
You can also try other aggregates:
|Aggregate|Description|
|---|---|
|\$sum|Calculate the total|
|\$avg|Find the average|
|\$min|Find the minimum value|
|\$max|Find the maximum value|
|\$push|Add the value to a list; duplicates are allowed|
|\$addToSet|Add the value to a set; duplicates are not added|
|\$first|Return the value from the first document in each group|
|\$last|Return the value from the last document in each group|
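As a minimal sketch of how these aggregates fit into a query, the pipeline below groups documents by one field and applies several of the operators from the table to another field. The helper name `build_group_pipeline` is hypothetical, and the field names `Variable_category` and `Value` are assumed to match the `companies` documents used in this lab.

```python
def build_group_pipeline(group_field, value_field):
    """Build a one-stage aggregation pipeline (hypothetical helper)."""
    value_ref = f"${value_field}"  # a "$" prefix refers to a field's value
    return [
        {
            "$group": {
                "_id": f"${group_field}",        # one output row per distinct value
                "total": {"$sum": value_ref},
                "average": {"$avg": value_ref},
                "smallest": {"$min": value_ref},
                "largest": {"$max": value_ref},
            }
        }
    ]


pipeline = build_group_pipeline("Variable_category", "Value")
print(pipeline)

# With a live pymongo collection, the pipeline could be run like this:
# for row in companies.aggregate(pipeline):
#     print(row)
```

Building the pipeline as a plain Python list makes it easy to inspect before sending it to the server.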
## Exporting to JSON file
Since the documents are represented as a list of dictionaries in Python, we can convert them to JSON format and save the result to a data file. The procedure is as follows.
First, we retrieve all documents from the collection.
```python
data = list(companies.find())
```
We use the `dumps` function provided by the `json` module to convert the list of documents to a JSON string.
```python
import json
json_str = json.dumps(data)
```
Then, we open a file in write mode (`'w'`) and write the string to it.
```python
# The with block closes the file automatically, even if the write fails
with open('companies.json', 'w') as fout:
    fout.write(json_str)
```
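The same export can also be sketched with `json.dump`, which writes straight to the file object. The `docs` list below is made-up sample data standing in for `list(companies.find())`, and `export_docs` is a hypothetical helper name. Passing `default=str` is our own assumption rather than part of the lab: it converts values that `json` cannot serialize by itself (for example a `datetime`, or the `ObjectId` that MongoDB generates when a document has no explicit `_id`) to strings.

```python
import json
import os
import tempfile
from datetime import datetime


def export_docs(docs, path):
    """Write a list of dictionaries to `path` as one JSON string."""
    with open(path, "w") as fout:
        # default=str is called only for values json cannot serialize itself
        json.dump(docs, fout, default=str)


# Made-up sample data standing in for list(companies.find())
docs = [
    {"_id": 1, "Year": "2021", "Value": 757504},
    {"_id": 2, "Year": "2021", "Value": 674890, "loaded_at": datetime(2021, 1, 1)},
]

path = os.path.join(tempfile.mkdtemp(), "companies_sample.json")
export_docs(docs, path)

# Read the file back to confirm that it contains valid JSON
with open(path) as fin:
    exported = json.load(fin)
print(exported[0])
```

Note that values converted by `default=str` come back as plain strings when the file is loaded again.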
## Importing JSON file
In the previous part, we learnt how to load data from a text file in tab-delimited format. Now, we are going to learn how to load data from a JSON file.
Before loading the data into MongoDB, we ensure that the `companies` collection is empty. To do so, we can use the `delete_many` function with an empty filter to delete all documents.
```python
companies.delete_many({})
data = list(companies.find())
print(data)
```
[]
First, we open the data file using the `open` function in read-only mode (`'r'`) and load a line from it. We assume that the content consists of a single line only. Hint: you may use `readline()` to read this line.
### Task 1: Open the `companies.json` file in the correct mode
```python
file = open('companies.json', 'r')
```
```python
json_str = file.readline()
```
We first check the type and length of the string that was read, and then use the `loads` function provided by the `json` module to convert the data from a JSON string to a list.
```python
print(type(json_str), len(json_str))
```
<class 'str'> 16247321
```python
import json
data = json.loads(json_str)
```
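As an alternative sketch, `json.load` can parse the file object directly, so the content does not have to sit on a single line, and a `with` block closes the file automatically. The file name and sample documents below are made up so that the example runs on its own.

```python
import json
import os
import tempfile

# Create a small sample file so that the example is self-contained
sample = [{"_id": 1, "Value": 757504}, {"_id": 2, "Value": 674890}]
path = os.path.join(tempfile.mkdtemp(), "companies_sample.json")
with open(path, "w") as fout:
    json.dump(sample, fout, indent=2)  # indent spreads the JSON over several lines

with open(path, "r") as fin:
    loaded = json.load(fin)  # parses the whole file, not just one line

print(type(loaded), len(loaded))
```

With a multi-line file like this one, `readline()` followed by `json.loads` would fail, because the first line alone is not a complete JSON value.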
Now, we can call the `insert_many` function to insert the documents.
### Task 2: Insert your data into `companies`
```python
companies.insert_many(data)
```
<pymongo.results.InsertManyResult at 0x7faf3d352b50>
Let's check whether the collection contains the new documents or not.
### Task 3: Use the `find()` method to check the inserted data
Please display the first five documents in your results.
```python
list(companies.find().limit(5))
```
[{'_id': 1,
'Year': '2021',
'Industry_aggregation_NZSIOC': 'Level 1',
'Industry_code_NZSIOC': '99999',
'Industry_name_NZSIOC': 'All industries',
'Units': 'Dollars (millions)',
'Variable_code': 'H01',
'Variable_name': 'Total income',
'Variable_category': 'Financial performance',
'Value': 757504,
'Industry_code_ANZSIC06': '"ANZSIC06 divisions A-S (excluding classes K6330, L6711, O7552, O760, O771, O772, S9540, S9601, S9602, and S9603)"'},
{'_id': 2,
'Year': '2021',
'Industry_aggregation_NZSIOC': 'Level 1',
'Industry_code_NZSIOC': '99999',
'Industry_name_NZSIOC': 'All industries',
'Units': 'Dollars (millions)',
'Variable_code': 'H04',
'Variable_name': '"Sales, government funding, grants and subsidies"',
'Variable_category': 'Financial performance',
'Value': 674890,
'Industry_code_ANZSIC06': '"ANZSIC06 divisions A-S (excluding classes K6330, L6711, O7552, O760, O771, O772, S9540, S9601, S9602, and S9603)"'},
{'_id': 3,
'Year': '2021',
'Industry_aggregation_NZSIOC': 'Level 1',
'Industry_code_NZSIOC': '99999',
'Industry_name_NZSIOC': 'All industries',
'Units': 'Dollars (millions)',
'Variable_code': 'H05',
'Variable_name': '"Interest, dividends and donations"',
'Variable_category': 'Financial performance',
'Value': 49593,
'Industry_code_ANZSIC06': '"ANZSIC06 divisions A-S (excluding classes K6330, L6711, O7552, O760, O771, O772, S9540, S9601, S9602, and S9603)"'},
{'_id': 4,
'Year': '2021',
'Industry_aggregation_NZSIOC': 'Level 1',
'Industry_code_NZSIOC': '99999',
'Industry_name_NZSIOC': 'All industries',
'Units': 'Dollars (millions)',
'Variable_code': 'H07',
'Variable_name': 'Non-operating income',
'Variable_category': 'Financial performance',
'Value': 33020,
'Industry_code_ANZSIC06': '"ANZSIC06 divisions A-S (excluding classes K6330, L6711, O7552, O760, O771, O772, S9540, S9601, S9602, and S9603)"'},
{'_id': 5,
'Year': '2021',
'Industry_aggregation_NZSIOC': 'Level 1',
'Industry_code_NZSIOC': '99999',
'Industry_name_NZSIOC': 'All industries',
'Units': 'Dollars (millions)',
'Variable_code': 'H08',
'Variable_name': 'Total expenditure',
'Variable_category': 'Financial performance',
'Value': 654404,
'Industry_code_ANZSIC06': '"ANZSIC06 divisions A-S (excluding classes K6330, L6711, O7552, O760, O771, O772, S9540, S9601, S9602, and S9603)"'}]
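The import steps above (empty the collection, insert the documents, check the result) can be gathered into one helper, sketched below. The name `reload_collection` is hypothetical; the function only assumes that the object passed in offers `delete_many`, `insert_many` and `count_documents` as a pymongo collection does.

```python
def reload_collection(collection, docs):
    """Empty `collection`, insert `docs`, and return the new document count.

    `collection` is expected to behave like a pymongo Collection.
    """
    collection.delete_many({})      # an empty filter matches every document
    if docs:                        # pymongo's insert_many raises an error for an empty list
        collection.insert_many(docs)
    return collection.count_documents({})


# Usage with the lab's collection and data (requires a running MongoDB):
# n = reload_collection(companies, data)
# print(n)
```

Returning the count gives a quick check that the number of stored documents matches `len(data)`.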